
Goal

Walk through process creation → execution → termination, and what happens in between.


1) Process states (concepts + what ps shows)

High-level lifecycle:

new → ready ⇄ running → terminated (zombie until the parent reaps it)

running → waiting (blocked) → ready

Linux-y view (common ps STAT letters):

  • R: running/runnable (ready)
  • S: interruptible sleep (waiting on event)
  • D: uninterruptible sleep (usually I/O)
  • T/t: stopped/traced
  • Z: zombie (terminated, not yet reaped)
  • I: idle (kernel threads)
  • X: dead (shouldn’t normally see)

Quick peek:

ps -eo pid,ppid,stat,comm | head
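
The same state letter can be read programmatically. The following is a minimal sketch, assuming Linux with /proc mounted; `proc_state` is a hypothetical helper name, not a standard API.

```python
import os

def proc_state(pid):
    """Return the one-letter state (R, S, D, T, Z, ...) from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Format: "pid (comm) state ..."; comm may itself contain spaces or ")",
    # so split on the LAST ")" before taking the state field.
    return data.rsplit(")", 1)[1].split()[0]

# A process inspecting itself is on the CPU at that moment, so it reports R.
print(proc_state(os.getpid()))
```

Passing another PID (for example one taken from the ps output above) shows S for most idle processes.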

2) fork() + exec() (with Copy-on-Write)

Mental model

  • fork(): clone current process. Parent gets child PID; child gets 0.
  • CoW (Copy-on-Write): parent & child share page frames read-only until either writes → first write faults → private copy.
  • exec(): replace the child’s entire memory image with a new program. FDs remain open unless FD_CLOEXEC.
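
The return-value convention and the "private copy on first write" behaviour can be sketched in Python as well. This is an illustration under stated assumptions, not a definitive demo: `fork_isolation` is a hypothetical helper, and a pipe is used only so the child can report what it saw.

```python
import os

def fork_isolation():
    """Fork, let the child modify a variable, and compare both views."""
    counter = [0]                  # memory shared copy-on-write after fork
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                   # child: fork() returned 0
        try:
            os.close(r)
            counter[0] += 1        # the write lands in the child's private copy
            os.write(w, str(counter[0]).encode())
        finally:
            os._exit(0)            # never fall back into the parent's code path
    os.close(w)                    # parent: fork() returned the child's PID
    child_value = int(os.read(r, 16))
    os.close(r)
    os.waitpid(pid, 0)             # reap the child
    return pid, child_value, counter[0]

pid, child_value, parent_value = fork_isolation()
print(f"child {pid} saw {child_value}; parent still sees {parent_value}")
```

The child reports 1 while the parent's copy stays at 0, because the child's write never touches the parent's pages.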

Minimal C example (fork + exec + FD_CLOEXEC)

// build: gcc -Wall -O2 demo_fork_exec.c -o demo_fork_exec
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>   // wait()

int main(void) {
    int fd = open("keepopen.txt", O_CREAT|O_WRONLY|O_TRUNC, 0644);
    if (fd < 0) { perror("open"); exit(1); }
    dprintf(fd, "hello before exec\n");

    // Uncomment next line to prevent fd from leaking across exec:
    // fcntl(fd, F_SETFD, FD_CLOEXEC);

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid == 0) { // child
        // Replace image with wc (word count) and point it at the inherited fd
        // through /dev/fd/<n>. (dup2(fd, 0) would hide the effect: the duplicate
        // never carries FD_CLOEXEC, and a write-only fd cannot be read as stdin.)
        char path[32];
        snprintf(path, sizeof path, "/dev/fd/%d", fd);
        execlp("wc", "wc", "-c", path, (char *)NULL);
        perror("exec"); // runs only if exec fails
        _exit(127);
    } else { // parent
        printf("parent %d → child %d\n", getpid(), pid);
        close(fd);
        wait(NULL);
    }
    return 0;
}
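
The same experiment can be sketched in Python, with one caveat: since Python 3.4 (PEP 446) new file descriptors are created close-on-exec by default, so the leak has to be opted into with `os.set_inheritable`. `fd_survives_exec` is a hypothetical helper, and the probe script it execs is an assumption made for illustration.

```python
import os
import sys

def fd_survives_exec(inheritable):
    """Fork, exec a fresh interpreter, and report whether fd `w` is still open there."""
    r, w = os.pipe()                      # created with FD_CLOEXEC set (PEP 446)
    os.set_inheritable(w, inheritable)    # True clears FD_CLOEXEC, False keeps it
    probe = (
        "import os, sys\n"
        "try:\n"
        f"    os.fstat({w})\n"            # succeeds only if the fd crossed exec
        "except OSError:\n"
        "    sys.exit(1)\n"
    )
    pid = os.fork()
    if pid == 0:                          # child
        try:
            os.execv(sys.executable, [sys.executable, "-I", "-S", "-c", probe])
        finally:
            os._exit(127)                 # reached only if exec fails
    os.close(r)
    os.close(w)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

print("inheritable fd survives exec:  ", fd_survives_exec(True))
print("close-on-exec fd survives exec:", fd_survives_exec(False))
```

The first call reports True and the second False: the descriptor number is the same in both runs, and only the FD_CLOEXEC flag decides whether it exists in the new program.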

CoW demo (quick + visible)

#!/usr/bin/env python3
# demo_cow.py
import os
import re
import time

ROLLUP_FIELDS = [
    "Rss",
    "Pss",
    "Shared_Clean",
    "Shared_Dirty",
    "Private_Clean",
    "Private_Dirty",
    "AnonHugePages",
]


def read_rollup(pid):
    """Return selected /proc/<pid>/smaps_rollup fields in kB, or None if unavailable."""
    out = {k: 0 for k in ROLLUP_FIELDS}
    try:
        with open(f"/proc/{pid}/smaps_rollup") as f:
            for line in f:
                m = re.match(r"^(\w+):\s+(\d+)\s+kB$", line)
                if m and m.group(1) in out:
                    out[m.group(1)] = int(m.group(2))
    except (FileNotFoundError, ProcessLookupError):  # process gone or already exited
        return None
    return out


def fmt(r):  # show key stats in MB
    return (
        "Rss={:.1f} Pss={:.1f} | Priv(C/D)={:.1f}/{:.1f} | "
        "Shared(C/D)={:.1f}/{:.1f}"
    ).format(
        r["Rss"] / 1024,
        r["Pss"] / 1024,
        r["Private_Clean"] / 1024,
        r["Private_Dirty"] / 1024,
        r["Shared_Clean"] / 1024,
        r["Shared_Dirty"] / 1024,
    )


# Allocate ~200 MB and ensure pages are faulted in (touch once)
SIZE = 200 * 1024 * 1024
PAGE = os.sysconf("SC_PAGE_SIZE")
blob = bytearray(SIZE)
for i in range(0, SIZE, PAGE):
    blob[i] = 0

pid = os.fork()
if pid == 0:
    me = os.getpid()
    print(f"[child] pid={me}, ppid={os.getppid()}")
    time.sleep(2)
    print("[child] writing one byte per page (trigger CoW)")
    for i in range(0, SIZE, PAGE):
        blob[i] = (blob[i] + 1) % 256
    print("[child] done; sleeping")
    time.sleep(2)
else:
    me = os.getpid()
    print(f"[parent] pid={me}, child_pid={pid}")
    for t in range(1, 5):  # sample once per second for 4 seconds
        time.sleep(1)
        pr = read_rollup(me)
        cr = read_rollup(pid)
        if pr is None or cr is None:
            break
        print(f"[t+{t:02d}s] parent: {fmt(pr)} || child: {fmt(cr)}")
    os.waitpid(pid, 0)

3) PID, PPID, UID/GID (identity & hierarchy)

  • PID: process ID (getpid()).
  • PPID: parent PID (getppid()).
  • UID/GID: user and group IDs. The effective IDs are what permission checks use; the real IDs record who started the process (id, getuid(), geteuid()).

Quick checks:

echo "PID=$$ PPID=$PPID"     # in a shell
id # shows uid/gid/groups
sed -n '1,15p' /proc/$$/status # Name, State, Tgid, Pid, PPid, Uid, Gid…
pstree -ap | head
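
The same identity fields are available from inside a process. The sketch below assumes Linux with /proc mounted; `identity` and `status_fields` are hypothetical helper names.

```python
import os

def identity():
    """Collect the IDs discussed above for the current process."""
    return {
        "pid": os.getpid(),
        "ppid": os.getppid(),
        "uid": os.getuid(),     # real user ID
        "euid": os.geteuid(),   # effective user ID (used for permission checks)
        "gid": os.getgid(),
        "egid": os.getegid(),
    }

def status_fields(pid="self"):
    """Parse /proc/<pid>/status into a dict of field name -> raw string value."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields

ids = identity()
st = status_fields()
print(ids)
# The Uid line lists four values: real, effective, saved set, filesystem.
print("PPid:", st["PPid"], "| Uid:", st["Uid"])
```

For an ordinary (non-setuid) program the real and effective IDs match, so the first two numbers on the Uid line are identical.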

4) Signals: delivery, handling, default actions

Sources: kernel (faults like SIGSEGV), userspace (kill, raise, pthread_kill), terminal (Ctrl-C → SIGINT, Ctrl-Z → SIGTSTP).

Disposition types:

  • Default (terminate / core / stop / continue)
  • Ignore
  • Handler via sigaction(2) (preferred over signal(2))

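Uncatchable: SIGKILL (9) and SIGSTOP (19 on x86 and ARM Linux; the number varies by architecture). Neither can be caught, blocked, or ignored.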
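
A quick way to see which signals accept a handler is to try installing one. This is a minimal sketch; `try_install` is a hypothetical helper, and it must run in the main thread because Python only allows `signal.signal` there.

```python
import signal

def try_install(sig):
    """Try to install a no-op handler; return False if the kernel refuses."""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except OSError:                 # EINVAL for SIGKILL and SIGSTOP
        return False
    if previous is not None:        # None means the old handler was not set from Python
        signal.signal(sig, previous)   # restore the earlier disposition
    return True

for sig in (signal.SIGUSR1, signal.SIGTERM, signal.SIGKILL, signal.SIGSTOP):
    print(f"{sig.name:8} ({int(sig):2}) catchable: {try_install(sig)}")
```

SIGUSR1 and SIGTERM accept the handler; SIGKILL and SIGSTOP are rejected.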

Threading rules:

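  • Signal dispositions are per-process. A process-directed signal (e.g., from kill) is delivered to any one thread that does not have it blocked; a thread-directed signal (pthread_kill) goes to that specific thread.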
  • Synchronous signals (e.g., SIGSEGV) go to the faulting thread.
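
A thread-directed signal can be sketched in Python with `signal.pthread_kill`. The example blocks SIGUSR1 in the calling thread first, so the signal stays pending instead of triggering its default action (terminate); `thread_directed_pending` is a hypothetical helper name.

```python
import signal
import threading

def thread_directed_pending():
    """Block SIGUSR1 in this thread, target the thread with it, and observe it pending."""
    sig = signal.SIGUSR1
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        # Thread-directed: only the thread named here can receive the signal.
        signal.pthread_kill(threading.get_ident(), sig)
        pending = sig in signal.sigpending()   # blocked, so it is queued, not delivered
        consumed = signal.sigwait({sig})       # accept it synchronously; no handler runs
        return pending, consumed
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)   # restore the old mask

pending, consumed = thread_directed_pending()
print("pending while blocked:", pending, "| consumed:", consumed.name)
```

Blocking plus `sigwait` is a common alternative to asynchronous handlers in threaded programs, because one dedicated thread can then receive signals at a point of its own choosing.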

Tiny Python demo:

# sig_demo.py
import os, signal, time

def on_usr1(signum, frame):
    print(f"got SIGUSR1 in pid={os.getpid()}")

signal.signal(signal.SIGUSR1, on_usr1)

print("pid:", os.getpid())
time.sleep(30)

Terminal 2:

kill -USR1 <pid>      # triggers handler

5) Zombie processes

Definition: process has exited, but parent hasn’t called wait() → kernel keeps a tiny entry (exit code, usage) so the parent can reap.

Why you see Z/<defunct>: parent forgot/failed to wait().

How to resolve:

  • Fix parent to call wait()/waitpid() (or use a SIGCHLD handler).
  • If parent is stuck/misbehaving, kill the parent; when it exits, PID 1 adopts & reaps the zombie.
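
The zombie state and its cleanup can be observed from the parent itself. The following is a sketch, assuming Linux with /proc mounted; `proc_state` and `make_and_reap_zombie` are hypothetical helper names, and the non-blocking `waitpid` stands in for what a SIGCHLD handler or a periodic sweep would do.

```python
import os
import time

def proc_state(pid):
    """One-letter state from /proc/<pid>/stat (field after the last ')')."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read().rsplit(")", 1)[1].split()[0]

def make_and_reap_zombie(exit_code=3, timeout=5.0):
    """Fork a child that exits at once, watch it turn into Z, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)              # child terminates immediately
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)                 # nobody has called wait() yet
        state = proc_state(pid)
    # Non-blocking reap: returns (pid, status) because the child has already exited.
    reaped, status = os.waitpid(pid, os.WNOHANG)
    return state, reaped == pid, os.WEXITSTATUS(status)

state, reaped, code = make_and_reap_zombie()
print(f"state before wait: {state}, reaped: {reaped}, exit code: {code}")
```

In a long-running parent the same `waitpid(-1, os.WNOHANG)` call is typically looped until it returns PID 0, so that several children exiting at once are all collected.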

Demo (bash):

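# parent.sh
# bash reaps its own background jobs, so the parent here is a process that never calls wait():
# the subshell starts a short-lived child, then replaces itself with `sleep 10` via exec.
( sleep 0.2 & exec sleep 10 ) &
echo "parent pid: $!"
sleep 1 # child has exited; its parent (sleep 10) never reaps it → zombie for ~9s
ps -o pid,ppid,stat,comm --ppid $!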

6) Orphan processes (re-parenting to PID 1)

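If a parent dies first, its living children are re-parented to PID 1 (on most Linux distros, systemd) or to the nearest ancestor registered as a child subreaper via prctl(PR_SET_CHILD_SUBREAPER), such as a systemd --user instance. The new parent reaps them when they exit.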

Check before/after:

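# The subshell lives for 1s and then exits, orphaning the Python child:
( python3 -c 'import os,time; print("pid",os.getpid(),"ppid",os.getppid()); time.sleep(2); print("ppid now",os.getppid()); time.sleep(20)' & sleep 1 )
# While the child is still sleeping, cross-check from the shell:
grep -E '^(Pid|PPid):' /proc/<child>/status
# PPid should become 1 (or the PID of a subreaper such as systemd --user)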
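
Re-parenting can also be observed deterministically with a double fork. This is a sketch under stated assumptions: `orphan_ppid` is a hypothetical helper, and the two pipes exist only to sequence the steps and carry the result back.

```python
import os

def orphan_ppid():
    """Return (middle_pid, PPid seen by the grandchild after the middle process died)."""
    go_r, go_w = os.pipe()       # parent -> grandchild: "your parent is gone"
    res_r, res_w = os.pipe()     # grandchild -> parent: observed PPid
    middle = os.fork()
    if middle == 0:
        try:
            if os.fork() == 0:   # grandchild: outlives the middle process
                os.close(go_w)
                os.read(go_r, 1)                           # wait for the go-ahead
                os.write(res_w, str(os.getppid()).encode())
        finally:
            os._exit(0)          # middle exits at once; grandchild exits after reporting
    os.close(go_r)
    os.close(res_w)
    os.waitpid(middle, 0)        # the middle process has exited and been reaped
    os.write(go_w, b"x")         # now let the orphan look at its parent
    new_ppid = int(os.read(res_r, 32))
    os.close(go_w)
    os.close(res_r)
    return middle, new_ppid

middle, new_ppid = orphan_ppid()
print(f"original parent was {middle}; orphan now reports PPid {new_ppid}")
```

The reported PPid is 1 on a plain system, or the PID of a subreaper when one sits between the process and PID 1.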

7) Multi-threaded fork() behavior

  • In a multi-threaded process, only the calling thread is present in the child after fork().
  • The child is in a fragile state: only call async-signal-safe functions before exec() (POSIX rule). Best practice: fork() then immediately exec().
  • Use pthread_atfork() if you must, but modern advice often favors posix_spawn() (which can be implemented more safely/efficiently under the hood).

Sketch:

// Pseudocode: threads running… then:
pid_t pid = fork();
if (pid == 0) { // child: single thread only
    execlp("someprog", "someprog", (char *)NULL);
    _exit(127); // if exec fails
}
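
The "only the calling thread survives" rule can be checked by counting entries in /proc/self/task on both sides of a fork. A minimal sketch, assuming Linux; `thread_count` and `threads_before_and_after_fork` are hypothetical helpers. Python 3.12 and later emit a DeprecationWarning when forking while other threads are running, which is the same hazard described above.

```python
import os
import threading

def thread_count():
    """Count kernel threads of the calling process via /proc/self/task."""
    return len(os.listdir("/proc/self/task"))

def threads_before_and_after_fork():
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait)   # second thread, parked on an event
    worker.start()
    before = thread_count()                       # main thread + worker
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                                  # child: only the forking thread exists
        try:
            os.write(w, str(thread_count()).encode())
        finally:
            os._exit(0)
    os.close(w)
    in_child = int(os.read(r, 16))
    os.close(r)
    os.waitpid(pid, 0)
    stop.set()                                    # release and join the worker
    worker.join()
    return before, in_child

before, in_child = threads_before_and_after_fork()
print(f"threads in parent before fork: {before}, threads in child: {in_child}")
```

The child always reports one thread, regardless of how many the parent had.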

8) Practical checklist & commands

  • Inspect states & trees:

    ps -eo pid,ppid,stat,cmd --sort=ppid
    pstree -alpun | less
  • Per-process info:

    ls -1 /proc/<PID>/
    cat /proc/<PID>/status
    ls -1 /proc/<PID>/task/ # thread list
  • Signals:

    kill -l                # list signals
    kill -TERM <pid> # graceful
    kill -KILL <pid> # force (can’t be caught)
  • File descriptors & CLOEXEC:

    ls -l /proc/<PID>/fd
    cat /proc/<PID>/fdinfo/<n> | grep flags
  • Memory (watch CoW/RSS):

    grep -E 'Vm(Size|RSS)' /proc/<PID>/status
    pmap -x <PID> | head
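
The flags line in fdinfo is printed in octal and includes the close-on-exec bit, so it can be decoded with a few lines of Python. This is a sketch, assuming Linux; `fd_flags` and `is_cloexec` are hypothetical helper names.

```python
import os

def fd_flags(fd, pid="self"):
    """Return the open-file flags of `fd`, parsed from /proc/<pid>/fdinfo/<fd>."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as f:
        for line in f:
            if line.startswith("flags:"):
                return int(line.split()[1], 8)   # the kernel prints flags in octal
    raise ValueError("no flags line found")

def is_cloexec(fd, pid="self"):
    """True if the descriptor would be closed automatically on exec()."""
    return bool(fd_flags(fd, pid) & os.O_CLOEXEC)

r, w = os.pipe()                  # Python sets close-on-exec by default (PEP 446)
print("fresh pipe fd is close-on-exec:", is_cloexec(w))
os.set_inheritable(w, True)       # clears FD_CLOEXEC
print("after set_inheritable(True):   ", is_cloexec(w))
os.close(r)
os.close(w)
```

Pointing `pid` at another process you own gives a quick audit of which of its descriptors would leak into programs it execs.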

9) Mini-labs (10–15 min each)

  1. CoW in action

    • Run demo_cow.py.
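    • While it runs, sample /proc/<child>/smaps_rollup every 1–2s: Rss stays roughly constant (the child maps the shared pages from the start), while Shared_Dirty drops and Private_Dirty and Pss rise only after the child writes.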
  2. Zombie creation & cleanup

    • Use parent.sh to create a zombie.
    • Verify STAT=Z with ps.
    • In a second run, modify parent to wait and confirm no zombie.
  3. Orphan reparenting

    • Start a long-sleeping child from a subshell.
    • Kill/exit the parent shell.
    • Confirm PPid: 1 for the child.
  4. Signal handler

    • Run sig_demo.py, send SIGUSR1, then SIGTERM.
    • Note: handler runs for SIGUSR1; default terminate on SIGTERM.
  5. FD leakage across exec

    • Build and run demo_fork_exec.
    • Observe whether keepopen.txt is readable by wc after exec.
    • Rebuild with FD_CLOEXEC set; observe difference.

Zombie vs. Orphan → both are about parent/child relationships, but they happen at different stages in the child’s life.


Zombie process

  • State: Dead (has exited), but still has a parent.
  • Cause: Parent hasn’t called wait() yet to collect its exit status.
  • Kernel behavior: Keeps the process table entry so the parent can read the status.
  • What you see: STAT = Z / <defunct> in ps.
  • Lifetime: Disappears when the parent reaps it or the parent dies (then init reaps it).
  • Memory footprint: Almost none — only PID and accounting info remain.
  • Analogy: Corpse still on the books because the paperwork hasn’t been filed.

Orphan process

  • State: Alive (still running).
  • Cause: Its parent process has exited before it does.
  • Kernel behavior: Re-parents it to PID 1 (init/systemd), which becomes its new parent.
  • What you see: PPid = 1 in /proc/<pid>/status or ps.
  • Lifetime: Continues running normally until it finishes or is killed.
  • Memory footprint: Full — it’s still an active process.
  • Analogy: Kid whose parent left, now adopted by init.

Quick visual

State  | Child alive? | Parent alive? | PPid      | Typical STAT
Normal | yes          | yes           | real PPid | varies
Zombie | no           | yes           | real PPid | Z
Orphan | yes          | no            | 1 (init)  | varies

10) Quick quiz (self-check)

  • What turns a terminated process into a zombie?
  • Name two signals you cannot catch or ignore.
  • After fork() in a multi-threaded program, how many threads exist in the child?
  • How does Copy-on-Write delay memory copying?
  • What reaps orphans if the original parent disappears?

11) Pro tips & gotchas

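  • Prefer sigaction() over signal(); choose SA_RESTART deliberately: with it, most interrupted system calls restart automatically; without it, they fail with EINTR.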
  • In daemon-like parents, install a SIGCHLD handler or periodically waitpid(-1, …, WNOHANG) to prevent zombies.
  • Use posix_spawn() (or higher-level wrappers) when forking from complex, threaded apps.
  • Remember that the environment and open FDs survive exec() unless you change them explicitly (pass a new environment, set FD_CLOEXEC).
  • vfork() exists but is tricky; unless you know why, don’t.
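
As an illustration of the posix_spawn() tip, Python exposes the call directly as `os.posix_spawnp` (Python 3.8+). The sketch below is a minimal example; `spawn_and_wait` is a hypothetical helper, and it assumes `sh` is on PATH.

```python
import os

def spawn_and_wait(argv):
    """Start argv[0] (looked up on PATH) with posix_spawnp and return its exit code."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)   # no user code runs between fork and exec
    _, status = os.waitpid(pid, 0)                     # reap the child: no zombie left behind
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

print(spawn_and_wait(["sh", "-c", "exit 7"]))   # prints 7
```

Because the parent never runs its own code in the half-initialized child, the async-signal-safety concerns of fork() in threaded programs do not arise here.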