tour-of-the-linux-procs
Per-CPU Processes (technically kthreads):
- migration/N → SCHED_FIFO:99 (true RT; keeps load balanced).
- idle-inject/N → can run as RT or high priority depending on kernel config; used for thermal/power idle injection.
- ksoftirqd/N → SCHED_OTHER (handles bottom halves if softirqs pile up).
- watchdog/N → usually SCHED_OTHER, but high nice priority; checks for soft lockups.
Details
Linux’s fundamental schedulable unit is a task. User “processes” and “threads” are both tasks internally. These per-CPU things (migration/N, ksoftirqd/N, watchdog/N, idle-inject/N) are created by the kernel (via kthread_create*) as kernel threads. Each one is its own thread group (TGID == PID), so tools display them as standalone processes.
They’re each their own kernel task:
- Each one has its own PID (and TGID == PID), its own kernel stack, and no mm_struct (so no userland address space at all).
- They don’t share “process memory” like threads in your program would.
- Instead, they coordinate through kernel-internal data structures (scheduler runqueues, per-CPU lists, global locks, etc.), which are just regular kernel objects, not some “shared user memory region.”
So:
✅ They’re per-CPU, single-thread kernel tasks.
✅ No shared heap/stack.
✅ The “sharing” they do is through kernel subsystems, not process memory.
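You can observe the "no mm_struct" property from userspace: a kernel thread has no argv, so its /proc/PID/cmdline reads back empty. A minimal sketch (the function name is mine; note that zombie processes also show an empty cmdline, so this is a heuristic, not a guarantee):

```python
import os

def is_kernel_thread(pid):
    """Heuristic: kthreads have no userland address space (mm = NULL),
    so /proc/<pid>/cmdline reads back as zero bytes."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            return f.read() == b""
    except OSError:
        return False  # pid vanished, or no /proc on this system

# The current Python interpreter is a normal user process:
print(is_kernel_thread(os.getpid()))  # False
```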
migration/* kthreads
- There’s one migration/N kernel thread per CPU.
- Purpose: handle task migration between CPUs.
- If the scheduler decides a task would be better on a different core (load balancing, affinity change, CPU hotplug, NUMA balancing), it doesn’t just teleport it; the migration/N kthread on the destination CPU takes responsibility for pulling it in.
- Always present, shown with RT priority in htop (PIDs typically land in the low-hundreds range).
- They never “hog” the CPU; they just wake briefly to shuffle tasks.
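One of those triggers, an affinity change, is easy to provoke from userspace: if you shrink a task's allowed-CPU mask while it's running on a now-forbidden core, the migration machinery moves it. A small Linux-only sketch using Python's wrapper around sched_setaffinity(2):

```python
import os

# Remember the original mask so we can restore it afterwards.
original = os.sched_getaffinity(0)   # 0 = the calling process

# Pin ourselves to a single allowed CPU; if we were running on a
# different core, the kernel's migration machinery moves us there.
target = {min(original)}
os.sched_setaffinity(0, target)
assert os.sched_getaffinity(0) == target

os.sched_setaffinity(0, original)    # restore the original mask
```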
idle-inject/N kthreads
- These are part of the thermal/power management framework.
- If the kernel decides a CPU needs to be throttled (thermal pressure, energy savings, idle injection cgroups), the idle-inject/N kthread simulates “forced idle” by scheduling itself and doing nothing.
- It effectively says: “you, CPU N, pretend to be idle for X µs, so you cool down or save power.”
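In userspace terms, idle injection is just enforcing a duty cycle: out of every period, force a slice of idle time. A toy analogy (function names and parameters are mine; real idle injection happens in the kernel, this only illustrates the duty-cycle idea):

```python
import time

counter = 0

def work():
    """Stand-in for the workload being throttled."""
    global counter
    counter += 1

def run_with_idle_injection(work_fn, idle_ratio=0.5, period_s=0.01, periods=5):
    """Alternate busy work with forced sleep, like injected idle time.
    idle_ratio=0.5 means the 'CPU' is idle for half of every period."""
    for _ in range(periods):
        busy_until = time.monotonic() + period_s * (1 - idle_ratio)
        while time.monotonic() < busy_until:
            work_fn()                        # the busy part of the period
        time.sleep(period_s * idle_ratio)    # the injected idle slice

run_with_idle_injection(work)
print(counter > 0)  # True
```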
watchdogd kthread
- This is part of the kernel lockup detector.
- Its job is to periodically schedule on a CPU and check whether other CPUs are still making progress (i.e., not stuck in a hard or soft lockup).
- Runs under SCHED_FIFO with a moderate RT priority (commonly 50), so it can preempt normal tasks but doesn’t outrank critical kernel RT tasks like migration/N (priority 99).
- May appear as one global watchdogd task rather than per-CPU threads: on modern kernels the detector uses high-resolution timers internally, so a single kthread can service all CPUs instead of spawning watchdog/N per CPU as older kernels did.
- Whether you see watchdogd, multiple watchdog/N threads, or none at all depends on kernel version, configuration, and distro defaults (CONFIG_SOFTLOCKUP_DETECTOR, CONFIG_HARDLOCKUP_DETECTOR, etc.).
Hard v. Soft Lockups
Soft lockup
- Definition: a CPU is still handling interrupts, but it has not scheduled any normal tasks (kernel or user) for a long time (default threshold ~20 s).
- Symptom: the CPU is “spinning” or stuck in a long loop in kernel space, starving the scheduler.
- Detected by: the watchdogd kthread waking up periodically on each CPU (via hrtimer). If it doesn’t get scheduled in time, it concludes the CPU is stuck.
Hard lockup
- Definition: a CPU is so stuck that it isn’t even servicing interrupts.
- Symptom: timer interrupts stop firing entirely on that core.
- Detected by: the NMI watchdog — a performance counter (PMU) generates a Non-Maskable Interrupt (NMI) at a fixed rate. If the kernel stops receiving those NMIs on a CPU, it knows that core is truly locked up at the lowest level.
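The ~20 s figure comes from the kernel.watchdog_thresh sysctl (default 10): the soft-lockup threshold is twice watchdog_thresh. A sketch that reads it (the function name is mine; returns None where the detector is compiled out or /proc/sys isn’t available):

```python
def soft_lockup_threshold():
    """Return the effective soft-lockup threshold in seconds:
    2 * kernel.watchdog_thresh, or None if the sysctl is absent."""
    try:
        with open("/proc/sys/kernel/watchdog_thresh") as f:
            return 2 * int(f.read())
    except OSError:
        return None  # detector compiled out, or not a Linux /proc

print(soft_lockup_threshold())  # typically 20 on default configs
```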
Generic Workers
what kworker/* is
- while migration/N, ksoftirqd/N, etc. are special-purpose per-CPU kernel threads, kworker/* are generic worker threads created by the workqueue subsystem.
- the workqueue API is how lots of kernel code says “I’ve got some deferred work, please run this later in process context.”
- instead of every driver spawning its own thread, they push work items into queues, and the kernel spawns/manages kworkers to process them.
naming / types you’ll see
- kworker/0:1, kworker/3:2 → a kworker bound to CPU 0 or CPU 3.
- kworker/u8:0 → “unbound” kworker (can run on any CPU).
- sometimes you’ll see extra hints like kworker/2:1H; the H suffix means it’s a high-priority worker (serving a WQ_HIGHPRI workqueue).
so there are per-CPU pools and unbound pools, depending on whether the work needs CPU locality (e.g. dealing with per-CPU data) or not.
scheduling class
- most kworkers run under SCHED_OTHER (CFS), just like normal tasks.
- high-priority (H) workers also run under CFS, but at a raised nice level (-20), so work queued to WQ_HIGHPRI workqueues gets served promptly.
memory model
- like other kthreads: no user address space (mm = NULL), their own kernel stack, TGID == PID.
- they don’t “share memory” in the sense of a process heap.
- instead, all the sharing is through whatever kernel objects they’re working on (block layer structs, networking buffers, timers, etc.).
why they exist
- a hardirq (interrupt handler) can’t sleep, and a softirq is still constrained.
- if some work needs to sleep (e.g. call into filesystem code), it can be punted into a workqueue → handled by a kworker.
- this lets the kernel defer heavyweight jobs to normal scheduling context without forcing every driver to reinvent threading.
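the pattern is the same one you’d build in userspace with a queue plus a long-lived worker: producers enqueue work items, a small pool drains them, and no producer needs its own thread. a rough userspace analogy (Python stand-in, not the kernel API; names are mine):

```python
import queue
import threading

work_q = queue.Queue()   # analogous to a workqueue
results = []

def kworker():
    """A long-lived generic worker: pull deferred work items and run them
    in a context that is allowed to block/sleep (an ordinary thread)."""
    while True:
        item = work_q.get()
        if item is None:          # shutdown sentinel
            break
        results.append(item())    # run the deferred work

worker = threading.Thread(target=kworker)   # the pool: one generic worker
worker.start()

# "Drivers" defer work instead of spawning their own threads:
work_q.put(lambda: "flushed block I/O")
work_q.put(lambda: "freed network buffers")
work_q.put(None)
worker.join()
print(results)  # ['flushed block I/O', 'freed network buffers']
```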
👉 so in short:
- migration/N = per-CPU load balancer, RT:99.
- ksoftirqd/N = per-CPU softirq drain, CFS.
- idle-inject/N = per-CPU thermal throttler, CFS/RT depending on config.
- watchdog/N = per-CPU lockup detector.
- kworker/* = generic “do deferred work” pool, per-CPU and unbound variants.
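to see this whole zoo on a live system, you can group visible kernel threads by base name (empty cmdline + /proc/PID/comm; assumes a Linux /proc, and inside a PID namespace such as a container the result may be empty; the function name is mine):

```python
import os
import re
from collections import Counter

def kthread_counts():
    """Count visible kernel threads by base name (migration/0 -> migration)."""
    counts = Counter()
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                if f.read():          # non-empty cmdline => user process
                    continue
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue                  # task exited (or hidden) mid-scan
        counts[re.sub(r"/.*", "", name)] += 1
    return counts

print(kthread_counts())  # e.g. Counter({'kworker': 37, 'migration': 8, ...})
```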