tour-of-the-linux-procs

Per-CPU Processes (technically kthreads):

  • migration/N → SCHED_FIFO:99 (true RT, keeps load balanced).
  • idle-inject/N → can run as RT or high priority depending on kernel config; used for thermal/power idle injection.
  • ksoftirqd/N → SCHED_OTHER (handles bottom halves if softirqs pile up).
  • watchdog/N → usually SCHED_OTHER but at elevated priority; checks for soft lockups.
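
A quick way to check these classes and priorities on a live system is to walk /proc and ask for each thread’s policy. The sketch below is illustrative only; the comm prefixes it matches (e.g. idle_inject/ vs idle-inject/) vary by kernel version.

    /* kthread-policy.c: list some well-known per-CPU kthreads with the
     * scheduling policy and RT priority they report to userspace.
     * Build: cc -O2 -o kthread-policy kthread-policy.c */
    #define _GNU_SOURCE
    #include <ctype.h>
    #include <dirent.h>
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static const char *policy_name(int policy)
    {
        switch (policy) {
        case SCHED_OTHER: return "SCHED_OTHER";
        case SCHED_FIFO:  return "SCHED_FIFO";
        case SCHED_RR:    return "SCHED_RR";
        case SCHED_BATCH: return "SCHED_BATCH";
        case SCHED_IDLE:  return "SCHED_IDLE";
        default:          return "other";
        }
    }

    int main(void)
    {
        /* comm prefixes to match; names vary between kernel versions */
        const char *prefixes[] = { "migration/", "ksoftirqd/", "idle_inject/",
                                   "idle-inject/", "watchdog", NULL };
        DIR *proc = opendir("/proc");
        struct dirent *de;

        if (!proc) { perror("/proc"); return 1; }

        while ((de = readdir(proc)) != NULL) {
            char path[300], comm[64] = "";
            FILE *f;

            if (!isdigit((unsigned char)de->d_name[0]))
                continue;                       /* not a PID directory */

            snprintf(path, sizeof(path), "/proc/%s/comm", de->d_name);
            f = fopen(path, "r");
            if (!f)
                continue;                       /* task already exited */
            if (fgets(comm, sizeof(comm), f))
                comm[strcspn(comm, "\n")] = '\0';
            fclose(f);

            for (const char **p = prefixes; *p; p++) {
                if (strncmp(comm, *p, strlen(*p)) != 0)
                    continue;
                int pid = atoi(de->d_name);
                struct sched_param sp = { 0 };
                /* both calls work on arbitrary PIDs without privileges */
                int policy = sched_getscheduler(pid);
                sched_getparam(pid, &sp);
                printf("%6d  %-16s %-12s rtprio=%d\n",
                       pid, comm, policy_name(policy), sp.sched_priority);
                break;
            }
        }
        closedir(proc);
        return 0;
    }
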
Details

Linux’s fundamental schedulable unit is a task. User “processes” and “threads” are both tasks internally. These per-CPU things (migration/N, ksoftirqd/N, watchdog/N, idle-inject/N) are created by the kernel (via kthread_create*) as kernel threads. Each one is its own thread group (TGID == PID), so tools display them like standalone processes.

They’re each their own kernel task:

  • Each one has its own PID (and TGID == PID), its own kernel stack, and no mm_struct (so no userland address space at all).
  • They don’t share “process memory” like threads in your program would.
  • Instead, they coordinate through kernel-internal data structures (scheduler runqueues, per-CPU lists, global locks, etc.), which are just regular kernel objects, not some “shared user memory region.”
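
The TGID == PID and “no mm” points are easy to verify from userspace: a task with no address space exposes no Vm* lines in /proc/<pid>/status. A minimal sketch, assuming the usual Linux procfs layout:

    /* kthread-check.c: report whether a PID looks like a kernel thread.
     * Heuristic: kernel threads have Tgid == Pid and, having no mm,
     * expose no Vm* lines in /proc/<pid>/status. Sketch only.
     * Build: cc -O2 -o kthread-check kthread-check.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char path[64], line[256];
        int pid, tgid = -1, has_vm = 0;
        FILE *f;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }
        pid = atoi(argv[1]);

        snprintf(path, sizeof(path), "/proc/%d/status", pid);
        f = fopen(path, "r");
        if (!f) { perror(path); return 1; }

        while (fgets(line, sizeof(line), f)) {
            sscanf(line, "Tgid: %d", &tgid);
            if (strncmp(line, "VmSize:", 7) == 0)
                has_vm = 1;        /* only present when the task has an mm */
        }
        fclose(f);

        printf("pid=%d tgid=%d vm=%s -> %s\n", pid, tgid, has_vm ? "yes" : "no",
               (tgid == pid && !has_vm) ? "kernel thread" : "userspace task/thread");
        return 0;
    }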

So:

✅ They’re per-CPU, single-thread kernel tasks.

✅ No shared heap/stack.

✅ The “sharing” they do is through kernel subsystems, not process memory.

migration/* kthreads

  • There’s one migration/N kernel thread per CPU.

  • Purpose: handle task migration between CPUs.

    • If the scheduler decides a task would be better on a different core (load balancing, affinity change, CPU hotplug, NUMA balancing), it doesn’t just teleport it — when the task has to be moved while it is running, the migration/N stopper thread on the task’s current CPU briefly preempts it and pushes it to the destination runqueue.
  • Typically shows up early in the PID range (1xx-ish, since they’re spawned at boot, one per CPU) and displays priority RT in htop.

  • They never “hog” CPU; they just wake briefly to shuffle tasks.
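
You can watch the kernel move a task by changing its CPU affinity. A small sketch using the standard glibc wrappers sched_setaffinity/sched_getcpu; when the task has to move while runnable, the stopper machinery (migration/N) typically does the actual shove:

    /* affinity-hop.c: pin this process to CPU 0, then CPU 1, and report
     * where it actually runs after each change.
     * Build: cc -O2 -o affinity-hop affinity-hop.c */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    static int pin_to(int cpu)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return sched_setaffinity(0, sizeof(set), &set);  /* 0 = this thread */
    }

    int main(void)
    {
        if (sysconf(_SC_NPROCESSORS_ONLN) < 2) {
            fprintf(stderr, "need at least 2 online CPUs\n");
            return 1;
        }

        for (int cpu = 0; cpu < 2; cpu++) {
            if (pin_to(cpu) != 0) {
                perror("sched_setaffinity");
                return 1;
            }
            /* by the time the call returns, the kernel has migrated us */
            printf("requested CPU %d, now running on CPU %d\n",
                   cpu, sched_getcpu());
        }
        return 0;
    }
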


idle-inject/N kthreads

  • These are part of the thermal/power management framework.
  • If the kernel decides a CPU needs to be throttled (thermal pressure, energy savings, idle injection cgroups), the idle-inject/N kthread enforces “forced idle” by scheduling itself at high priority and putting the CPU into an idle state for the requested window.
  • It effectively says: “you, CPU N, pretend to be idle for X µs, so you cool down or save power.”
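
The real mechanism forces the CPU into hardware idle states from inside the kernel; the following is only a rough userspace analogy of that run/idle duty cycle (the RUN_US/IDLE_US numbers are made up, and this is not the kernel API):

    /* duty-cycle.c: userspace analogy of an idle-injection duty cycle:
     * run for RUN_US microseconds, then go idle for IDLE_US microseconds.
     * The real idle-inject/N kthreads do this from inside the kernel by
     * forcing the CPU into an idle state; this only illustrates the pattern.
     * Build: cc -O2 -o duty-cycle duty-cycle.c */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <time.h>

    #define RUN_US  4000   /* made-up "allowed to run" window */
    #define IDLE_US 1000   /* made-up "forced idle" window    */

    static long long now_us(void)
    {
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
    }

    int main(void)
    {
        for (int cycle = 0; cycle < 5; cycle++) {
            long long start = now_us();
            struct timespec idle = { 0, IDLE_US * 1000L };

            while (now_us() - start < RUN_US)
                ;                       /* "run" phase: burn CPU       */
            nanosleep(&idle, NULL);     /* "idle" phase: yield the CPU */

            printf("cycle %d: ran ~%d us, idled ~%d us\n",
                   cycle, RUN_US, IDLE_US);
        }
        return 0;
    }
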

watchdogd kthread

  • This is part of the kernel lockup detector.
  • Its job is to periodically get scheduled on a CPU and check whether other CPUs are still making progress (i.e., not stuck in a hard or soft lockup).
  • Runs under SCHED_FIFO with a moderate RT priority (commonly 50) so that it can preempt normal tasks, but doesn’t outrank critical kernel RT tasks like migration/N (priority 99).
  • May appear as one global watchdogd task rather than per-CPU threads — on modern kernels the soft-lockup check is driven by per-CPU high-resolution timers, so there is no need to spawn a watchdog/N kthread per CPU the way older kernels did.
  • Whether you see watchdogd, multiple watchdog/N threads, or none at all depends on kernel version, configuration, and distro defaults (CONFIG_SOFTLOCKUP_DETECTOR, CONFIG_HARDLOCKUP_DETECTOR, etc.).
Hard vs. Soft Lockups

Soft lockup

  • Definition: a CPU is still handling interrupts, but it has not scheduled any normal tasks (kernel or user) for a long time (default threshold ~20 s).
  • Symptom: the CPU is “spinning” or stuck in a long loop in kernel space, starving the scheduler.
  • Detected by: a per-CPU high-resolution timer that fires from interrupt context and checks whether the watchdog’s periodic work actually got scheduled on that CPU. If it didn’t run within the threshold, the detector concludes the CPU is stuck.

Hard lockup

  • Definition: a CPU is so stuck that it isn’t even servicing interrupts.
  • Symptom: timer interrupts stop firing entirely on that core.
  • Detected by: the NMI watchdog — a performance counter (PMU) generates a Non-Maskable Interrupt (NMI) at a fixed rate. If the kernel stops receiving those NMIs on a CPU, it knows that core is truly locked up at the lowest level.
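
Both detectors are configurable at runtime through sysctls under /proc/sys/kernel. A sketch that dumps the usual knobs (which files exist depends on CONFIG_SOFTLOCKUP_DETECTOR / CONFIG_HARDLOCKUP_DETECTOR and kernel version):

    /* lockup-knobs.c: dump the lockup-detector sysctls under /proc/sys/kernel.
     * Which files exist depends on kernel config and version.
     * Build: cc -O2 -o lockup-knobs lockup-knobs.c */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *knobs[] = {
            "watchdog",         /* master switch for both detectors       */
            "soft_watchdog",    /* soft-lockup detector on/off            */
            "nmi_watchdog",     /* hard-lockup (NMI) detector on/off      */
            "watchdog_thresh",  /* hard-lockup threshold in seconds; the  */
                                /* soft-lockup threshold is 2x this value */
            "watchdog_cpumask", /* which CPUs are watched                 */
            "softlockup_panic", /* panic instead of just warning          */
            NULL
        };

        for (const char **k = knobs; *k; k++) {
            char path[128], val[128] = "(not present on this kernel)";
            FILE *f;

            snprintf(path, sizeof(path), "/proc/sys/kernel/%s", *k);
            f = fopen(path, "r");
            if (f) {
                if (fgets(val, sizeof(val), f))
                    val[strcspn(val, "\n")] = '\0';
                fclose(f);
            }
            printf("%-18s %s\n", *k, val);
        }
        return 0;
    }
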

Generic Workers

what kworker/* is

  • while migration/N, ksoftirqd/N, etc. are special-purpose per-CPU kernel threads,
  • kworker/* are generic worker threads created by the workqueue subsystem.

the workqueue API is how lots of kernel code says “I’ve got some deferred work, please run this later in process context.” instead of every driver spawning its own thread, they push work items into queues, and the kernel spawns/manages kworkers to process them.
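
for orientation, here’s a minimal sketch of that API as an out-of-tree module (built against kernel headers, not a standalone program; the wq_demo_* names are made up). the work function runs later on a kworker, in process context, so it’s allowed to sleep, which is the whole point of punting work here:

    /* wq_demo.c: toy out-of-tree module showing the workqueue API.
     * The work item runs later on a kworker thread, in process context,
     * so it is allowed to sleep. Illustrative only. */
    #include <linux/module.h>
    #include <linux/workqueue.h>
    #include <linux/delay.h>
    #include <linux/sched.h>

    static struct workqueue_struct *wq_demo_wq;

    static void wq_demo_fn(struct work_struct *work)
    {
        /* current->comm will be something like "kworker/u8:2" here */
        pr_info("wq_demo: deferred work running on %s\n", current->comm);
        msleep(10);                       /* sleeping is fine in a work item */
    }

    static DECLARE_WORK(wq_demo_work, wq_demo_fn);

    static int __init wq_demo_init(void)
    {
        /* WQ_UNBOUND: serviced by the unbound kworker/u*:* pools.
         * Drop it for the per-CPU kworker/N:* pools, or add WQ_HIGHPRI
         * to target the high-priority (H-suffixed) pools. */
        wq_demo_wq = alloc_workqueue("wq_demo", WQ_UNBOUND, 0);
        if (!wq_demo_wq)
            return -ENOMEM;

        queue_work(wq_demo_wq, &wq_demo_work);

        /* most drivers just use schedule_work(&wq_demo_work), which
         * queues onto the shared system workqueue instead. */
        return 0;
    }

    static void __exit wq_demo_exit(void)
    {
        cancel_work_sync(&wq_demo_work);  /* wait for / cancel pending work */
        destroy_workqueue(wq_demo_wq);
    }

    module_init(wq_demo_init);
    module_exit(wq_demo_exit);
    MODULE_LICENSE("GPL");

the WQ_UNBOUND and WQ_HIGHPRI flags map directly onto the u and H naming described next.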

naming / types you’ll see

  • kworker/0:1, kworker/3:2 → a kworker pinned to CPU 0 or CPU 3.
  • kworker/u8:0 → “unbound” kworker (can run on any CPU).
  • sometimes you’ll see extra hints like kworker/2:1H — the H suffix means it’s a high-priority worker (serving WQ_HIGHPRI workqueues, which run at a boosted nice level rather than as true real-time).

so there are per-CPU pools and unbound pools, depending on whether the work needs CPU locality (e.g. dealing with per-CPU data) or not.

scheduling class

  • most kworkers run under SCHED_OTHER (CFS), just like normal tasks.
  • the high-priority (H) workers serving WQ_HIGHPRI workqueues get a boosted nice level (-20), but they’re still CFS, not a real-time class.

memory model

  • like other kthreads: no user address space (mm=NULL), their own kernel stack, TGID == PID.
  • they don’t “share memory” in the sense of a process heap.
  • instead, all the sharing is through whatever kernel objects they’re working on (block layer structs, networking buffers, timers, etc.).

why they exist

  • a hardirq (interrupt handler) can’t sleep, and a softirq is still constrained.
  • if some work needs to sleep (e.g. call into filesystem code), it can be punted into a workqueue → handled by a kworker.
  • this lets the kernel defer heavyweight jobs to normal scheduling context without forcing every driver to reinvent threading.

👉 so in short:

  • migration/N = per-CPU load balancer, RT:99.
  • ksoftirqd/N = per-CPU softirq drain, CFS.
  • idle-inject/N = per-CPU thermal/power throttler, CFS or RT depending on kernel config.
  • watchdog/N = per-CPU lockup detector.
  • kworker/* = generic “do deferred work” pool, per-CPU and unbound variants.