From NIC to User Space: Data Structures and Ring Buffer Behavior
When a network interface card (NIC) receives packets, it uses DMA to place them directly into pre-allocated buffers in system RAM. The process is structured around a fixed-size RX ring buffer of descriptors, which the NIC and driver share.
RX Ring Buffer Basics
Descriptor Structure
A simplified receive descriptor might look like:
```c
struct rx_desc {
    uint64_t buf_addr;  // DMA (physical) address of the packet buffer
    uint16_t length;    // Packet length, written back by the NIC
    uint8_t  status;    // Flags: DONE, errors, VLAN info, etc.
};
```
- Ring = a fixed-size array of these descriptors.
- Fixed size because:
  - Hardware allocates internal state for each entry.
  - It's easier for the driver to wrap pointers with modulo arithmetic.
- Circular: after the last entry, pointers wrap back to index 0.
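To make the wrap-around concrete, here is a minimal sketch of the ring and its index arithmetic, reusing the `struct rx_desc` above. `RX_RING_SIZE`, `rx_ring`, `tail`, and `ring_next` are illustrative names, not from any real driver:

```c
#include <stdint.h>

#define RX_RING_SIZE 256   /* illustrative; must match what the NIC was programmed with */

/* The ring is just a fixed-size array of the descriptor struct above. */
static struct rx_desc rx_ring[RX_RING_SIZE];
static unsigned int tail;  /* next descriptor the driver will reclaim */

/* Index of the slot after i, wrapping back to 0 past the last entry. */
static unsigned int ring_next(unsigned int i)
{
    return (i + 1) % RX_RING_SIZE;  /* or (i + 1) & (RX_RING_SIZE - 1) if size is a power of two */
}
```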
Packet Arrival Path
Step 0 — NIC Writes via DMA
- NIC’s PHY and MAC receive a packet from the wire.
- NIC picks the next free descriptor (pointed to by its head pointer).
- DMA engine writes the entire packet into `buf_addr`.
- NIC updates `length` and sets `status = DONE`.
At this point, the RX ring might look like:
[ DONE, DONE, DONE, EMPTY, EMPTY, ... ]
Step 1 — Interrupt
- NIC signals the CPU via IRQ or MSI-X.
- The CPU's Local APIC routes the interrupt to the assigned core.
- The ISR (Interrupt Service Routine) runs quickly:
  - Acknowledges the interrupt.
  - Schedules a NAPI poll for packet processing.
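As a rough sketch of what that ISR looks like in a Linux driver: `napi_schedule()`, `writel()`, `netif_napi_add()`, and the `irqreturn_t` convention are real kernel APIs, while everything prefixed `my_`/`MY_` (the private struct, the register offset) is invented for illustration:

```c
#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/netdevice.h>

#define MY_NIC_RX_INTR_ENABLE 0x10  /* invented register offset */

/* Fragment of a hypothetical driver; fields are set up at probe time. */
struct my_nic_priv {
    void __iomem *regs;        /* mapped device registers */
    struct napi_struct napi;   /* registered with netif_napi_add() */
    struct net_device *netdev;
};

static irqreturn_t my_nic_irq(int irq, void *data)
{
    struct my_nic_priv *priv = data;

    /* Acknowledge/mask further RX interrupts (register layout is made up). */
    writel(0, priv->regs + MY_NIC_RX_INTR_ENABLE);

    /* Defer the heavy lifting to NAPI, which runs in softirq context. */
    napi_schedule(&priv->napi);

    return IRQ_HANDLED;
}
```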
Step 2 — NAPI Poll
- NAPI runs in softirq context.
- Driver walks the RX ring starting at its tail pointer.
- For each descriptor marked DONE (ready to be pulled out):
  - Create an `sk_buff` pointing to the DMA buffer.
  - Set length, protocol, and other metadata.
  - Pass the `sk_buff` into `netif_receive_skb()` (the network stack).
  - Mark the descriptor EMPTY and advance the tail pointer.

[ EMPTY, <Tail Here>, DONE, EMPTY, EMPTY, ... ]
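Continuing the same hypothetical driver, the poll function might drain the ring like this. `netif_receive_skb()`, `eth_type_trans()`, and `napi_complete_done()` are real kernel APIs; `my_build_skb()`, `RX_STATUS_DONE`, and the ring variables come from the sketches above:

```c
static int my_nic_poll(struct napi_struct *napi, int budget)
{
    struct my_nic_priv *priv = container_of(napi, struct my_nic_priv, napi);
    int done = 0;

    while (done < budget) {
        struct rx_desc *desc = &rx_ring[tail];
        struct sk_buff *skb;

        if (!(desc->status & RX_STATUS_DONE))
            break;                            /* ring drained */

        skb = my_build_skb(desc);             /* wrap the DMA buffer in an sk_buff */
        skb->protocol = eth_type_trans(skb, priv->netdev);
        netif_receive_skb(skb);               /* hand off to the network stack */

        desc->status = 0;                     /* mark EMPTY so the NIC can reuse it */
        tail = ring_next(tail);
        done++;
    }

    /* Ring drained before the budget ran out: stop polling, re-enable IRQs. */
    if (done < budget && napi_complete_done(napi, done))
        writel(1, priv->regs + MY_NIC_RX_INTR_ENABLE);

    return done;
}
```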
What does softIRQ context mean in Linux?
In Linux, softIRQ context means the code is running in a special, deferred interrupt handling mode — not as a normal process, but not as a hard interrupt either.
It’s a middle ground the kernel uses so that heavy work triggered by an interrupt doesn’t block other interrupts for too long.
Why it exists
Hard IRQ context (ISR): Runs immediately when the CPU gets an interrupt.
- Runs with interrupts disabled.
- Must be very quick — just enough to acknowledge hardware and schedule real work.
- Can’t sleep or block.
SoftIRQ context:
- Scheduled by a hard IRQ handler.
- Runs with interrupts enabled.
- Can take longer because it’s not holding up other interrupts.
- Still not normal process context — can’t sleep, can’t call blocking functions.
- Runs either right after the hard IRQ exits or later in a special kernel thread (`ksoftirqd`).
How this applies to NAPI and NIC RX
- Packet arrives → NIC raises interrupt → ISR runs in hard IRQ context.
- ISR disables further NIC interrupts and calls `napi_schedule()`.
- `napi_schedule()` marks the NIC's poll handler to run in `NET_RX_SOFTIRQ` context.
- Later, the kernel runs the softIRQ handler:
  - Driver's `poll()` function drains the RX ring, creates sk_buffs, and passes them to the networking stack.
- When the poll has drained the ring, NAPI re-enables the NIC's interrupts and the CPU resumes normal work.
Why not just do all work in the ISR?
Because:
- Copying packets, allocating sk_buffs, and running protocol parsing are slow compared to just acknowledging the hardware.
- While you’re in a hard IRQ, all other interrupts on that CPU are masked.
- Spending 500 µs in a hard IRQ could delay timers, disk I/O, other NIC queues, etc.
- Splitting the work into hard IRQ → softIRQ keeps “acknowledge and exit fast” while still giving low-latency packet processing.
Step 3 — Network Stack
- L2 parse: Ethernet header removed.
- L3 parse: IP header validated, checksum verified (or skipped if offloaded).
- L4 parse: TCP/UDP header parsed, ports identified.
- skb is queued into the correct socket receive queue.
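The same layering can be seen in a small user-space sketch that walks the headers of a raw IPv4/TCP frame (ordinary socket headers, not the kernel's actual parsing code; bounds checks are minimal):

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>
#include <arpa/inet.h>      /* ntohs */
#include <net/ethernet.h>   /* struct ether_header, ETHERTYPE_IP */
#include <netinet/ip.h>     /* struct iphdr */
#include <netinet/tcp.h>    /* struct tcphdr */

/* Walk the headers of a raw frame the way the stack does: L2 -> L3 -> L4. */
static void parse_frame(const uint8_t *frame, size_t len)
{
    const struct ether_header *eth = (const struct ether_header *)frame;

    if (len < sizeof(*eth) + sizeof(struct iphdr))
        return;
    if (ntohs(eth->ether_type) != ETHERTYPE_IP)   /* L2: only IPv4 here */
        return;

    const struct iphdr *ip = (const struct iphdr *)(frame + sizeof(*eth));
    if (ip->protocol != IPPROTO_TCP)              /* L3: only TCP here */
        return;

    const struct tcphdr *tcp =
        (const struct tcphdr *)((const uint8_t *)ip + ip->ihl * 4);
    printf("TCP %u -> %u\n", ntohs(tcp->source), ntohs(tcp->dest));  /* L4: ports */
}
```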
Step 4 — User Space
- App calling `recv()` or `read()` wakes up when data is available.
- skb's payload is copied into the user buffer (or mapped in zero-copy modes).
- skb is freed back to the pool.
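From the application's point of view, this whole path collapses into one blocking call; a minimal UDP receiver as a sketch (port 9000 is arbitrary, error handling omitted):

```c
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Blocking receiver: recvfrom() sleeps until the stack has queued an skb
 * on this socket, then copies its payload into `buf`. */
int main(void)
{
    char buf[2048];
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = {
        .sin_family      = AF_INET,
        .sin_port        = htons(9000),       /* arbitrary example port */
        .sin_addr.s_addr = htonl(INADDR_ANY),
    };

    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
    if (n >= 0)
        printf("received %zd bytes\n", n);

    close(fd);
    return 0;
}
```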
What Happens if the RX Ring Fills
Because the RX ring has a fixed number of slots, it can fill up under heavy load:
Filling Condition
- NIC head pointer catches up to driver tail pointer.
- All descriptors are marked DONE, but driver hasn’t processed them yet.
- No free descriptor = nowhere to DMA next packet.
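In terms of the indices from the earlier ring sketch, "no free descriptor" is a simple pointer comparison. The one-slot-gap convention below is an assumption of the sketch; real NICs usually track ownership per descriptor via status bits instead:

```c
/* head: next slot the NIC will DMA into (hardware-owned).
 * tail: next slot the driver will reclaim (software-owned).
 * One slot is kept unused so "full" and "empty" are distinguishable. */
static int ring_full(unsigned int head, unsigned int tail)
{
    return ring_next(head) == tail;   /* nowhere left to DMA the next packet */
}
```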
Result
- Packets are dropped in hardware:
  - NIC increments a missed-packet or RX-drop counter.
  - The dropped frame never makes it into RAM.
- Driver can read these counters via NIC registers for diagnostics.
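On Linux, these hardware counters usually surface through `ethtool` statistics; for example (the interface name is a placeholder, and counter names vary by driver):

```sh
ethtool -S eth0 | grep -i -E 'drop|miss'
# e.g. rx_dropped, rx_missed_errors, rx_no_buffer_count, depending on the NIC
```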
Why it happens
- CPU or driver not processing descriptors fast enough (high interrupt load, other workloads).
- Burst of incoming packets exceeds ring capacity before driver can catch up.
- Interrupt moderation might delay driver wakeup just long enough for the ring to fill.
Mitigations
- Increase RX ring size
  - Some NICs allow larger descriptor rings (e.g., from 256 to 4096 entries).
- Enable NAPI / packet batching
  - Reduces per-packet interrupt cost; processes multiple packets per poll.
- Distribute load with RSS
  - Multiple RX queues mapped to different cores.
- Use higher-performance packet paths
  - XDP, DPDK, or other bypass frameworks.
- Tune interrupt moderation
  - Lower the coalescing delay to drain the ring sooner.
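Several of these mitigations map directly onto `ethtool` knobs; a sketch with illustrative values (supported ranges depend on the NIC and driver):

```sh
ethtool -g eth0                  # show current and maximum RX ring sizes
ethtool -G eth0 rx 4096          # grow the RX descriptor ring
ethtool -L eth0 combined 8       # spread RX across 8 queues (RSS)
ethtool -C eth0 rx-usecs 8       # lower the interrupt coalescing delay
```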
Key Insight
Once a packet is dropped due to a full RX ring, it’s gone — the NIC doesn’t have a secondary overflow buffer. The only way to prevent this is to make sure the driver processes descriptors quickly enough, or to spread the load across more queues/cores.