Kernel Network Stack
OS TCP/IP Implementation
The subsystem inside the OS kernel that implements TCP/IP: it receives packets from NIC drivers, routes them, applies firewall rules, manages TCP connections, and presents the socket API to applications. Unless an application uses a kernel bypass framework, all of a computer's network traffic passes through the kernel network stack.
When a packet arrives at a network interface, the NIC writes it into memory by DMA and raises an interrupt (under load, Linux switches to NAPI polling instead of taking one interrupt per packet). The driver wraps the packet in a kernel buffer (sk_buff on Linux) and hands it up the kernel network stack. The stack processes it layer by layer: NIC driver → L2 (Ethernet demux) → L3 (IP routing, netfilter/iptables hooks) → L4 (TCP: reassembly, congestion control, flow control; UDP: datagram delivery) → socket buffer → application via read() or recv(). Transmission follows the reverse path. The whole journey happens in kernel space: the application sees only the socket API, and the protocol complexity stays hidden in the kernel.
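As a rough illustration of the application's side of this, the Python sketch below sends a few bytes over the loopback interface and reads them back. Loopback traffic skips the physical NIC and its driver, but it still passes through the kernel's IP and TCP layers. The function names (echo_once, loopback_round_trip) are made up for this example.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept one connection and echo back one small message."""
    conn, _addr = server_sock.accept()
    with conn:
        # recv() returns bytes the kernel has already reassembled and queued
        # in the socket buffer; a single call is enough for a small payload.
        data = conn.recv(4096)
        # sendall() hands the bytes back to the kernel for transmission.
        conn.sendall(data)


def loopback_round_trip(payload: bytes) -> bytes:
    """Send payload to a local echo server and return what comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 asks the kernel to pick any free port.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        port = server.getsockname()[1]

        worker = threading.Thread(target=echo_once, args=(server,), daemon=True)
        worker.start()

        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(payload)
            reply = b""
            # TCP is a byte stream, so keep reading until everything is back.
            while len(reply) < len(payload):
                chunk = client.recv(4096)
                if not chunk:
                    break
                reply += chunk

        worker.join(timeout=5)
        return reply
```

Nothing in this code mentions Ethernet frames, IP headers, sequence numbers, or retransmission; all of that is handled below the socket calls.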
Linux network stack layers
| Layer | Kernel component | Function |
|---|---|---|
| NIC driver | e1000e, ixgbe, mlx5, etc. | DMA from NIC ring buffer, interrupt/NAPI, XDP hook |
| Layer 2 | net/ethernet, bridge | Ethernet frame processing, bridging, VLANs |
| Netfilter | iptables / nftables | Packet filtering, NAT, connection tracking |
| Layer 3 | net/ipv4, net/ipv6 | IP routing, fragmentation, ICMP |
| Layer 4 | net/ipv4/tcp.c, udp.c | TCP state machine, congestion control, UDP |
| Socket API | net/socket.c | Berkeley sockets — connect, send, recv, bind |
TCP congestion control in the kernel
The Linux kernel implements multiple TCP congestion control algorithms, selectable per-socket or system-wide. CUBIC (default since Linux 2.6.19) grows its congestion window as a cubic function of the time since the last loss event; it was designed for high-bandwidth, high-latency paths and backs off when it detects loss. BBR (Bottleneck Bandwidth and RTT, added in Linux 4.9) models the network path rather than reacting to loss, which tends to give better throughput on high-latency links (satellite, long-distance) and under shallow buffers. Reno is the classic algorithm and is still used as a baseline. Setting net.ipv4.tcp_congestion_control=bbr often improves speed test results and real-world throughput on connections with significant RTT or packet loss, because BBR doesn't reduce its rate on isolated loss events the way CUBIC does.
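The per-socket selection can be sketched in Python through the TCP_CONGESTION socket option, which the socket module exposes on Linux. This is a minimal sketch, not a tuning recipe: the helper names are invented for this example, and both helpers fall back quietly on platforms or kernels where the option or the requested algorithm is not available.

```python
import socket
from typing import Optional


def get_congestion_control(sock: socket.socket) -> Optional[str]:
    """Return the socket's congestion control algorithm, or None if unknown."""
    opt = getattr(socket, "TCP_CONGESTION", None)  # only defined on Linux
    if opt is None:
        return None
    try:
        # The kernel returns the name padded with NUL bytes (up to 16 bytes).
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    return raw.split(b"\x00", 1)[0].decode("ascii")


def set_congestion_control(sock: socket.socket, name: str) -> bool:
    """Try to select an algorithm for this socket; return True on success."""
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode("ascii"))
    except OSError:
        # Typically the algorithm is not loaded or not permitted for this user.
        return False
    return True
```

A caller might create a TCP socket, call set_congestion_control(sock, "bbr") before connecting, and then check get_congestion_control(sock) to see which algorithm the kernel actually applied.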
Frequently Asked Questions
Why does the kernel network stack matter for speed test results?
TCP buffer sizes, the congestion control algorithm, and NIC driver settings all cap the throughput a system can achieve. Socket buffers that are too small limit throughput on high-latency links, because a sender can have at most one buffer's worth of data in flight per round trip (the bandwidth-delay product problem). BBR congestion control often outperforms CUBIC on high-latency or lossy paths. These kernel settings, rather than line speed, are often the real ceiling on measured speed.
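The bandwidth-delay product arithmetic is easy to check by hand. The small Python sketch below computes how much data must be in flight to fill a link, and conversely what throughput a given window allows; the function names and the example figures (1 Gbit/s, 100 ms RTT, 64 KiB window) are illustrative assumptions, not measurements.

```python
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of this speed and RTT full."""
    return bandwidth_bits_per_s * rtt_s / 8


def max_throughput_bits_per_s(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most window_bytes are sent per RTT."""
    return window_bytes * 8 / rtt_s


# A 1 Gbit/s path with 100 ms RTT needs roughly 12.5 MB in flight.
needed = bdp_bytes(1e9, 0.100)

# A 64 KiB window on the same path tops out at roughly 5.2 Mbit/s.
ceiling = max_throughput_bits_per_s(64 * 1024, 0.100)
```

The gap between those two numbers is why a fast line can still produce a slow speed test when the buffers cannot grow to match the path.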
What is kernel bypass networking and when is it used?
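DPDK and RDMA move packet processing out of the kernel entirely. XDP runs small eBPF programs inside the kernel at the driver level, before the regular stack sees the packet, and AF_XDP sockets can hand packets directly to user space. All three avoid most of the context switch and interrupt overhead of the full stack. They are used in high-frequency trading, NFV, and high-performance routers, where 10G+ line rate requires processing millions of packets per second per core. They are not relevant for typical end-user systems.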
What kernel settings affect network performance on Linux?
Key sysctls: net.core.rmem_max / wmem_max (socket buffer maximums), net.ipv4.tcp_rmem / tcp_wmem (TCP buffer ranges), net.ipv4.tcp_congestion_control (BBR recommended for high-latency links), net.ipv4.tcp_fastopen (reduces handshake latency). Modern distros set reasonable defaults; manual tuning matters most for servers or multi-gigabit connections.
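To inspect the current values, one option is to read them from /proc/sys, where Linux exposes each sysctl as a file with dots replaced by directory separators. The Python sketch below assumes that layout; the helper names are invented for this example, and read_sysctl returns None when a setting is missing or the system is not Linux.

```python
from pathlib import Path
from typing import Optional

# Sysctls mentioned above; the list is just a convenient starting point.
NETWORK_SYSCTLS = [
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.ipv4.tcp_congestion_control",
    "net.ipv4.tcp_fastopen",
]


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root, *name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the current value as text, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for key in NETWORK_SYSCTLS:
        print(f"{key} = {read_sysctl(key)}")
```

Reading these files needs no special privileges; changing them does, and is normally done with the sysctl tool or a file under /etc/sysctl.d rather than from application code.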
Related Terms
TCP
TCP's state machine and congestion control run inside the kernel network stack.
Packet
Every packet traverses the kernel network stack on receive and transmit.
MTU
The kernel enforces MTU limits and handles IP fragmentation in the network stack.