
Kernel Network Stack

OS TCP/IP Implementation

The subsystem inside the OS kernel that implements TCP/IP: receiving packets from NIC drivers, routing them, applying firewall rules, managing TCP connections, and presenting the socket API to applications. Unless an application uses a kernel-bypass framework, every byte of its network traffic passes through the kernel network stack.

When a packet arrives at a network interface, the NIC writes it into a ring buffer in host memory via DMA and raises an interrupt; under load, NAPI switches the driver from per-packet interrupts to polling. The driver wraps the packet data in a kernel buffer structure (sk_buff on Linux) and hands it up the kernel network stack. The stack processes it layer by layer: NIC driver → L2 (Ethernet demux) → L3 (IP routing, netfilter hooks used by iptables/nftables) → L4 (TCP/UDP: reassembly, congestion control, flow control) → socket buffer → application via read() or recv(). Transmission follows the reverse path. The whole journey happens in kernel space: the application sees only the socket API, and the protocol complexity stays hidden in the kernel.
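As a minimal sketch of the application's side of this path, the example below sends one UDP datagram over the loopback interface and reads it back. The function name `loopback_roundtrip` is illustrative, not from the original text, and loopback traffic skips the physical NIC and its driver, so this only exercises the socket API, L4, and L3 portions of the journey.

```python
import socket


def loopback_roundtrip(payload: bytes) -> bytes:
    """Send payload over UDP on 127.0.0.1 and return the bytes that arrive."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        # Port 0 asks the kernel to pick any free port.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)
        # sendto() hands the bytes to the kernel, which adds UDP and IP headers
        # and queues the datagram on the receiver's socket buffer.
        sender.sendto(payload, receiver.getsockname())
        # recvfrom() copies the datagram out of the socket receive buffer.
        data, _addr = receiver.recvfrom(65535)
        return data


if __name__ == "__main__":
    print(loopback_roundtrip(b"hello"))
```

The application never sees Ethernet frames, IP headers, or checksums here; it only passes a byte string in and gets a byte string out.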

Linux network stack layers

Layer | Kernel component | Function
--- | --- | ---
NIC driver | e1000e, ixgbe, mlx5, etc. | DMA from NIC ring buffer, interrupt/NAPI, XDP hook
Layer 2 | net/ethernet, bridge | Ethernet frame processing, bridging, VLANs
Netfilter | iptables / nftables | Packet filtering, NAT, connection tracking
Layer 3 | net/ipv4, net/ipv6 | IP routing, fragmentation, ICMP
Layer 4 | net/ipv4/tcp.c, udp.c | TCP state machine, congestion control, UDP
Socket API | net/socket.c | Berkeley sockets: connect, send, recv, bind

TCP congestion control in the kernel

The Linux kernel implements multiple TCP congestion control algorithms, selectable per-socket or system-wide. CUBIC (the default since Linux 2.6.19) grows its congestion window along a cubic function of the time since the last loss event, which lets it probe for bandwidth quickly on high-bandwidth, high-latency links. BBR (Bottleneck Bandwidth and RTT, added in Linux 4.9) models the path's bottleneck bandwidth and round-trip time rather than reacting to loss, and often delivers better throughput on high-latency links (satellite, long-distance) and under shallow buffers. Reno is the classic loss-based algorithm, still used as a baseline. Setting net.ipv4.tcp_congestion_control=bbr often improves speed test results and real-world throughput on connections with significant RTT or packet loss, because BBR doesn't reduce its rate on isolated loss events the way CUBIC does.
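The per-socket selection mentioned above can be sketched with the TCP_CONGESTION socket option, which Python exposes as `socket.TCP_CONGESTION` on Linux. The helper name `set_congestion_control` is hypothetical, and the sketch assumes the requested algorithm is loaded and permitted for the calling user; if it is not, the socket keeps whatever algorithm it already had.

```python
import socket


def set_congestion_control(sock: socket.socket, name: str):
    """Try to select a congestion control algorithm for one TCP socket.

    Returns the name of the algorithm in effect afterwards, or None when the
    platform does not expose TCP_CONGESTION.
    """
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return None  # not Linux, or an older Python build
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode("ascii"))
    except OSError:
        pass  # algorithm unavailable or not permitted; keep the current one
    try:
        # The kernel returns a NUL-padded name of at most 16 bytes.
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    return raw.split(b"\x00", 1)[0].decode("ascii")


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(set_congestion_control(s, "bbr"))
```

Checking the returned name matters because a failed request is silent in this sketch: asking for "bbr" on a system without the BBR module will simply report the existing default.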

Frequently Asked Questions

Why does the kernel network stack matter for speed test results?

TCP buffer sizes, the congestion control algorithm, and NIC driver settings all cap the throughput a system can achieve. Socket buffers smaller than the path's bandwidth-delay product limit throughput on high-latency links, because the sender can keep no more than one buffer's worth of data in flight per round trip. BBR often outperforms CUBIC on high-latency or lossy paths. These kernel settings, not line speed, are often the real ceiling on measured speed.
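The bandwidth-delay product arithmetic behind this can be sketched in a few lines. The function names are illustrative, and the model is deliberately simplified: it assumes one TCP flow, no loss, and a window limited only by the buffer size.

```python
def bdp_bytes(bandwidth_bps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a path of this speed and RTT full."""
    return bandwidth_bps * rtt_ms / 1000 / 8


def window_limited_throughput_bps(buffer_bytes: float, rtt_ms: float) -> float:
    """Upper bound on throughput when at most buffer_bytes are sent per RTT."""
    return buffer_bytes * 8 * 1000 / rtt_ms


if __name__ == "__main__":
    # 1 Gbit/s path with a 100 ms round-trip time.
    print(bdp_bytes(1_000_000_000, 100))
    # The same RTT with only a 64 KiB window available.
    print(window_limited_throughput_bps(64 * 1024, 100))
```

Under these assumptions, a 1 Gbit/s path with 100 ms RTT needs about 12.5 MB in flight, while a 64 KiB window on the same path caps throughput at roughly 5.2 Mbit/s regardless of line speed.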

What is kernel bypass networking and when is it used?

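DPDK and RDMA move packet processing out of the kernel, while XDP runs packet-processing programs at the driver level, before the regular stack sees the packet (AF_XDP sockets can then deliver packets straight to user space). All three avoid much of the per-packet overhead of the normal path, such as context switches, interrupts, and buffer copies. They are used in high-frequency trading, NFV, and high-performance routers, where 10G+ line rate requires processing millions of packets per second per core. Kernel bypass is not relevant for typical end-user systems.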

What kernel settings affect network performance on Linux?

Key sysctls: net.core.rmem_max / wmem_max (socket buffer maximums), net.ipv4.tcp_rmem / tcp_wmem (TCP buffer ranges), net.ipv4.tcp_congestion_control (BBR recommended for high-latency links), net.ipv4.tcp_fastopen (reduces handshake latency). Modern distros set reasonable defaults; manual tuning matters most for servers or multi-gigabit connections.
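As a rough sketch, the current values of these sysctls can be read from /proc/sys on Linux, where each dotted name maps to a file path. The helper names below are hypothetical, and the simple dot-to-slash mapping assumes keys like the ones listed above (it would not handle interface names that themselves contain dots).

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return PROC_SYS.joinpath(*name.split("."))


def read_sysctl(name: str):
    """Return the sysctl's value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name).read_text().strip()
    except OSError:
        return None  # not Linux, key missing, or permission denied


def parse_triple(value: str):
    """Split a 'min default max' value, as used by tcp_rmem and tcp_wmem."""
    minimum, default, maximum = (int(field) for field in value.split())
    return minimum, default, maximum


if __name__ == "__main__":
    for key in ("net.core.rmem_max", "net.core.wmem_max",
                "net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem",
                "net.ipv4.tcp_congestion_control", "net.ipv4.tcp_fastopen"):
        print(key, "=", read_sysctl(key))
```

Reading is safe for unprivileged users; changing a value requires root and is normally done with the sysctl tool or a file under /etc/sysctl.d rather than by writing to /proc/sys directly.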
