The Two Windows That Matter
| Window | Controlled By | Purpose |
|---|---|---|
| Congestion window (cwnd) | Sender | Limits data in flight based on perceived network congestion |
| Receive window (rwnd) | Receiver | Limits data in flight based on receiver buffer space |
The actual amount of data a sender may have in flight is the minimum of these two values. Both must be large enough to fill the path's bandwidth-delay product for full throughput.
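As a rough illustration of how the two windows interact, the sketch below computes the effective send window and compares it with the bandwidth-delay product. The function names and the example figures (a 100 Mbps bottleneck, 40 ms RTT, a 65,535-byte receive window) are illustrative assumptions, not measurements from a real connection.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes needed in flight to keep the path full."""
    return bandwidth_bps * rtt_s / 8


def effective_window(cwnd_bytes: int, rwnd_bytes: int) -> int:
    """The sender may have at most min(cwnd, rwnd) bytes unacknowledged."""
    return min(cwnd_bytes, rwnd_bytes)


def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is delivered per RTT."""
    return window_bytes * 8 / rtt_s


# Hypothetical path: 100 Mbps bottleneck, 40 ms RTT.
bdp = bdp_bytes(100e6, 0.040)
# Hypothetical windows: a large cwnd, but a receive window with no scaling.
window = effective_window(cwnd_bytes=1_000_000, rwnd_bytes=65_535)
limit = max_throughput_bps(window, 0.040)
print(f"BDP: {bdp:.0f} bytes, window: {window} bytes, limit: {limit / 1e6:.1f} Mbps")
```

In this made-up case the receive window, not the congestion window, is the smaller value, so throughput is capped well below the link rate even though cwnd is large.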
Slow Start: Step by Step
At the beginning of a TCP connection, the sender does not know the path's capacity. Rather than immediately sending as much as the receive window allows, it begins with a small congestion window — typically 10 segments in modern implementations (RFC 6928). The "slow" in slow start is relative to sending at full rate instantly; the actual growth is exponential:
- The sender starts with cwnd = 10 MSS (maximum segment size).
- For each acknowledgment received, cwnd increases by one MSS.
- Because ACKs return for every segment, cwnd effectively doubles each round-trip time (RTT).
- This continues until cwnd reaches the slow start threshold (ssthresh) or a sign of congestion appears.
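On a 50 ms RTT connection, cwnd can double up to 20 times per second during slow start, so a 10 Mbps path can be fully utilized within a few hundred milliseconds on a clean network. On a 200 ms RTT path (a long intercontinental or satellite link, for example), each doubling takes four times as long, and because the bandwidth-delay product is also four times larger, about two more doublings are needed to fill the path. This is why latency matters as much as bandwidth for throughput.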
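The ramp-up arithmetic can be sketched as a small loop that counts doublings until cwnd covers the bandwidth-delay product. This is a simplified model, assuming a 1460-byte MSS, no loss, no delayed ACKs, and an ssthresh high enough that slow start is never cut short; `rtts_to_fill` is a hypothetical helper name.

```python
import math

MSS_BYTES = 1460  # assumed segment size for a typical 1500-byte MTU path


def rtts_to_fill(bandwidth_bps: float, rtt_s: float, init_cwnd: int = 10) -> int:
    """Count RTTs of doubling until cwnd (in segments) covers the BDP."""
    bdp_segments = math.ceil(bandwidth_bps * rtt_s / 8 / MSS_BYTES)
    cwnd, rounds = init_cwnd, 0
    while cwnd < bdp_segments:
        cwnd *= 2      # one MSS per ACK doubles cwnd every round trip
        rounds += 1
    return rounds


for rtt_ms in (50, 200):
    n = rtts_to_fill(10e6, rtt_ms / 1000)
    print(f"RTT {rtt_ms} ms: {n} round trips, about {n * rtt_ms} ms")
```

Under these assumptions the 10 Mbps path needs 3 round trips at 50 ms (about 150 ms) and 5 round trips at 200 ms (about 1 second). The longer path pays twice: each round trip is slower, and the larger bandwidth-delay product requires more of them.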
Congestion Avoidance: Additive Increase
When cwnd reaches ssthresh, the sender switches from exponential slow start to the congestion avoidance phase. Growth becomes additive: cwnd increases by roughly one MSS per RTT (one MSS per full window of ACKs) rather than doubling. This cautious linear growth is the additive-increase half of AIMD (Additive Increase, Multiplicative Decrease); the multiplicative decrease is applied when loss is detected. The sender probes for more bandwidth slowly while trying to avoid causing congestion collapse.
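A minimal sketch of the additive increase, assuming the common per-ACK rule in which each ACK grows cwnd by 1/cwnd of an MSS, so a full window of ACKs adds roughly one MSS. The starting value of 20 segments is a hypothetical ssthresh, and real stacks track cwnd in bytes with additional safeguards.

```python
def congestion_avoidance_round(cwnd: float) -> float:
    """Apply one RTT of per-ACK additive increase (cwnd in MSS units)."""
    acks = int(cwnd)              # roughly one ACK per segment in flight
    for _ in range(acks):
        cwnd += 1.0 / cwnd        # each ACK adds 1/cwnd of an MSS
    return cwnd


cwnd = 20.0  # hypothetical ssthresh, in segments
history = [cwnd]
for _ in range(5):
    cwnd = congestion_avoidance_round(cwnd)
    history.append(cwnd)
print([round(w, 2) for w in history])
```

Each round adds slightly less than one MSS, because the per-ACK increment shrinks as cwnd grows within the round.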
Fast Retransmit: Reacting to Duplicate ACKs
If a segment is lost, the receiver keeps ACKing the last in-order byte it received, so every later segment that arrives produces a duplicate ACK. When the sender receives three duplicate ACKs carrying the same acknowledgment number, it infers that a segment was lost without waiting for a retransmission timeout (which can range from hundreds of milliseconds to seconds). Fast retransmit immediately resends the missing segment, which detects the loss far sooner than a timeout would and substantially reduces the throughput impact.
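The duplicate-ACK rule can be sketched as a counter over the stream of acknowledgment numbers. This is an illustrative model only: `find_fast_retransmits` and the trace values are hypothetical, and a real stack also consults SACK information and timers.

```python
DUP_ACK_THRESHOLD = 3  # classic fast retransmit threshold


def find_fast_retransmits(acks: list[int]) -> list[int]:
    """Return ACK numbers that reach three duplicates in a stream of ACKs."""
    retransmits = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                retransmits.append(ack)   # resend the segment starting at this byte
        else:
            last_ack, dup_count = ack, 0  # new data acknowledged, reset counter
    return retransmits


# Hypothetical trace: the segment starting at byte 3000 is lost, so the
# receiver repeats ACK 3000 for each later segment that arrives.
trace = [1000, 2000, 3000, 3000, 3000, 3000, 7000]
print(find_fast_retransmits(trace))
```

The first ACK for 3000 is an ordinary acknowledgment; only the three repeats after it count as duplicates, which is why the trace needs four ACKs with that number before the retransmit fires.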
Fast Recovery: Multiplicative Decrease
After fast retransmit, classic TCP Reno performs fast recovery: ssthresh is set to half of cwnd, and cwnd is reduced to ssthresh. The sender then enters congestion avoidance rather than returning all the way to slow start. This halving (the multiplicative decrease in AIMD) is the TCP signal to back off. After a retransmission timeout, a more severe event, TCP does fall back to slow start with cwnd reset to 1 MSS, which is why timeouts are far more damaging than triple-duplicate-ACK events.
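The two reactions can be contrasted in a short sketch. It is a simplification of Reno: it halves cwnd directly rather than the amount of data in flight, and it skips the temporary window inflation during recovery. `RenoState` and the handler names are hypothetical; the 2 MSS floor follows RFC 5681.

```python
from dataclasses import dataclass


@dataclass
class RenoState:
    cwnd: int       # congestion window, in MSS
    ssthresh: int   # slow start threshold, in MSS


def on_triple_dup_ack(s: RenoState) -> RenoState:
    """Fast recovery: halve the window and continue in congestion avoidance."""
    half = max(s.cwnd // 2, 2)      # floor of 2 MSS, as in RFC 5681
    return RenoState(cwnd=half, ssthresh=half)


def on_timeout(s: RenoState) -> RenoState:
    """Retransmission timeout: remember half the window, restart from 1 MSS."""
    return RenoState(cwnd=1, ssthresh=max(s.cwnd // 2, 2))


before = RenoState(cwnd=40, ssthresh=64)   # hypothetical state before the loss
print("triple duplicate ACK:", on_triple_dup_ack(before))
print("timeout:             ", on_timeout(before))
```

Starting from 40 segments, the duplicate-ACK path resumes at 20 segments, while the timeout path restarts at 1 segment and has to slow-start its way back up to 20.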
CUBIC: The Dominant Linux Algorithm
CUBIC is the default congestion control algorithm in Linux since kernel 2.6.19 and is widely used across servers and devices. Instead of the classic linear increase in congestion avoidance, CUBIC uses a cubic function of elapsed time since the last congestion event to determine cwnd growth. CUBIC grows aggressively when far from the last congestion point, slows near that point, then probes for more capacity beyond it. This makes CUBIC very efficient on high-bandwidth, high-latency paths (such as long-haul fiber) while remaining stable on lower-latency networks. Because growth depends on elapsed time rather than on how quickly ACKs arrive, flows with different RTTs grow their windows at similar rates, which helps the algorithm behave consistently across different path characteristics.
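The shape of the curve is easier to see from the window function itself. The sketch below uses the form and constants given in RFC 8312 (C = 0.4, beta = 0.7). It omits the TCP-friendly region and fast convergence, so it is not a model of the Linux implementation, and the `w_max` of 100 segments is a hypothetical value.

```python
C = 0.4      # scaling constant from RFC 8312
BETA = 0.7   # multiplicative decrease factor from RFC 8312


def cubic_k(w_max: float) -> float:
    """Seconds needed to climb back to w_max after a reduction."""
    return (w_max * (1 - BETA) / C) ** (1 / 3)


def cubic_window(t: float, w_max: float) -> float:
    """Window (in MSS) t seconds after a loss that occurred at window w_max."""
    return C * (t - cubic_k(w_max)) ** 3 + w_max


w_max = 100.0  # hypothetical window at the last congestion event
for t in (0, 2, 4, 6, 8):
    print(f"t={t}s  cwnd={cubic_window(t, w_max):.1f}")
```

At t = 0 the window starts at 70% of `w_max`. It flattens out as t approaches `cubic_k(w_max)` (a little over 4 seconds here), then accelerates again once it passes the previous maximum.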
BBR: Bottleneck Bandwidth and RTT
BBR (Bottleneck Bandwidth and RTT), developed at Google and available in Linux since kernel 4.9, takes a fundamentally different approach. Classic algorithms use packet loss as the primary congestion signal. BBR does not rely on loss as its primary signal (the original version largely ignores it) and instead continuously estimates the bottleneck bandwidth and minimum RTT of the path. It targets a sending rate that fills the bottleneck link without creating persistent queues. BBR can significantly outperform CUBIC on high-latency paths and on paths with shallow buffers. It is also better at avoiding bufferbloat because it deliberately limits in-flight data. However, BBR can be less fair to competing CUBIC flows on shared bottlenecks, which remains an active area of research.
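The core of the model (estimate the bottleneck bandwidth and the minimum RTT, then aim to keep about their product in flight) can be sketched as follows. This is a toy illustration, not BBR itself: the real algorithm uses time-based filter windows, pacing gains, and a state machine. `PathModel` and all sample values are hypothetical.

```python
from collections import deque


class PathModel:
    """Toy version of a BBR-style path model: windowed max bandwidth, min RTT."""

    def __init__(self, window: int = 10):
        # Assumption: a fixed sample count stands in for time-based windows.
        self.bw_samples = deque(maxlen=window)
        self.rtt_samples = deque(maxlen=window)

    def on_ack(self, delivered_bytes: int, interval_s: float, rtt_s: float) -> None:
        """Record one delivery-rate sample and one RTT sample."""
        self.bw_samples.append(delivered_bytes / interval_s)  # bytes per second
        self.rtt_samples.append(rtt_s)

    def bdp_bytes(self) -> float:
        """Estimated bandwidth-delay product: the target for data in flight."""
        return max(self.bw_samples) * min(self.rtt_samples)


model = PathModel()
# Hypothetical samples: (bytes delivered, over seconds, measured RTT in seconds)
for sample in [(14_600, 0.010, 0.042), (29_200, 0.010, 0.040), (29_200, 0.012, 0.055)]:
    model.on_ack(*sample)
print(f"estimated BDP: {model.bdp_bytes():.0f} bytes")
```

The third sample has a higher RTT and a lower delivery rate, which is what a building queue looks like. Taking the maximum bandwidth and the minimum RTT keeps that queueing from inflating the estimate.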
How to Measure the Congestion Window
On Linux, the ss command with the -tin flags shows detailed TCP socket state including cwnd. Running ss -tin dst <server-ip> during an active transfer displays the current cwnd value in MSS units, the current RTT, and other per-connection statistics. This is a useful diagnostic when investigating why a specific flow is not using full bandwidth.
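For repeated measurements it can help to pull the relevant fields out of the ss output programmatically. The sketch below parses `key:value` tokens of the kind ss -i prints. The sample line is an illustrative approximation rather than captured output, and the exact set and order of fields varies between iproute2 versions.

```python
import re


def parse_ss_info(line: str) -> dict:
    """Extract cwnd (segments), smoothed RTT (ms) and MSS from ss -i style text."""
    fields = {}
    if m := re.search(r"\bcwnd:(\d+)", line):
        fields["cwnd"] = int(m.group(1))
    if m := re.search(r"\brtt:([\d.]+)/([\d.]+)", line):
        fields["rtt_ms"] = float(m.group(1))   # first number is the smoothed RTT
    if m := re.search(r"\bmss:(\d+)", line):   # \b skips rcvmss: and advmss:
        fields["mss"] = int(m.group(1))
    return fields


# Illustrative line only; not output captured from a real socket.
sample = "cubic wscale:7,7 rto:252 rtt:50.1/3.2 mss:1448 rcvmss:536 advmss:1448 cwnd:42 ssthresh:30"
info = parse_ss_info(sample)
in_flight_limit = info["cwnd"] * info["mss"]   # cwnd expressed in bytes
print(info, in_flight_limit)
```

Multiplying cwnd by the MSS gives the window in bytes, which can then be compared with the path's bandwidth-delay product to see whether cwnd is the limiting factor.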
Impact on Short vs Long Transfers
Short transfers — small API responses, DNS over TCP, brief file downloads — may complete entirely within the slow start phase and never reach full link speed. Long transfers — large file downloads, streaming, backups — have time to ramp up and can sustain near-full bandwidth once congestion avoidance stabilizes. This is why a speed test using multiple parallel streams can report much higher throughput than a single-connection download: parallel streams each go through slow start independently and collectively fill the pipe faster than one stream can alone.
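A back-of-the-envelope sketch shows how few round trips a short transfer gets. It assumes cwnd doubles every RTT from 10 segments, a 1460-byte MSS, no loss, and no bandwidth cap, and it ignores connection setup. `slow_start_rounds` is a hypothetical helper.

```python
import math


def slow_start_rounds(size_bytes: int, mss: int = 1460, init_cwnd: int = 10) -> int:
    """Round trips needed to deliver size_bytes while cwnd doubles each RTT."""
    segments = math.ceil(size_bytes / mss)
    cwnd, sent, rounds = init_cwnd, 0, 0
    while sent < segments:
        sent += cwnd     # send a full window this round
        cwnd *= 2        # window doubles for the next round
        rounds += 1
    return rounds


for size in (10_000, 100_000, 1_000_000):
    print(f"{size:>9} bytes: {slow_start_rounds(size)} round trips")
```

Under these assumptions a 10 KB response fits in a single round trip and a 100 KB response in three, so both finish before the window has grown much. Only the 1 MB transfer, at seven round trips, runs long enough for the ramp to matter less than the sustained rate.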
What It Means for Speed Tests
- Short transfers may not fully ramp before they finish.
- High latency makes the ramp-up and recovery slower.
- Packet loss can reduce throughput sharply, especially if it triggers timeout-based recovery.
- Multiple parallel test streams hide single-connection slow start limits and are more representative of aggregate bandwidth.
- Upload tests often reveal asymmetric congestion that download tests do not surface.
Frequently Asked Questions
What is TCP slow start?
TCP slow start is the phase where a sender begins with a limited congestion window and increases it rapidly as acknowledgments arrive, probing for available capacity.
What is the congestion window?
The congestion window, or cwnd, is a sender-side limit on how much unacknowledged data TCP can have in flight based on perceived network congestion.
Why does TCP slow down after packet loss?
Packet loss is often treated as a sign of congestion. TCP reduces its sending rate so the network can recover instead of continuing to overload the path.