Video Streaming via CDN

Video is roughly two-thirds of all internet traffic, and almost all of it moves through CDNs. The mechanism — adaptive bitrate streaming over HTTP — is conceptually simple: split the video into segments, list them in a manifest, let the player pick which quality to fetch. The CDN's job is to cache the segments and serve them with low latency. The complexity is in the details: segment durations, manifest TTLs, live vs VOD, CMAF, range requests, and the way DRM and tokenization layer on top.

The HTTP-based streaming model

Pre-HTTP streaming used protocols like RTMP and RTSP that maintained a persistent connection between server and player. They worked, but they didn't fit CDN architectures — CDNs are optimized for HTTP, and a long-lived RTMP connection bypasses every caching benefit. The shift to HTTP-based streaming (HLS in 2009, DASH shortly after) made video a normal CDN workload.

The model:

  1. The origin encodes the video at multiple bitrates and splits each into segments (typically 2–10 seconds each).
  2. The origin produces a manifest listing each bitrate variant and the URL of every segment within it. In HLS this has two levels: a multivariant playlist points to one media playlist per variant.
  3. The player downloads the manifest, then begins downloading segments at a chosen bitrate.
  4. The player monitors throughput between segments and picks a higher or lower bitrate on the fly.
  5. Every segment is a normal HTTP GET, cacheable at the CDN like any other resource.
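Steps 1 and 2 can be sketched in Python for HLS. This is a minimal illustration, not a packager. It assumes MPEG-TS segments (so no EXT-X-MAP init segment is needed), a hypothetical `<variant>/index.m3u8` and `seg<N>.ts` URL layout, and an example ladder. A real packager also advertises codecs and many more tags.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    name: str        # hypothetical directory name, e.g. "720p"
    bandwidth: int   # peak bits per second advertised to the player
    resolution: str  # "WIDTHxHEIGHT"


def multivariant_playlist(variants: List[Variant]) -> str:
    """Top-level .m3u8: one EXT-X-STREAM-INF entry per bitrate variant."""
    lines = ["#EXTM3U"]
    for v in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth},RESOLUTION={v.resolution}")
        lines.append(f"{v.name}/index.m3u8")  # URI of that variant's media playlist
    return "\n".join(lines) + "\n"


def media_playlist(segment_durations: List[float]) -> str:
    """Per-variant VOD playlist listing every MPEG-TS segment URI in order."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 allows decimal EXTINF durations
        # Target duration must be an integer >= every (rounded) segment duration.
        f"#EXT-X-TARGETDURATION:{math.ceil(max(segment_durations))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for i, duration in enumerate(segment_durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"seg{i}.ts")  # relative URI, resolved against the playlist URL
    lines.append("#EXT-X-ENDLIST")  # VOD: no more segments will be added
    return "\n".join(lines) + "\n"


ladder = [Variant("360p", 800_000, "640x360"), Variant("720p", 3_000_000, "1280x720")]
print(multivariant_playlist(ladder))
print(media_playlist([6.0, 6.0, 4.5]))
```

The player fetches the multivariant playlist, then one media playlist, then the segments it lists. Every one of those requests is an ordinary, cacheable GET.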

HLS, DASH, and CMAF

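Two streaming protocols dominate, and CMAF is the segment format that lets them share files:

| Format | Manifest | Segments | Required / typical use |
|---|---|---|---|
| HLS | .m3u8 (plain-text playlist) | .ts (MPEG-TS) or fMP4 | Required for native playback on iOS and Safari |
| DASH | .mpd (XML) | fMP4 | Open standard; not mandatory on any platform |
| CMAF | Used with either .m3u8 or .mpd | fMP4 (shared between HLS and DASH) | Modern best practice |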

CMAF is the convergence: a single set of fMP4 segment files plus two manifests (one .m3u8 for HLS, one .mpd for DASH) lets one origin storage layout serve both protocols. Segments account for nearly all of the bytes, so cache footprint and origin storage both roughly halve compared to separate HLS and DASH segment sets.
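The storage arithmetic behind that claim can be checked with a short calculation. The ladder below is an illustrative assumption, not a recommendation. Container overhead and manifest sizes are ignored because they are small next to the segment bytes.

```python
LADDER_KBPS = [400, 850, 1600, 3200, 6500]  # illustrative ladder, not a recommendation


def ladder_bytes(bitrates_kbps, duration_s):
    # 1 kbps = 1000 bit/s = 125 bytes/s; container overhead ignored
    return sum(kbps * 125 * duration_s for kbps in bitrates_kbps)


def segment_storage(bitrates_kbps, duration_s, cmaf):
    one_copy = ladder_bytes(bitrates_kbps, duration_s)
    # Separate HLS (.ts) and DASH (fMP4) sets store every rung twice;
    # CMAF stores one fMP4 set that both manifests reference.
    return one_copy if cmaf else 2 * one_copy


hour = 3600
print(segment_storage(LADDER_KBPS, hour, cmaf=False) / 1e9)  # 11.295 (GB)
print(segment_storage(LADDER_KBPS, hour, cmaf=True) / 1e9)   # 5.6475 (GB)
```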

Manifest caching: the special case

Segments are essentially immutable (once produced, they never change), so caching them is trivial — a year-long max-age is reasonable. Manifests are different:

  • VOD manifests are written once when the video is published and rarely change. Long max-age (hours to days) is fine.
  • Live manifests are rewritten constantly as new segments are appended. Their max-age must be shorter than the segment duration, or players will receive stale manifests that do not yet list the newest segments.

A typical live HLS configuration: segments cached for 1 hour, manifests cached for 1 second (or even no-cache). The manifest is small (KB), so revalidating it frequently is cheap; the segments are large (MB) and benefit from long caching.
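That policy can be expressed as a small function that picks a Cache-Control value per object. This sketch uses the TTLs from the paragraphs above. The extension list and the fallback value are assumptions. `immutable` is the standard Cache-Control extension that tells clients not to revalidate.

```python
def cache_control(path: str, live: bool) -> str:
    """Cache-Control value for a streaming object (TTLs are illustrative)."""
    if path.endswith((".m3u8", ".mpd")):
        # Manifests: live ones change every segment, VOD ones rarely.
        return "public, max-age=1" if live else "public, max-age=86400"
    if path.endswith((".ts", ".m4s", ".mp4", ".cmfv", ".cmfa")):
        # Segments never change once written.
        if live:
            return "public, max-age=3600"
        return "public, max-age=31536000, immutable"
    # Unknown object type: force revalidation rather than guess a TTL.
    return "no-cache"


print(cache_control("/live/ch1/720p/index.m3u8", live=True))
print(cache_control("/vod/movie/720p/seg12.m4s", live=False))
```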

Adaptive bitrate (ABR) in detail

The encoder produces a "bitrate ladder" — typically 5–8 quality levels:

| Resolution | Typical bitrate | Suitable for |
|---|---|---|
| 240p | 300-500 kbps | Cellular / weak connections |
| 360p | 700 kbps - 1 Mbps | Mobile / moderate bandwidth |
| 480p | 1.2-2 Mbps | Slower fixed connections |
| 720p | 2.5-4 Mbps | Standard HD streaming |
| 1080p | 5-8 Mbps | Full HD |
| 1440p | 9-13 Mbps | 2K displays |
| 2160p (4K) | 15-25 Mbps | 4K displays |

The player downloads the manifest, measures the time to download the first few segments, divides bytes by time to estimate bandwidth, and picks the highest variant whose bitrate is below estimated bandwidth (with safety margin). Subsequent segments refine the estimate. Quality changes happen at segment boundaries to avoid mid-segment decoder reinitialization.
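The selection rule just described can be sketched in a few lines. This version is a simplification, not any particular player's algorithm. It assumes a plain bytes-over-time estimate across the last three downloads and a safety factor of 0.8. Production players use smoothed estimators and buffer-level rules on top of this.

```python
from typing import List, Sequence, Tuple


def estimate_kbps(samples: Sequence[Tuple[int, float]], window: int = 3) -> float:
    """Throughput over the last `window` downloads; samples are (bytes, seconds)."""
    recent = samples[-window:]
    total_bytes = sum(b for b, _ in recent)
    total_seconds = sum(s for _, s in recent)
    return total_bytes * 8 / total_seconds / 1000  # bytes/s -> kbps


def pick_variant(ladder_kbps: List[int], est_kbps: float, safety: float = 0.8) -> int:
    """Highest rung at or below est * safety; the lowest rung if none fits."""
    budget = est_kbps * safety
    fitting = [b for b in sorted(ladder_kbps) if b <= budget]
    return fitting[-1] if fitting else min(ladder_kbps)


ladder = [400, 1000, 2500, 5000, 8000]
# Three 6 s segments of the 2500 kbps rung (1,875,000 bytes each), 2.5 s apiece.
est = estimate_kbps([(1_875_000, 2.5)] * 3)
print(est, pick_variant(ladder, est))  # 6000.0 2500
```

The safety margin keeps the player from choosing a rung that only just fits. Without it, a small dip in throughput would drain the buffer before the next switch.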

Range requests and byte-range streaming

Some video players use byte-range requests against a single large MP4 file rather than separate segments. The mechanism uses the HTTP Range request header:

GET /video.mp4 HTTP/1.1
Range: bytes=0-1048575

The server returns 206 Partial Content with the requested byte range, and the player asks for further ranges as it plays. This works best with fast-start MP4 files, where the moov atom (the index) sits at the start of the file. If moov is at the end, the player must first request the tail of the file before playback can begin.
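Whether a file is fast-start can be checked by walking its top-level MP4 boxes. Each box starts with a 4-byte big-endian size and a 4-byte type. A size of 1 means a 64-bit size follows, and a size of 0 means the box runs to the end of the file. This sketch works on an in-memory `bytes` object for simplicity. A real tool would seek through the file instead.

```python
import struct
from typing import Iterator, Tuple


def top_level_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level MP4 box."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:  # 64-bit "largesize" follows the type field
            if offset + 16 > len(data):
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:  # box extends to the end of the file
            size = len(data) - offset
        if size < header:  # malformed box; stop walking
            break
        yield box_type.decode("latin-1"), offset, size
        offset += size


def is_fast_start(data: bytes) -> bool:
    """True if the moov box appears before the first mdat box."""
    for box_type, _, _ in top_level_boxes(data):
        if box_type == "moov":
            return True
        if box_type == "mdat":
            return False
    return False
```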

CDNs cache range responses as either (a) the entire file with range serving from the cache or (b) range-keyed cache entries. Mode (a) is more cache-efficient but requires fetching the full file even when the user only watches the first minute.
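Mode (a) means the edge must answer Range requests from a complete cached object. This is a simplified sketch of that logic, following the HTTP rules for single byte ranges: a suffix form `bytes=-N`, an open-ended `bytes=N-`, 416 for a start past the end, and clamping the end to the object size. It answers multi-range or invalid headers with a plain 200, which HTTP permits because a server may ignore Range.

```python
import re
from typing import Dict, Optional, Tuple

_SINGLE_RANGE = re.compile(r"bytes=(\d*)-(\d*)")

Response = Tuple[int, Dict[str, str], bytes]


def _full(body: bytes) -> Response:
    return 200, {"Content-Length": str(len(body))}, body


def _unsatisfiable(size: int) -> Response:
    return 416, {"Content-Range": f"bytes */{size}"}, b""


def serve_range(body: bytes, range_header: Optional[str]) -> Response:
    """Answer a Range request from a fully cached object (single ranges only)."""
    size = len(body)
    m = _SINGLE_RANGE.fullmatch(range_header.strip()) if range_header else None
    if m is None or m.group(1) == m.group(2) == "":
        return _full(body)  # no Range, multi-range, or malformed: send everything
    first, last = m.group(1), m.group(2)
    if first == "":  # suffix form "bytes=-N": the final N bytes
        n = int(last)
        if n == 0 or size == 0:
            return _unsatisfiable(size)
        start, end = max(0, size - n), size - 1
    else:
        start = int(first)
        if last and int(last) < start:
            return _full(body)  # invalid range-spec: ignore the header
        if start >= size:
            return _unsatisfiable(size)
        end = min(int(last), size - 1) if last else size - 1
    chunk = body[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(len(chunk)),
    }
    return 206, headers, chunk
```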

Live streaming and origin shielding

Live streams have a unique caching challenge: every new segment is requested simultaneously by every active viewer the moment the manifest references it. Without origin shielding (see tiered caching), every edge POP would request the new segment from origin at once — a thundering herd. With shielding, the first request to each region's shield triggers a single origin fetch, the shield holds the segment, and every other edge POP fetches from the shield.
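The shield's key behavior is request coalescing. Concurrent misses for the same segment wait on one in-flight origin fetch instead of each going to origin. This Python sketch uses threads to stand in for simultaneous edge requests. `Shield` and `fetch_origin` are hypothetical names. There is no eviction or TTL, and if the origin fetch fails, one of the waiters simply retries as the new leader.

```python
import threading
from typing import Callable, Dict


class Shield:
    """Shield-tier cache that coalesces concurrent misses into one origin fetch."""

    def __init__(self, fetch_origin: Callable[[str], bytes]):
        self._fetch_origin = fetch_origin
        self._cache: Dict[str, bytes] = {}
        self._inflight: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> bytes:
        while True:
            with self._lock:
                if key in self._cache:
                    return self._cache[key]  # hit
                event = self._inflight.get(key)
                leader = event is None
                if leader:  # first miss: this caller fetches from origin
                    event = threading.Event()
                    self._inflight[key] = event
            if leader:
                try:
                    body = self._fetch_origin(key)
                    with self._lock:
                        self._cache[key] = body  # cache before releasing waiters
                    return body
                finally:
                    with self._lock:
                        del self._inflight[key]
                    event.set()  # wake every caller waiting on this key
            event.wait()
            # Loop: the object is cached now, or the leader failed and
            # one of the waiters becomes the new leader.
```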

Modern live workflows lean even harder on this: the CDN polls the origin manifest and pulls new segments into the shield tier proactively, before viewers request them. The first viewer of a new segment hits a warm shield, not a cold origin.
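The polling step amounts to parsing each new manifest and fetching any segment the shield does not hold yet. This is a minimal HLS-only sketch. It relies on the HLS rule that non-blank lines not starting with "#" are URIs. `fetch_text` and `fetch_bytes` are hypothetical callables standing in for HTTP fetches to origin, and a plain dict stands in for the shield cache.

```python
from typing import Callable, Dict, List, Set
from urllib.parse import urljoin


def segment_uris(playlist_text: str, playlist_url: str) -> List[str]:
    """Absolute URIs of the media segments in an HLS media playlist."""
    uris = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # non-tag, non-blank lines are URIs
            uris.append(urljoin(playlist_url, line))
    return uris


def segments_to_prefetch(playlist_text: str, playlist_url: str,
                         already_cached: Set[str]) -> List[str]:
    """Segments referenced by the latest poll that the shield does not hold."""
    return [u for u in segment_uris(playlist_text, playlist_url)
            if u not in already_cached]


def poll_once(fetch_text: Callable[[str], str], fetch_bytes: Callable[[str], bytes],
              playlist_url: str, cache: Dict[str, bytes]) -> None:
    """One polling cycle: read the manifest, pull only the new segments."""
    text = fetch_text(playlist_url)
    for uri in segments_to_prefetch(text, playlist_url, set(cache)):
        cache[uri] = fetch_bytes(uri)
```

Running `poll_once` on a timer shorter than the segment duration keeps the shield ahead of viewers.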

Low-latency streaming

Traditional HLS/DASH has 15–30 second end-to-end latency. Low-latency variants attack this with several techniques:

  • Shorter segments: 1–2 seconds instead of 6.
  • CMAF chunks: segments are subdivided into ~500ms chunks that are sent with chunked transfer encoding as soon as they are encoded, so players begin decoding before the full segment finishes.
  • Preload hints: the manifest tells the player about the next segment or part before it is complete, so its request is already waiting when the data arrives. LL-HLS uses EXT-X-PRELOAD-HINT for this; an earlier LL-HLS draft used HTTP/2 server push instead.
  • LL-HLS partial segments: players request "the next part of segment 47" rather than waiting for segment 47 to complete.

End-to-end latency drops to 2–5 seconds, at the cost of more complex encoder, edge, and player configuration and lower cache efficiency (more, smaller cache entries).
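A rough latency model shows why the unit of delivery matters so much. In standard HLS, clients should not start playback less than three target durations from the end of the playlist. LL-HLS applies a similar hold-back of a few parts. This sketch multiplies the unit size by the hold-back count and adds a pipeline overhead for encoding, packaging, and CDN propagation. The overhead values here are assumptions, not measurements.

```python
def live_latency_s(unit_s: float, hold_back_units: float, overhead_s: float) -> float:
    """Rough glass-to-glass estimate: player hold-back plus pipeline overhead."""
    return unit_s * hold_back_units + overhead_s


# Traditional HLS: start ~3 target durations (6 s segments) behind the live edge.
print(live_latency_s(6.0, 3, overhead_s=2.0))  # 20.0
# LL-HLS: hold back ~3 parts of ~0.5 s each.
print(live_latency_s(0.5, 3, overhead_s=1.5))  # 3.0
```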

DRM and tokenization

Premium video adds DRM (Widevine, FairPlay, PlayReady) and signed-URL tokenization. From the CDN's perspective:

  • Segment content stays cacheable. The video bytes are the same regardless of who downloads them; only the license is per-user.
  • Manifests may include tokenized segment URLs with per-user signing keys and expirations. The CDN strips or normalizes the token before computing the cache key — otherwise every user has a unique cache key.
  • License servers are not on the CDN path. The player fetches a license from a separate server (often not cached at all) and uses it to decrypt segments locally.
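Stripping the token before computing the cache key can be sketched with the standard library's URL tools. The parameter names in `TOKEN_PARAMS` are hypothetical and depend on the signing scheme in use. Sorting the remaining query parameters is an extra normalization choice, so that `?a=1&b=2` and `?b=2&a=1` share an entry. The edge would validate the token before discarding it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of per-user auth parameters; match your own signing scheme.
TOKEN_PARAMS = {"token", "expires", "sig"}


def cache_key(url: str) -> str:
    """URL with auth parameters removed and the remaining query params sorted."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TOKEN_PARAMS
    )
    # Drop the fragment too; it is never sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(cache_key("https://cdn.example/v/720p/seg47.m4s?token=abc&expires=1700000000"))
print(cache_key("https://cdn.example/v/720p/seg47.m4s?expires=1800000000&token=xyz"))
```

Both requests map to the same key, so the second user gets a cache hit.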

Common pitfalls

  • Caching the manifest as long as the segments. Players play stale playlists pointing at the wrong segments. Live streams break entirely.
  • Per-user signed URLs in the cache key. Every user gets a distinct cache key, so the hit rate collapses toward zero. Sign the manifest URL only; let segment URLs be either unsigned or signed with a token the CDN strips from the cache key.
  • Mixed-language audio in one manifest. Without proper alternate audio rendition declarations, players may download all audio tracks instead of one, multiplying bandwidth.
  • HTTP/1.1 head-of-line blocking on browsers that cap concurrent connections per origin. HTTP/2 multiplexes requests over one connection, and HTTP/3 also removes TCP-level head-of-line blocking; legacy HTTP/1.1-only clients may still stall.
  • Segment durations that don't divide evenly into manifest refresh intervals, causing players to fetch the manifest more often than necessary.
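The "sign the manifest URL only" advice can be illustrated with an HMAC-signed URL carrying an expiry. This is a hypothetical scheme, not any specific CDN's token format. The shared `SECRET`, the `expires`/`sig` parameter names, and the `path:expires` message layout are all assumptions. Because the token rides in query parameters, the edge can verify it and then leave it out of the cache key.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"shared-secret-between-origin-and-cdn"  # hypothetical key


def _mac(path: str, expires: int) -> str:
    msg = f"{path}:{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def sign(path: str, ttl_s: int, now: Optional[int] = None) -> str:
    """Append an expiry and an HMAC over (path, expiry) to a manifest path."""
    expires = (int(time.time()) if now is None else now) + ttl_s
    return f"{path}?{urlencode({'expires': expires, 'sig': _mac(path, expires)})}"


def verify(url: str, now: Optional[int] = None) -> bool:
    """Edge-side check: signature matches the path and has not expired."""
    parts = urlsplit(url)
    q = parse_qs(parts.query)
    try:
        expires = int(q["expires"][0])
        sig = q["sig"][0]
    except (KeyError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if current > expires:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig, _mac(parts.path, expires))


url = sign("/live/ch1/index.m3u8", ttl_s=300, now=1_000_000)
print(verify(url, now=1_000_100))  # True
```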

Frequently Asked Questions

What is the difference between HLS and DASH?

Both HLS and DASH are HTTP-based adaptive streaming protocols built on the same idea: split video into small segments, list them in a manifest, let the player pick bitrate adaptively. HLS uses .m3u8 playlists and .ts or fMP4 segments; it was created by Apple and is mandatory on iOS. DASH uses XML .mpd manifests and fMP4 segments; it is an open ISO standard. Modern deployments often produce both by encoding once into fMP4 and generating both manifest types over the same segment files (CMAF).

How does adaptive bitrate (ABR) work?

The origin encodes each video at multiple bitrates (e.g., 400 kbps, 1 Mbps, 3 Mbps, 6 Mbps). The manifest lists all variants. The player measures network throughput between segment downloads and picks the highest bitrate it can sustain without buffering. As network conditions change, it switches up or down at segment boundaries. The whole process is client-side; the CDN just serves whichever segments the player requests.

Are video segments cached at the CDN?

Yes. Segment caching is the entire point of CDN video delivery. Segments are immutable once produced (a 6-second chunk of video does not change), so they can be cached with very long TTLs and high hit rates. Manifests are cached too, but with shorter TTLs: about a second for live streams, where new segments are appended constantly, and hours to days for VOD, where edits are rare.

What is CMAF?

Common Media Application Format. A fragmented MP4 container format designed to be playable as both HLS segments and DASH segments. With CMAF, an origin encodes once and serves the same segment files for both protocols; only the manifests differ. Earlier deployments required separate .ts (HLS) and fMP4 (DASH) segment sets, doubling storage and cache footprint. CMAF eliminates that duplication.

Why does live streaming have higher latency than viewers expect?

Traditional HLS and DASH require the encoder to produce a full segment (typically 4-6 seconds) before publishing it, the player to download a few segments before playback starts, and the CDN to propagate the new segments through cache layers. End-to-end glass-to-glass latency is typically 15-30 seconds. Low-latency variants (LL-HLS, LL-DASH) use chunked transfer and CMAF chunks of ~500 ms to reduce this to 2-5 seconds at the cost of more complex client and edge configuration.
