Tiered Caching Explained

A flat CDN has hundreds of edge POPs, each independently caching from origin. When a piece of content is requested in 100 regions, the origin sees 100 separate fetches, one per cold-cache POP. Tiered caching breaks that pattern by inserting a smaller number of intermediate caches between the edges and the origin. Each tier absorbs misses from the tier below it, so origin sees one fetch per shield, not one per edge. The result, on most workloads, is a several-fold to order-of-magnitude reduction in origin request volume, at the cost of one extra hop on some cache misses.

The flat CDN problem

Consider a popular blog post that goes viral. It's requested from 200 edge POPs across the world within a minute. Without tiering, the request flow looks like this:

Edge POP 1 (cold) → Origin
Edge POP 2 (cold) → Origin
Edge POP 3 (cold) → Origin
...
Edge POP 200 (cold) → Origin

Origin sees 200 simultaneous requests for the same resource. Once each POP caches its copy, subsequent requests are absorbed at the edge — but the cold-fill burst is unavoidable without an intermediate tier. For static content this is a brief blip; for dynamic-but-cacheable content with frequent re-validations, the pattern repeats every TTL.

What tiered caching does

Add a middle layer:

Edge POP 1 (cold) → Shield POP → Origin
Edge POP 2 (cold) → Shield POP (cached) → return cached
Edge POP 3 (cold) → Shield POP (cached) → return cached
...

Only the first edge POP to miss causes the shield to fetch from origin. The shield fetches once, caches the response, and serves it to the other 199 edge POPs. Origin sees 1 fetch per piece of content per shield instead of 200.
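
The difference between the two flows can be checked with a small counting sketch. The names below (Origin, Cache, origin_fetches) are illustrative and not any CDN's API. The model assumes every edge starts cold, there is a single shield, and requests arrive one after another.

```python
class Origin:
    """Counts how many times it is asked for content."""

    def __init__(self):
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return f"body-of-{key}"


class Cache:
    """A cache that forwards misses to its upstream (another Cache or the Origin)."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.upstream.get(key)
        return self.store[key]


def origin_fetches(edge_count, tiered):
    """Request one URL from every cold edge and return the origin fetch count."""
    origin = Origin()
    # With tiering, every edge shares one shield; without it, edges hit origin directly.
    upstream = Cache(origin) if tiered else origin
    edges = [Cache(upstream) for _ in range(edge_count)]
    for edge in edges:
        edge.get("/blog/viral-post")
    return origin.fetches


print(origin_fetches(200, tiered=False))  # 200
print(origin_fetches(200, tiered=True))   # 1
```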

Tier counts and topology

The simplest design is two tiers (edge + shield). Larger CDNs use three:

Tier	Count	Role
Edge	Hundreds of POPs	Serve users; small per-POP cache footprint
Mid / regional	10-30 regional caches	Aggregate misses across nearby edges; larger cache
Shield / origin-facing	1-3 shields per origin	Final layer; designed to minimize unique requests to origin

The exact tier count is invisible to users — every tier is just "the CDN cache" from the response side. The configuration knob a customer typically controls is whether shielding is on at all, and optionally which shield POP(s) to use.

How tiered caching interacts with hit ratio

Tiered caching does not change the user-perceived hit rate at the edge — an edge hit is still an edge hit. But it changes what happens on a miss. On a flat CDN, a miss is one origin fetch. With tiering, a miss might be:

A shield hit — edge missed, shield served. No origin contact. Latency: edge → shield RTT.
A shield miss — edge missed, shield missed. Single origin fetch. Latency: edge → shield → origin.

The combined hit ratio (edge hits + shield hits) is what determines origin offload. The edge-only hit ratio is what determines user latency. Most CDN dashboards expose both; for cost analysis, use the combined number.
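
As a worked example of how the two ratios combine, assume the shield hit ratio is measured over the requests that miss at the edge. Dashboards differ on this definition, so check how yours reports it. The function names and sample numbers below are illustrative only.

```python
def combined_hit_ratio(edge_hit_ratio, shield_hit_ratio):
    """Fraction of user requests served without contacting origin.

    shield_hit_ratio is assumed to be measured over edge misses only.
    """
    edge_miss = 1.0 - edge_hit_ratio
    return edge_hit_ratio + edge_miss * shield_hit_ratio


def origin_requests(total_requests, edge_hit_ratio, shield_hit_ratio):
    """Requests that miss both tiers and reach origin."""
    return total_requests * (1.0 - combined_hit_ratio(edge_hit_ratio, shield_hit_ratio))


# Illustrative numbers: 90% edge hits, and the shield absorbs 80% of edge misses.
print(round(combined_hit_ratio(0.90, 0.80), 4))       # 0.98
print(round(origin_requests(1_000_000, 0.90, 0.80)))  # 20000
print(round(origin_requests(1_000_000, 0.90, 0.0)))   # 100000 (no shield)
```

In this example the edge hit ratio, and therefore user latency, is the same in both cases. Only the origin request count changes.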

Latency implications

On cache hits at the edge, tiered caching adds zero latency. On misses, it adds one extra network hop. The hop's cost depends on shield placement:

If shields are colocated with origin (e.g., shield in the same region as origin): the shield → origin hop is near-zero. The edge → shield hop replaces the edge → origin hop with similar latency.
If shields are spread regionally: edge → regional shield is fast; shield → origin pays the cross-region cost. On a shield miss, the total is somewhat worse than direct edge → origin, because the request pays the edge → shield detour on top of the long-haul leg.

Most CDNs default to placing shields close to origin to keep the worst-case miss latency similar to the no-shield case. Verify your CDN's specific behavior — some default to a single global shield that may be on a different continent.
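
A rough way to compare the two placements is to model miss latency from round-trip times alone. This is a sketch under stated assumptions: the function name and all RTT values are hypothetical, and processing time, connection reuse, and retries are ignored.

```python
def miss_latency_ms(edge_to_shield, shield_to_origin, edge_to_origin, shield_hit_ratio):
    """Return (direct, tiered_expected, tiered_worst) edge-miss latency in ms.

    direct          -- an edge miss on a flat CDN goes straight to origin
    tiered_expected -- average over shield hits and shield misses
    tiered_worst    -- an edge miss that also misses at the shield
    """
    direct = edge_to_origin
    on_shield_hit = edge_to_shield
    on_shield_miss = edge_to_shield + shield_to_origin
    expected = shield_hit_ratio * on_shield_hit + (1 - shield_hit_ratio) * on_shield_miss
    return direct, expected, on_shield_miss


# Hypothetical RTTs: shield placed next to origin.
print(miss_latency_ms(edge_to_shield=140, shield_to_origin=5,
                      edge_to_origin=145, shield_hit_ratio=0.5))
# (145, 142.5, 145)

# Hypothetical RTTs: regional shield placed near the edge.
print(miss_latency_ms(edge_to_shield=15, shield_to_origin=140,
                      edge_to_origin=145, shield_hit_ratio=0.5))
# (145, 85.0, 155)
```

With these made-up numbers the regional shield has the better average but the worse worst case, which is the trade-off described above.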

Memory and storage trade-offs

Edge POPs have limited cache memory (cost matters at 300+ POPs). They evict aggressively. Shields have larger caches because there are fewer of them. The consequence:

An edge POP may evict a moderately-popular item that a shield still has cached. The next edge miss is absorbed by the shield, not by origin.
For long-tail content where each item is requested in only one region, tiered caching helps less — the edge POP is the only one that ever sees that item, and the shield miss is the same as a direct origin fetch (plus one hop).
For widely-requested content, tiered caching is most effective because the shield amortizes one origin fetch across many edge cache fills.
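
The eviction case can be reproduced with two LRU caches of different sizes. This is a minimal sketch: LRU is assumed as the eviction policy purely for illustration (real CDNs use more elaborate policies), and the class and function names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache that forwards misses to an upstream callable."""

    def __init__(self, capacity, upstream):
        self.capacity = capacity
        self.upstream = upstream  # callable: key -> value
        self.items = OrderedDict()

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        value = self.upstream(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
        return value


def run(requests, edge_capacity, shield_capacity=None):
    """Replay requests through one edge and return the number of origin fetches."""
    origin_log = []

    def origin(key):
        origin_log.append(key)
        return f"body-of-{key}"

    upstream = origin
    if shield_capacity is not None:
        upstream = LRUCache(shield_capacity, origin).get
    edge = LRUCache(edge_capacity, upstream)
    for key in requests:
        edge.get(key)
    return len(origin_log)


requests = ["a", "b", "c", "a"]  # "a" is evicted from the 2-item edge before it repeats
print(run(requests, edge_capacity=2))                       # 4
print(run(requests, edge_capacity=2, shield_capacity=100))  # 3
```

The fourth request misses at the edge in both runs, but with a shield it is served from the larger cache instead of origin.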

Concurrent request collapsing

A second mechanism, often bundled with tiered caching: request collapsing (also called coalescing). When many edge POPs request the same cache key from the shield simultaneously, the shield does not fire that many origin fetches in parallel. It fires one, holds the other requests, and serves them all from the response when it arrives.

Combined with tiering, request collapsing means a flash-traffic event on a single URL produces at most one origin fetch per shield, no matter how many edge POPs want the content at the same moment. Without collapsing, even with tiering, you'd see one origin fetch per concurrent edge request that arrives before the shield's fill completes.
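
The collapsing mechanism can be sketched with asyncio: the first miss for a key starts a fill, and later requests for the same key wait on that fill instead of starting their own. CollapsingCache and demo are hypothetical names, and the sketch leaves out TTLs, error caching, and uncacheable responses, which real shields have to handle.

```python
import asyncio


class CollapsingCache:
    """Shield-like cache that allows only one in-flight origin fetch per key."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # async callable: key -> value
        self.store = {}
        self.in_flight = {}  # key -> Task for the fill currently in progress

    async def get(self, key):
        if key in self.store:
            return self.store[key]
        task = self.in_flight.get(key)
        if task is None:
            # First miss for this key: start the single origin fetch.
            task = asyncio.create_task(self._fill(key))
            self.in_flight[key] = task
        # Every concurrent requester waits on the same task.
        return await task

    async def _fill(self, key):
        try:
            value = await self.fetch_from_origin(key)
            self.store[key] = value
            return value
        finally:
            self.in_flight.pop(key, None)


async def demo(concurrent_edges=200):
    origin_calls = 0

    async def slow_origin(key):
        nonlocal origin_calls
        origin_calls += 1
        await asyncio.sleep(0.05)  # simulated origin response time
        return f"body-of-{key}"

    shield = CollapsingCache(slow_origin)
    bodies = await asyncio.gather(
        *(shield.get("/live/seg-1.ts") for _ in range(concurrent_edges))
    )
    return origin_calls, len(set(bodies))


print(asyncio.run(demo()))  # (1, 1): one origin call, one distinct body
```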

Per-CDN naming and behavior

CDN	Feature name	Default state
Cloudflare	Tiered Cache (Smart Tiered Cache, Custom Tiered Cache)	Off by default on free plans; on by default on paid plans
AWS CloudFront	Regional Edge Caches (plus optional Origin Shield)	Regional edge caches are automatic and always on; Origin Shield is opt-in
Fastly	Origin Shielding	Configurable per service
Akamai	Tiered Distribution	Available on most product tiers
Google Cloud CDN	(Implicit in global infrastructure)	Implicit

Pricing for tiered caching varies: some CDNs bundle it with the standard product, some treat it as a paid add-on. Where origin egress is billed per GB, the breakeven analysis is usually short, because the egress savings tend to outweigh the CDN-side cost.
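
A back-of-the-envelope version of that breakeven check might look like the sketch below. Every input is a placeholder rather than any provider's actual rate, and the billing model (shield traffic charged per GB, approximated by the former origin traffic) is an assumption that will not match every CDN.

```python
def tiering_net_saving(origin_gb_before, origin_reduction_factor,
                       origin_egress_price_per_gb, shield_price_per_gb):
    """Monthly net saving from enabling tiering, in the currency of the prices.

    Assumes the shield is billed per GB it serves to edges, approximated here
    by the traffic origin used to serve. Real billing models differ.
    """
    origin_gb_after = origin_gb_before / origin_reduction_factor
    egress_saved = (origin_gb_before - origin_gb_after) * origin_egress_price_per_gb
    shield_cost = origin_gb_before * shield_price_per_gb
    return egress_saved - shield_cost


# Placeholder inputs: 10 TB/month of origin egress, a 10x reduction,
# 0.08 per GB origin egress, 0.01 per GB shield traffic.
print(round(tiering_net_saving(10_000, 10, 0.08, 0.01), 2))  # 620.0
```

A negative result would mean the add-on costs more than the egress it saves.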

When tiered caching is the wrong tool

Long-tail content with no repeat requests across regions. The shield rarely has what edges ask for; it adds a hop with no benefit.
Per-user personalized responses with no shared cache key across users. Tiered caches never hold the content because no two requests share a key.
Origins that are themselves very close to all edges (e.g., a multi-region origin deployment). The shield's role is redundant.
Workloads where origin can absorb the load comfortably. The added configuration complexity isn't worth the optimization.

For everything else — and that is the vast majority of real CDN deployments — tiered caching is one of the few CDN settings that's almost always correct to enable.

Live video: the tiered-caching showcase

Live video streaming is the workload where tiering helps most. A live segment is requested by every active viewer within seconds of being published. Without tiering, every edge POP serving any active viewer would request the segment from origin the instant the manifest references it: potentially hundreds to thousands of simultaneous origin requests per segment, depending on how many POPs and cache servers are involved.

With tiering plus request collapsing, the request count to origin per segment is the number of shield POPs, usually a handful. The fan-out happens shield-to-edges, not origin-to-edges. Even very large live events (Super Bowl, World Cup) can in principle be fed from a very small origin, even a single server, as long as the CDN tiers absorb the fan-out.

Verifying tiered caching is working

Check the CDN's per-response cache headers for evidence of a tier. Cloudflare reports cache status in cf-cache-status; Fastly lists the cache nodes that handled the request in x-served-by, with a matching hit/miss entry per node in x-cache.
Compare origin request rate before and after enabling. A correctly configured tier typically reduces origin RPS by 3–20×, depending on workload.
Track shield hit rate as a distinct metric from edge hit rate. Both contribute to origin offload but they represent different cache-fill behaviors.
Watch p99 miss latency. If it increased after enabling tiering, your shields may be poorly placed.
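
For the header check, a small parser can pair each cache node with its hit/miss result. The sketch below targets Fastly's X-Served-By and X-Cache headers and assumes both are comma-separated lists in the same order, with the shield listed before the edge when shielding is on. Confirm that ordering against Fastly's documentation before relying on it. The function names and the node names in the sample are made up.

```python
def parse_fastly_chain(headers):
    """Pair each node in X-Served-By with its X-Cache result.

    Assumption: both headers are comma-separated lists in the same order,
    with the shield listed before the edge when shielding is enabled.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    nodes = [n.strip() for n in lowered.get("x-served-by", "").split(",") if n.strip()]
    results = [r.strip().upper() for r in lowered.get("x-cache", "").split(",") if r.strip()]
    return list(zip(nodes, results))


def went_through_shield(headers):
    """True when the response passed through at least two cache nodes."""
    return len(parse_fastly_chain(headers)) >= 2


sample = {
    "X-Served-By": "cache-shield-example, cache-edge-example",  # made-up node names
    "X-Cache": "HIT, MISS",
}
print(parse_fastly_chain(sample))
# [('cache-shield-example', 'HIT'), ('cache-edge-example', 'MISS')]
print(went_through_shield(sample))  # True
```

Any mapping with an items() method works as input, so the headers object returned by an HTTP client can usually be passed in directly.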

Frequently Asked Questions

What is tiered caching?

Tiered caching is a CDN architecture where edge POPs do not contact the origin directly on cache miss. Instead, they ask a regional shield POP — a designated upper-tier cache. The shield holds a larger pool of cached content; only its own misses reach the origin. The effect is that origin sees one request per piece of content per region, not one per edge POP that wanted that content.

What is origin shielding?

Origin shielding is the simplest form of tiered caching: one designated shield POP sits between every edge POP and the origin. Some CDNs let you pick the shield POP; others use multiple regional shields. The mechanism is the same — collapse simultaneous edge misses into a single upstream request.

When does tiered caching matter most?

Tiered caching helps most for content that is widely requested across many regions and whose cache eventually expires across all of them. Without tiering, every POP independently re-fetches from origin when its cache expires. With tiering, only one fetch per shield happens. Live video, news sites, and any flash-traffic pattern see the largest origin-offload benefits.

Does tiered caching add latency?

On cache misses, yes — the edge POP makes a hop to the shield instead of directly to origin. If the shield is closer to origin than the edge POP is, the total latency is similar; if the shield is farther from origin than the edge, latency increases on misses. On cache hits at the edge (the vast majority of requests), there is zero latency impact. Net effect is almost always positive because the tier deflects most origin fetches entirely.

How is tiered caching different from a regional cache?

They are often the same thing under different names. AWS CloudFront uses the term "regional edge cache" for what other CDNs call a "tier-1 cache" or "shield POP." The mechanics are identical: a larger, fewer-in-number tier of caches sits between edge POPs and origin. Different CDNs implement automatic vs configurable shielding, but the architectural pattern is universal.

Related Guides

How a CDN Works

The POP and origin model that tiered caching sits on top of.

Cache Hit Ratio

How shield hits affect overall offload metrics.

Video Streaming via CDN

The workload where tiered caching matters most.

Anycast vs GeoDNS

How users are routed to the edge POPs above the shield layer.
