Cache Hit Ratio Explained

Cache hit ratio is the single most important operational metric for any CDN deployment. It captures, in one number, how often the edge served a user without contacting the origin — and therefore how much latency was saved, how much origin bandwidth was avoided, and how much CPU and database load the origin escaped. Most teams quote a single number, but there are at least two distinct things that get called "hit ratio" and they often differ by 10 or 20 percentage points. Understanding which one matters for what decision is the first step toward actually moving the number.

Request hit rate vs byte hit rate

The two canonical definitions:

Metric	Numerator	Denominator	What it measures
Request hit ratio	Requests served from cache	Total requests	User-perceived latency improvement
Byte hit ratio	Bytes served from cache	Total bytes served	Origin bandwidth savings

They diverge whenever hits and misses differ in average response size, most often because the misses are larger. A site with a 95% request hit rate whose 5% of misses are 100 MB video files might have a byte hit rate of only 60%. The same site looks great by one metric and mediocre by the other.
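To make the divergence concrete, here is a minimal Python sketch that computes both ratios from the same log records. The LogRecord type, its field names, and the "HIT"/"MISS" status strings are assumptions for illustration, not any CDN's actual log schema. The sample assumes hits of 8 MB each so that the result lands near the 95% / 60% case described above.

```python
from dataclasses import dataclass

MB = 1_000_000

@dataclass
class LogRecord:
    cache_status: str  # "HIT" or "MISS" (hypothetical field and values)
    bytes_sent: int    # response size in bytes

def request_hit_ratio(records):
    """Fraction of requests served from cache."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if r.cache_status == "HIT")
    return hits / len(records)

def byte_hit_ratio(records):
    """Fraction of delivered bytes served from cache."""
    total_bytes = sum(r.bytes_sent for r in records)
    if total_bytes == 0:
        return 0.0
    hit_bytes = sum(r.bytes_sent for r in records if r.cache_status == "HIT")
    return hit_bytes / total_bytes

# 95 cached responses of 8 MB each, 5 uncached videos of 100 MB each.
records = [LogRecord("HIT", 8 * MB)] * 95 + [LogRecord("MISS", 100 * MB)] * 5

print(f"request hit ratio: {request_hit_ratio(records):.2%}")
print(f"byte hit ratio:    {byte_hit_ratio(records):.2%}")
```

With these made-up sizes the request ratio is 95% while the byte ratio is roughly 60%, because the five misses account for 500 MB of the 1,260 MB delivered.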

Pick the right one for the decision:

For latency / TTFB: request hit rate.
For egress costs from origin: byte hit rate.
For overall efficiency: report both.

What the denominator excludes

Hit ratio is usually computed only over requests the CDN considers cacheable. Two categories are commonly treated differently from ordinary misses:

Requests whose responses have no Cache-Control or carry no-store: counted as PASS, not as MISS. PASS is excluded from the denominator on some CDNs and included on others.
Non-GET methods: POST, PUT, and DELETE are not cached by default and are typically excluded entirely.

If your CDN reports an 80% hit rate but you suspect heavy uncacheable traffic, ask whether PASS traffic is in the denominator. The same workload can produce 80% or 95% hit rate depending on the convention.
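The effect of the denominator convention can be shown with a small calculation. The request counts below are invented to reproduce the 80% and 95% figures; the function name and the include_pass flag are illustrative and do not correspond to any CDN's reporting API.

```python
def hit_ratio(hits, misses, passes, include_pass):
    """Hit ratio under either denominator convention."""
    denominator = hits + misses + (passes if include_pass else 0)
    return hits / denominator if denominator else 0.0

# Hypothetical workload: 760 hits, 40 misses, 150 PASS requests.
hits, misses, passes = 760, 40, 150

with_pass = hit_ratio(hits, misses, passes, include_pass=True)      # 760 / 950
without_pass = hit_ratio(hits, misses, passes, include_pass=False)  # 760 / 800

print(f"PASS in denominator:     {with_pass:.0%}")
print(f"PASS out of denominator: {without_pass:.0%}")
```

Nothing about the traffic changes between the two results; only the bookkeeping does.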

Why hit ratios are bimodal

Real-world CDN traffic almost never sits in the middle of the hit-rate distribution. A deployment typically lands either at 90% or above, or down in the 30-50% range. The bimodality comes from a few thresholds:

Are responses cacheable at all? If most responses lack Cache-Control, the CDN passes through everything and hit ratio is near 0.
Is the cache key fragmented? If tracking parameters or cookies are part of the key, hit ratio per URL is near 0 even when responses are cacheable.
Is the working set small relative to cache size? If the popular content fits in cache, hit ratio is high. If you have millions of long-tail URLs and a small cache, evictions cap hit ratio.

Each of these is a binary failure mode. Once all three are resolved, hit ratio jumps to the high range almost immediately.
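The working-set threshold in particular behaves like a cliff, which a toy simulation can illustrate. The sketch below assumes an LRU eviction policy and a strictly cyclic request pattern, which is the worst case for LRU; real traffic is skewed toward popular objects and real CDNs may use other eviction policies, so treat the numbers as an illustration of the threshold rather than a prediction.

```python
from collections import OrderedDict

def simulate_lru(requests, capacity):
    """Replay requests through an LRU cache and return the request hit ratio."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(requests) if requests else 0.0

# 100 distinct URLs, each requested 10 times in round-robin order.
working_set = [f"/asset/{i}" for i in range(100)]
requests = working_set * 10

fits = simulate_lru(requests, capacity=100)    # working set fits in cache
thrashes = simulate_lru(requests, capacity=50)  # cache holds half the working set

print(f"capacity 100: {fits:.0%}")
print(f"capacity 50:  {thrashes:.0%}")
```

When the cache holds the whole working set, only the first pass misses. When it holds half, every object is evicted before it is requested again, so in this contrived pattern the hit ratio collapses to zero rather than to half.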

The cache hierarchy and where hits happen

On a tiered CDN, "hit" can mean different things depending on which layer served the response. The hierarchy from a user's perspective:

Browser cache hit — no network at all. Not counted in CDN metrics.
Edge POP cache hit — served by the POP nearest the user. Lowest possible network latency. This is what most CDNs report as a hit.
Tier-1 / shield cache hit — the edge POP missed, asked the regional shield, the shield had it. Slightly higher latency than edge hit; still much faster than origin. Often reported separately or rolled into "hit."
Origin fetch (miss) — the request reached origin. The slowest path.

For accurate analysis, distinguish edge hits from shield hits. A 90% overall hit rate that's 70% edge + 20% shield is fundamentally different from 90% edge. The edge-only number drives latency; the combined number drives origin offload.
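A short sketch of that split, assuming each log line can be labelled with the layer that served it. The labels "edge", "shield", and "origin" are placeholders chosen for this example; actual CDNs expose this information under their own field names and values.

```python
from collections import Counter

def tier_breakdown(served_by):
    """Split hit ratio into edge-only and combined (edge + shield) figures."""
    counts = Counter(served_by)
    total = sum(counts.values())
    if total == 0:
        return {"edge": 0.0, "combined": 0.0, "origin": 0.0}
    return {
        "edge": counts["edge"] / total,                           # drives latency
        "combined": (counts["edge"] + counts["shield"]) / total,  # drives origin offload
        "origin": counts["origin"] / total,
    }

# Hypothetical sample: 70 edge hits, 20 shield hits, 10 origin fetches.
sample = ["edge"] * 70 + ["shield"] * 20 + ["origin"] * 10
breakdown = tier_breakdown(sample)

for name, value in breakdown.items():
    print(f"{name}: {value:.0%}")
```

Reporting both figures side by side keeps a shield-heavy 90% from being mistaken for a 90% edge hit rate.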

Levers that move hit ratio

The top tactical changes, in approximate order of impact:

Change	Typical hit-rate gain	Risk
Strip UTM/fbclid/gclid from cache key	+10-30 percentage points	None (these params don't affect output)
Add Cache-Control to cacheable responses	+10-40 pp	Low (verify content is truly cacheable)
Configure cookies to be ignored on static paths	+5-20 pp	Low for static assets
Enable tiered caching / origin shielding	+3-10 pp (overall, counting shield hits)	None
Lengthen max-age for static assets	+2-8 pp	Low if assets are versioned
Enable stale-while-revalidate	+1-5 pp (depends on counting)	Low for non-critical data
Normalize Accept-Encoding / Vary	+1-5 pp	Verify content negotiation still works

Most deployments under 70% hit rate have the first two issues. Fix those before pursuing optimizations that recover single-digit percentages.
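The first lever, stripping tracking parameters from the cache key, is usually configured in the CDN itself, but the underlying logic can be sketched in a few lines. The parameter list below is an assumption (utm_* prefixes plus fbclid and gclid, as named in the table), and sorting the remaining parameters is an extra normalization step that is only safe when parameter order does not affect the response.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; adjust to what your traffic actually carries.
TRACKING_PARAMS = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)

def normalize_cache_key(url):
    """Return the URL with tracking parameters removed and the rest sorted."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS and not name.startswith(TRACKING_PREFIXES)
    ]
    kept.sort()  # assumption: parameter order does not change the response
    # Drop the fragment; it is never sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

variants = [
    "https://example.com/p?id=7&utm_source=news&utm_medium=email",
    "https://example.com/p?fbclid=abc123&id=7",
    "https://example.com/p?id=7&gclid=xyz",
]
keys = {normalize_cache_key(u) for u in variants}
print(keys)
```

All three variants collapse to a single key, so they share one cached object instead of producing three separate misses.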

The diminishing-returns curve

Moving from 50% to 70% hit rate is straightforward and high-impact. Moving from 90% to 95% requires precision tuning. Moving from 95% to 98% might not be possible without architectural changes (more cache memory at the edge, prefetching, longer TTLs).

The right target depends on traffic composition. For a static-asset-heavy site, 95%+ is the bar. For a SaaS application with many per-user views, 60% might be the realistic ceiling because most requests are inherently uncacheable. Don't chase a number that the workload cannot support.

Measuring hit ratio correctly

Common measurement mistakes:

Looking at a global average that hides per-URL behavior. One catastrophically uncacheable URL can drag down the average. Group by path prefix or content type.
Measuring at the wrong time scale. Cache fills take time; a deploy that invalidates cache may show 5 minutes of low hit rate as the cache rewarms. Smooth over windows large enough to be meaningful (1+ hours).
Treating cold-POP traffic as steady-state. A POP that just opened has a cold cache and a low hit rate until warm. CDN-wide averages mask this.
Comparing across content types as if they should hit equally. APIs, HTML, images, video, and JS bundles have different inherent ceilings. Compare like to like.
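The first mistake, a global average hiding per-URL behavior, is easy to check for once logs are available. The sketch below groups requests by the leading path segment; the (path, is_hit) record shape is an assumption for illustration, and real log pipelines would typically do the same grouping in SQL or a log query language.

```python
from collections import defaultdict

def hit_ratio_by_prefix(records, depth=1):
    """Group (path, is_hit) records by the first `depth` path segments."""
    totals = defaultdict(lambda: [0, 0])  # prefix -> [hits, requests]
    for path, is_hit in records:
        segments = [s for s in path.split("/") if s]
        prefix = "/" + "/".join(segments[:depth])
        totals[prefix][1] += 1
        if is_hit:
            totals[prefix][0] += 1
    return {prefix: hits / count for prefix, (hits, count) in totals.items()}

# Hypothetical traffic: static assets cache well, one API path never caches.
records = (
    [("/static/app.js", True)] * 95
    + [("/static/app.js", False)] * 5
    + [("/api/cart", False)] * 100
)

overall = sum(1 for _, is_hit in records if is_hit) / len(records)
ratios = hit_ratio_by_prefix(records)

print(f"overall: {overall:.1%}")
for prefix, ratio in sorted(ratios.items()):
    print(f"{prefix}: {ratio:.1%}")
```

The overall figure of under 50% suggests a broken cache, while the per-prefix view shows static assets behaving well and a single uncacheable path pulling the average down.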

Hit ratio and cost

The financial impact of moving hit rate, holding traffic constant:

Origin egress cost scales with miss bytes. Doubling miss bytes from 5% to 10% of traffic doubles origin egress charges.
CDN egress cost is the same regardless of hit ratio — the CDN charges you for bytes delivered to users, not bytes fetched from origin. So hit ratio improvements don't reduce CDN bills directly; they reduce origin bills.
Origin compute cost falls with hit ratio if your origin spins up workers per request. Cache hits never reach origin compute.

The right comparison: cost of one origin fetch (egress + compute + DB) vs cost of one CDN edge hit. The ratio is usually 50–500×. Even modest hit-rate improvements pay for the engineering effort to get them.
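The first two points can be expressed as a simple cost model. The per-GB prices and the traffic volume below are invented placeholders, not quotes from any provider; the point is only the shape of the relationship between byte hit ratio and each bill.

```python
def monthly_egress_cost(total_gb, byte_hit_ratio, origin_price_per_gb, cdn_price_per_gb):
    """Estimate origin and CDN egress cost for a given byte hit ratio."""
    miss_gb = total_gb * (1 - byte_hit_ratio)
    return {
        "origin_egress": miss_gb * origin_price_per_gb,  # scales with miss bytes
        "cdn_egress": total_gb * cdn_price_per_gb,       # independent of hit ratio
    }

# Hypothetical inputs: 10,000 GB delivered, $0.10/GB origin, $0.05/GB CDN.
at_95 = monthly_egress_cost(10_000, 0.95, 0.10, 0.05)
at_90 = monthly_egress_cost(10_000, 0.90, 0.10, 0.05)

print(f"95% byte hit: origin ${at_95['origin_egress']:.2f}, CDN ${at_95['cdn_egress']:.2f}")
print(f"90% byte hit: origin ${at_90['origin_egress']:.2f}, CDN ${at_90['cdn_egress']:.2f}")
```

Dropping from 95% to 90% doubles the miss bytes and therefore the origin egress line, while the CDN egress line stays the same. Origin compute and database costs are left out of this sketch.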

Diagnostic checklist when hit ratio is low

What percentage of responses set Cache-Control with a positive max-age? If it is below 80%, that is the most likely bottleneck.
Are query strings included in the cache key? Sample a high-traffic URL and check distinct keys per day.
Does the response include a Set-Cookie header on what should be cacheable content? Some CDNs bypass cache automatically on Set-Cookie responses.
Does Vary list more than Accept-Encoding? If so, justify each additional header.
Is the CDN's tiered cache or origin shield enabled?
What is the cache eviction rate vs the cache fill rate? High evictions mean the working set is larger than cache.

Each answered question either fixes the problem or rules out a cause. Within an afternoon of focused diagnosis, most teams can identify which of the three failure modes they are hitting and ship a targeted fix.
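Several of the checklist questions can be answered by inspecting response headers. The function below is a rough sketch that takes a plain dict of headers (for example, copied from a curl -I response) and flags the header-related items from the list above. The function name and the wording of the findings are made up for this example, and it treats s-maxage as equivalent to max-age, which is an assumption about how a shared cache would read the response.

```python
import re

def audit_headers(headers):
    """Flag response headers that commonly prevent or fragment caching."""
    h = {name.lower(): value for name, value in headers.items()}
    findings = []

    cache_control = h.get("cache-control", "").lower()
    if not cache_control:
        findings.append("no Cache-Control header")
    elif "no-store" in cache_control:
        findings.append("Cache-Control contains no-store")
    else:
        ages = [int(n) for n in re.findall(r"(?:s-maxage|max-age)\s*=\s*(\d+)", cache_control)]
        if not any(age > 0 for age in ages):
            findings.append("no positive max-age or s-maxage")

    if "set-cookie" in h:
        findings.append("Set-Cookie present on the response")

    vary = [v.strip().lower() for v in h.get("vary", "").split(",") if v.strip()]
    extra = [v for v in vary if v != "accept-encoding"]
    if extra:
        findings.append("Vary lists extra headers: " + ", ".join(extra))

    return findings

good = audit_headers({"Cache-Control": "public, max-age=3600", "Vary": "Accept-Encoding"})
bad = audit_headers({"Set-Cookie": "sid=1", "Vary": "Accept-Encoding, User-Agent"})

print(good)
print(bad)
```

Running this against a sample of high-traffic URLs gives a quick first pass; the cache-key, tiered-cache, and eviction questions still need the CDN's own configuration and metrics.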

Frequently Asked Questions

What is cache hit ratio?

Cache hit ratio is the percentage of requests (or bytes) that the CDN edge served from its cache without contacting the origin. A 90% request hit ratio means 9 out of 10 requests were satisfied at the edge; only 1 out of 10 required an origin fetch. Higher is better — it reduces user latency, origin load, and bandwidth costs.

What is the difference between request hit rate and byte hit rate?

Request hit rate counts each request once regardless of response size. Byte hit rate weights each request by the size of its response. Byte hit rate is what determines origin bandwidth savings; request hit rate is what determines latency improvement. They can differ significantly: a site might have 95% request hit rate but only 70% byte hit rate if its rare cache misses are very large files.

What is a good cache hit ratio?

For purely static content (versioned assets, images), 95%+ is achievable and expected. For a mix of static and dynamic content, 80-90% is realistic. Below 60% almost always indicates a configuration problem — usually cache-key fragmentation from query parameters, missing Cache-Control headers on cacheable responses, or unintended cookies bypassing the cache.

How can I improve cache hit ratio?

The biggest wins typically come from (1) stripping tracking query parameters from the cache key, (2) setting explicit Cache-Control headers on responses that lack them so the CDN caches them, (3) configuring cookies to be ignored unless they genuinely affect the response, (4) using stale-while-revalidate so background refreshes count as hits to users, and (5) enabling tiered caching so cold-cache POPs hit a regional shield instead of origin.

Does cache hit ratio account for revalidation 304s?

Different CDNs count this differently. Some count a 304 revalidation as a hit because the cached body was served to the user; others count it as a miss because origin was contacted. Check your CDN's documentation. Either way, a 304 is much cheaper than a full origin fetch, so the distinction matters less than overall traffic to origin.

Related Guides

Cache Key and Vary

The single biggest source of hit-rate loss.

Cache-Control Headers

The directives that drive what gets cached at all.

Tiered Caching

How shield hits factor into combined hit rate.

CDN Logs and Observability

How to drill into hit-rate metrics by URL and content type.