CDN Logs and Observability

Every CDN problem ends with someone reading logs. The aggregated dashboards tell you whether a metric moved; the raw access logs tell you why. Knowing which fields each CDN exposes, which response headers describe cache disposition, and how to slice the data by URL, POP, and content type is what separates "the CDN is broken" from "this path has 70% of its requests carrying a UTM parameter that fragments the cache key." This guide covers the log fields and headers common to the major CDNs and the diagnostic patterns that use them.

The cache-status response headers

Every response from a CDN carries one or more headers indicating what happened at the edge. There is now a standardized form (RFC 9211 Cache-Status) plus per-vendor headers that pre-date it:

Header	Used by	Typical values
`Cache-Status` (RFC 9211)	Any RFC-compliant cache	`"edge"; hit`, `"edge"; fwd=miss; stored`
`cf-cache-status`	Cloudflare	HIT, MISS, EXPIRED, BYPASS, REVALIDATED, UPDATING, STALE, DYNAMIC
`x-cache`	Fastly, CloudFront, others	HIT, MISS (Fastly); Hit/Miss/RefreshHit from cloudfront (CloudFront)
`age`	Universal	Seconds since the response was fetched from or validated with the origin
`x-served-by`	Fastly	List of cache servers in the response path
`x-cache-hits`	Fastly	Per-server hit count, e.g., `0, 1`

Inspecting these in DevTools or via curl -I is the fastest way to confirm whether a request actually hit the edge cache. If you see MISS or DYNAMIC on a URL you expect to be cached, check the origin's response headers (Cache-Control, Vary, Set-Cookie) next, then the cache-key configuration.
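When checking many responses, the vendor headers in the table can be folded into one coarse label. The following is a minimal sketch, not a vendor-endorsed mapping: the function name and the four output labels are illustrative, and the handling of multi-value x-cache headers assumes the last entry describes the cache nearest the client.

```python
def cache_disposition(headers: dict[str, str]) -> str:
    """Return 'hit', 'miss', 'uncacheable', or 'unknown' from response headers."""
    # Header names are case-insensitive, so normalize them first.
    h = {k.lower(): v.strip() for k, v in headers.items()}
    # Prefer cf-cache-status when present, otherwise fall back to x-cache.
    raw = (h.get("cf-cache-status") or h.get("x-cache") or "").upper()
    if not raw:
        return "unknown"
    # x-cache may carry one value per cache layer ("MISS, HIT").
    # Assumption: the last entry is the cache closest to the client.
    last = raw.split(",")[-1].strip()
    if last.startswith(("BYPASS", "DYNAMIC", "PASS")):
        return "uncacheable"
    # STALE, UPDATING, and REVALIDATED were all served from cache.
    if "HIT" in last or last in ("STALE", "UPDATING", "REVALIDATED"):
        return "hit"
    if last.startswith(("MISS", "EXPIRED")):
        return "miss"
    return "unknown"
```

A response labeled miss or uncacheable on a URL that should be cached is the cue to look at Cache-Control, Vary, and Set-Cookie.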

Standard log fields

Every CDN exposes broadly the same access-log schema. Field names differ but the categories are universal:

Category	Common fields
Request identity	timestamp, request_id, client_ip, country, asn
Request shape	method, scheme, host, path, query, protocol_version
Response shape	status, response_bytes, content_type
Cache disposition	cache_status, age, ttl, cache_key
Timing	time_to_first_byte, total_time, origin_time (if miss)
Edge identity	pop, datacenter, server_id
Client headers	user_agent, referrer, accept_encoding, accept_language
Security	tls_version, cipher, threat_score, waf_action

For most diagnostic work, the indispensable fields are path, query, cache_status, status, response_bytes, and pop.

Log delivery: real-time vs batched

CDNs offer logs at multiple tiers:

Real-time stream — push to an HTTP endpoint, Kafka topic, or syslog target with sub-second delivery. Used for live anomaly detection, alerting, and SOC pipelines.
Near-real-time S3 / GCS / Azure Blob delivery — files written every 1–15 minutes containing all requests in that window. Used for long-term storage and offline analysis.
Aggregated dashboards — pre-computed counts, rates, and percentiles. Updated every few seconds to a minute. Used for at-a-glance status.

Raw log delivery often has a separate cost (per-million-events or per-GB). For cost-sensitive deployments, sample the logs: 1% sampling preserves trend analysis at roughly 1% of the cost, though it will miss rare events.

Diagnostic pattern 1: hit-rate by URL prefix

When overall hit rate is low, group requests by URL prefix to find the offenders:

SELECT
  regexp_extract(path, '^(/[^/]+)') AS prefix,
  count(*) AS requests,
  sum(case when cache_status='HIT' then 1 else 0 end)*1.0/count(*) AS hit_rate
FROM cdn_logs
WHERE day = '2026-05-29'
GROUP BY prefix
ORDER BY requests DESC
LIMIT 20

The highest-volume prefixes with the lowest hit rates are the right places to invest configuration time. A 70% hit rate on paths under /api may be normal; a 20% hit rate on /static is a bug.
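To combine "high volume" and "low hit rate" into one ordering, one option is to rank prefixes by the number of requests that were not hits. The sketch below works on rows already exported from the logs; the row shape (dicts with path and cache_status keys) and the function name are assumptions for illustration.

```python
import re
from collections import defaultdict

def miss_volume_by_prefix(rows):
    """Rank first-level URL prefixes by how many requests were not cache hits."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        # Same prefix rule as the query: the first path segment.
        m = re.match(r"/[^/]+", row["path"])
        prefix = m.group(0) if m else "/"
        totals[prefix] += 1
        if row["cache_status"] == "HIT":
            hits[prefix] += 1
    report = [
        {"prefix": p, "requests": n, "hit_rate": hits[p] / n, "non_hits": n - hits[p]}
        for p, n in totals.items()
    ]
    # Largest non-hit volume first: these prefixes send the most traffic to origin.
    return sorted(report, key=lambda r: r["non_hits"], reverse=True)
```

Sorting by non-hit volume rather than by hit rate alone keeps a rarely requested prefix with a poor hit rate from crowding out a busy one.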

Diagnostic pattern 2: cache-key fragmentation

For a specific URL that should be cached but isn't, count distinct cache keys:

SELECT
  cache_key,
  count(*) AS requests
FROM cdn_logs
WHERE path = '/featured-products'
  AND day = '2026-05-29'
GROUP BY cache_key
ORDER BY requests DESC
LIMIT 50

If you see hundreds or thousands of distinct keys for one URL, the cache key includes something it shouldn't. Compare keys side-by-side to spot the differing component (a tracking parameter, a cookie value, a varying header).
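The side-by-side comparison can be partly automated when the fragmenting component is a query parameter. This sketch assumes the logged cache_key is URL-shaped (path plus query string), which will not hold for every CDN; the function name is illustrative. Cookie or header components would need a different split.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit

def fragmenting_params(cache_keys):
    """Count distinct values per query parameter across a set of cache keys."""
    values = defaultdict(set)
    for key in cache_keys:
        query = urlsplit(key).query
        for name, value in parse_qsl(query, keep_blank_values=True):
            values[name].add(value)
    # Parameters with the most distinct values are the likeliest fragmenters.
    counts = [(name, len(seen)) for name, seen in values.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)
```

A parameter with nearly as many distinct values as there are keys is usually the one to exclude from the cache key.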

Diagnostic pattern 3: origin pressure during traffic spikes

When origin alerts fire during a traffic spike, the question is whether the spike exceeded cache capacity or whether it hit uncacheable URLs:

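SELECT
  date_trunc('minute', timestamp) AS minute,
  sum(case when cache_status IN ('MISS','EXPIRED','PASS') then 1 else 0 end) AS origin_requests,
  -- split origin traffic so you can see which status dominates
  sum(case when cache_status='PASS' then 1 else 0 end) AS pass_requests,
  sum(case when cache_status IN ('MISS','EXPIRED') then 1 else 0 end) AS miss_requests,
  sum(case when cache_status='HIT' then 1 else 0 end) AS edge_hits
FROM cdn_logs
WHERE day = '2026-05-29'
GROUP BY minute
ORDER BY minute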

If PASS dominates the spike in origin_requests, the spike traffic was uncacheable: make those URLs cacheable or accept the load. If MISS dominates, cacheable URLs are being fetched cold; pre-warming or tiered caching is the fix.
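Deciding which status dominates is a simple count over the spike window. A minimal sketch, assuming you have the cache_status values for the minutes in question as a list; the status set mirrors the query above and the function name is illustrative.

```python
from collections import Counter

ORIGIN_STATUSES = {"MISS", "EXPIRED", "PASS"}

def dominant_origin_status(statuses):
    """Return (status, share) for the most common origin-bound status, or None."""
    counts = Counter(s for s in statuses if s in ORIGIN_STATUSES)
    total = sum(counts.values())
    if total == 0:
        return None  # nothing in this window reached the origin
    status, n = counts.most_common(1)[0]
    return status, n / total
```

For example, a window of 50 HIT, 30 PASS, and 10 MISS yields PASS with a 0.75 share of origin traffic, pointing at uncacheable URLs rather than a cold cache.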

POP-level diagnostics

Anycast routing usually works, but pathological cases exist: a user in Tokyo whose ISP routes through Frankfurt, a POP that just opened with a cold cache, a backbone outage that redirects traffic to a suboptimal POP. The pop field in logs reveals these:

Per-country POP distribution. Most users in country X should hit POPs near X. If 40% of US users hit a European POP, anycast routing is misconfigured or there's an ISP-level routing issue.
Per-POP hit rate. Cold POPs lag warm POPs by hours after a new launch or major cache eviction. If hit rate per POP varies wildly, investigate.
Per-POP latency. If one POP has p95 TTFB 5x the average, the POP itself or its origin path is degraded.
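The per-POP latency check can be sketched as follows. The text does not say which average to compare against, so this version uses the median of the per-POP p95 values as the baseline (an assumption, chosen so that one bad POP does not inflate it). The input shape of (pop, ttfb_ms) pairs and the function names are also illustrative.

```python
import math
import statistics
from collections import defaultdict

def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]

def slow_pops(rows, factor=5.0):
    """Return POPs whose p95 TTFB is at least `factor` times the typical POP's."""
    by_pop = defaultdict(list)
    for pop, ttfb_ms in rows:
        by_pop[pop].append(ttfb_ms)
    if not by_pop:
        return []
    per_pop = {pop: p95(samples) for pop, samples in by_pop.items()}
    # Baseline: median of per-POP p95s (assumed definition of "the average").
    baseline = statistics.median(per_pop.values())
    return sorted(pop for pop, value in per_pop.items() if value >= factor * baseline)
```

The same grouping works for per-POP hit rate; only the aggregate changes.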

Sampling strategy

At high traffic, full log volume is unmanageable. Sampling helps if done correctly:

Random sampling (1 of N requests). Preserves rate distributions; loses rare events.
Anomaly-only sampling (errors, slow responses, specific status codes). Preserves outliers; loses baseline.
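Hash-based sampling on a stable key (e.g., a hash of the client IP modulo N). Preserves the experience of any specific subset of users end-to-end.

For most teams, hash-based sampling on a stable client key plus full capture of any 5xx is the right balance. Hashing on request ID instead selects individual requests, much like random sampling, so it does not keep any one user's traffic together.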
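A minimal sketch of hash-based sampling with full capture of 5xx responses. It hashes the client IP, following the list above; the function name, the 1-in-100 default, and the choice of SHA-256 are assumptions. A cryptographic digest is used instead of Python's built-in hash() because the built-in is salted per process for strings and would not select the same clients across workers or restarts.

```python
import hashlib

def keep_record(client_ip: str, status: int, sample_one_in: int = 100) -> bool:
    """Decide whether to keep a log record."""
    # Always keep server errors, regardless of sampling.
    if 500 <= status <= 599:
        return True
    # Stable hash of the key, so the same client is always in or out.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % sample_one_in
    return bucket == 0
```

Because the decision depends only on the key, every request from a sampled client is retained, which is what lets you follow one user's experience end-to-end.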

Alerting on CDN data

The signals worth paging on:

Origin 5xx rate above baseline — usually origin trouble, not CDN.
Edge 5xx rate (CDN's own error responses) — CDN infrastructure issue.
Cache hit ratio dropping more than X% week-over-week — likely a config regression or content change.
Per-POP error spikes — localized issue, may be ISP- or routing-related.
Suddenly long p99 TTFB — may indicate origin slowness on cache misses.

Avoid paging on raw request rate alone; legitimate traffic spikes are not problems if the cache absorbs them. Page on origin-impacting events, not edge-absorbed ones.
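The week-over-week hit-ratio signal reduces to a comparison of two ratios. This sketch reads "X%" as percentage points rather than relative change, which is an assumption; the 5-point default and the function names are placeholders to tune against your own baseline.

```python
def hit_ratio(hits: int, total: int) -> float:
    """Cache hit ratio in [0, 1]; 0.0 when there was no traffic."""
    return hits / total if total else 0.0

def should_page(current: float, previous: float, max_drop_points: float = 5.0) -> bool:
    """True when the hit ratio fell by more than max_drop_points percentage points."""
    return (previous - current) * 100 > max_drop_points
```

For instance, a fall from 0.90 last week to 0.80 this week is a 10-point drop and would page under the default threshold; a fall to 0.88 would not.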

Frequently Asked Questions

What is in a CDN access log?

A CDN access log records one entry per request that hit the edge. Typical fields: timestamp, client IP, request method, host, path, query string, response status, response size, cache status (HIT/MISS/EXPIRED/PASS), POP identifier, time to first byte, total response time, user agent, referrer, and request headers used in the cache key. Many CDNs let you customize the field set.

What is the standard Cache-Status header?

RFC 9211 defines a standardized Cache-Status response header that any cache in the chain can append. Each cache adds a token with its identity and the disposition for that request — hit, fwd=miss, fwd=stale, etc. Modern CDNs are starting to emit Cache-Status alongside their vendor-specific headers, which lets debugging tools parse cache behavior portably.
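As an illustration of the portable parsing this enables, here is a simplified reader for a Cache-Status value. It is a sketch, not a full Structured Fields parser: it assumes cache names and parameter values contain no commas or semicolons, and the function name is illustrative.

```python
def parse_cache_status(value: str):
    """Parse a Cache-Status header value into a list of per-cache dicts.

    Simplified: quoted strings containing ',' or ';' are not handled.
    """
    caches = []
    for member in value.split(","):
        parts = [p.strip() for p in member.split(";") if p.strip()]
        if not parts:
            continue
        # The first part is the cache's identifier, possibly quoted.
        entry = {"cache": parts[0].strip('"')}
        for param in parts[1:]:
            name, _, val = param.partition("=")
            # Bare parameters such as "hit" or "stored" are booleans.
            entry[name.strip()] = val.strip().strip('"') if val else True
        caches.append(entry)
    return caches
```

Applied to the value `"edge"; fwd=miss; stored`, this returns one entry with cache set to edge, fwd set to miss, and stored set to True.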

How do I tell which POP served my request?

Most major CDNs emit a header identifying the POP that handled the request. Cloudflare uses cf-ray (the segment after the hyphen is the POP code), Fastly uses x-served-by (a list of cache servers in the path), and AWS CloudFront uses x-amz-cf-pop. The header tells you which edge location served the response, which is useful for confirming anycast routing, geo-targeting, and cross-region debugging.
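The three headers named above can be checked in one helper. This is a sketch under stated assumptions: the function name is illustrative, the header values in the usage note are made up, and for x-served-by it returns the last server name as-is (assumed to be the cache nearest the client) rather than guessing at a POP code inside it.

```python
def pop_from_headers(headers: dict[str, str]):
    """Return a POP or edge-server identifier from response headers, or None."""
    # Header names are case-insensitive, so normalize them first.
    h = {k.lower(): v.strip() for k, v in headers.items()}
    ray = h.get("cf-ray", "")
    if "-" in ray:
        # Cloudflare: the POP code follows the hyphen.
        return ray.rsplit("-", 1)[1]
    if "x-amz-cf-pop" in h:
        return h["x-amz-cf-pop"]
    if "x-served-by" in h:
        # Fastly: a list of cache servers; take the last one unchanged.
        return h["x-served-by"].split(",")[-1].strip()
    return None
```

With a hypothetical cf-ray value of `8a1b2c3d4e5f6789-NRT`, the helper returns NRT.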

How quickly do logs become available?

It varies by tier. Real-time analytics aggregates (request counts, hit rates) typically update every 1-60 seconds. Per-request logs may stream within seconds (push to Kafka, HTTP endpoints, or syslog) or arrive in batches written to object storage such as S3 every 1-15 minutes, depending on the plan. Some CDNs charge separately for raw log delivery; basic aggregated metrics are usually included.

What metrics should I monitor from CDN logs?

At minimum: cache hit ratio (overall and by content type), 5xx error rate from origin, p50/p95/p99 time to first byte at the edge, top-N URLs by request volume, top-N URLs by cache-miss volume, distribution of cache statuses (HIT/MISS/PASS/REVALIDATED), and per-POP request distribution. These together cover performance, reliability, and cost.
