CDN Logs and Observability
Every CDN problem ends with someone reading logs. The aggregated dashboards tell you whether a metric moved; the raw access logs tell you why. Knowing which fields each CDN exposes, which response headers describe cache disposition, and how to slice the data by URL, POP, and content type is what separates "the CDN is broken" from "this path has 70% of its requests carrying a UTM parameter that fragments the cache key." This guide covers the log fields and headers that exist across every major CDN and the diagnostic patterns that use them.
The cache-status response headers
Every response from a CDN carries one or more headers indicating what happened at the edge. There is now a standardized form (RFC 9211 Cache-Status) plus per-vendor headers that pre-date it:
| Header | Used by | Typical values |
|---|---|---|
| Cache-Status (RFC 9211) | Any RFC-compliant cache | "edge"; hit or "edge"; fwd=miss; stored |
| cf-cache-status | Cloudflare | HIT, MISS, EXPIRED, BYPASS, REVALIDATED, UPDATING, STALE, DYNAMIC |
| x-cache | Fastly, CloudFront, others | HIT, MISS (Fastly); Hit from cloudfront, Miss from cloudfront, RefreshHit from cloudfront (CloudFront) |
| age | Universal | Seconds since the response was cached |
| x-served-by | Fastly | List of cache servers in the response path |
| x-cache-hits | Fastly | Per-server hit count, e.g., 0, 1 |
Inspecting these headers in DevTools or with curl -I is the fastest way to confirm whether a request was served from the edge cache. If you see MISS or DYNAMIC on a URL you expect to be cached, check the origin's response headers (Cache-Control, Vary, Set-Cookie) next, then the cache-key configuration.
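When checking many URLs at once, it can help to reduce the headers above to a single verdict. The following is a minimal sketch, not a full parser: the function name edge_cache_verdict is hypothetical, the Cache-Status handling splits on commas and semicolons rather than implementing RFC 8941 Structured Fields, and treating Cloudflare's STALE, REVALIDATED, and UPDATING as cache-served is an assumption about how you want to count them.

```python
def edge_cache_verdict(headers: dict) -> str:
    """Return 'hit', 'miss', or 'unknown' from common cache-status headers."""
    h = {k.lower(): v.strip() for k, v in headers.items()}

    # RFC 9211: a list of caches, each written as "name; param; param=value".
    # The last member is the cache closest to the client.
    if "cache-status" in h:
        last = h["cache-status"].split(",")[-1]
        params = [p.strip().split("=")[0] for p in last.split(";")[1:]]
        return "hit" if "hit" in params else "miss"

    # Cloudflare: statuses assumed here to mean "served from cache".
    if "cf-cache-status" in h:
        served = {"HIT", "STALE", "REVALIDATED", "UPDATING"}
        return "hit" if h["cf-cache-status"].upper() in served else "miss"

    # Fastly ("MISS, HIT") and CloudFront ("Hit from cloudfront"):
    # look at the last comma-separated entry.
    if "x-cache" in h:
        last = h["x-cache"].split(",")[-1].strip().lower()
        return "hit" if last.startswith(("hit", "refreshhit")) else "miss"

    return "unknown"
```

Feeding it the header dictionary from any HTTP client gives one label per response, which is easier to tabulate than raw header strings.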
Standard log fields
Every CDN exposes broadly the same access-log schema. Field names differ but the categories are universal:
| Category | Common fields |
|---|---|
| Request identity | timestamp, request_id, client_ip, country, asn |
| Request shape | method, scheme, host, path, query, protocol_version |
| Response shape | status, response_bytes, content_type |
| Cache disposition | cache_status, age, ttl, cache_key |
| Timing | time_to_first_byte, total_time, origin_time (if miss) |
| Edge identity | pop, datacenter, server_id |
| Client headers | user_agent, referrer, accept_encoding, accept_language |
| Security | tls_version, cipher, threat_score, waf_action |
For most diagnostic work, the indispensable fields are path, query, cache_status, status, response_bytes, and pop.
Log delivery: real-time vs batched
CDNs offer logs at multiple tiers:
- Real-time stream — push to an HTTP endpoint, Kafka topic, or syslog target with sub-second delivery. Used for live anomaly detection, alerting, and SOC pipelines.
- Near-real-time S3 / GCS / Azure Blob delivery — files written every 1–15 minutes containing all requests in that window. Used for long-term storage and offline analysis.
- Aggregated dashboards — pre-computed counts, rates, and percentiles. Updated every few seconds to a minute. Used for at-a-glance status.
Raw log delivery often has a separate cost (per-million-events or per-GB). For cost-sensitive deployments, sample the logs: a 1% sample preserves trend analysis at roughly 1% of the delivery cost.
Diagnostic pattern 1: hit-rate by URL prefix
When overall hit rate is low, group requests by URL prefix to find the offenders:
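    -- Hit rate per top-level URL prefix, busiest prefixes first.
    SELECT
      regexp_extract(path, '^(/[^/]+)') AS prefix,
      count(*) AS requests,
      sum(CASE WHEN cache_status = 'HIT' THEN 1 ELSE 0 END) * 1.0 / count(*) AS hit_rate
    FROM cdn_logs
    WHERE day = '2026-05-29'
    GROUP BY 1  -- ordinal form also works in engines that reject select aliases here
    ORDER BY requests DESC
    LIMIT 20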
The highest-volume prefixes with the lowest hit rates are the right places to invest configuration time. A 70% hit rate on a path under /api may be normal; a 20% hit rate on /static/ is a bug.
Diagnostic pattern 2: cache-key fragmentation
For a specific URL that should be cached but isn't, count distinct cache keys:
    SELECT
      cache_key,
      count(*) AS requests
    FROM cdn_logs
    WHERE path = '/featured-products'
      AND day = '2026-05-29'
    GROUP BY cache_key
    ORDER BY requests DESC
    LIMIT 50
If you see hundreds or thousands of distinct keys for one URL, the cache key includes something it shouldn't. Compare keys side-by-side to spot the differing component (a tracking parameter, a cookie value, a varying header).
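When the differing component is a query parameter, the side-by-side comparison can be automated. The sketch below assumes you have exported the query field for the affected path; the function name fragmenting_params is hypothetical. It ranks parameters by how many distinct values they carry, so a tracking parameter that fragments the key tends to float to the top.

```python
from collections import defaultdict
from urllib.parse import parse_qsl


def fragmenting_params(queries):
    """Rank query parameters by the number of distinct values observed."""
    values = defaultdict(set)
    for query in queries:
        # keep_blank_values so "?debug=" still counts as a value
        for name, value in parse_qsl(query, keep_blank_values=True):
            values[name].add(value)
    # Most distinct values first; ties broken alphabetically for stable output.
    return sorted(
        ((name, len(seen)) for name, seen in values.items()),
        key=lambda item: (-item[1], item[0]),
    )


sample = [
    "utm_source=mail&page=1",
    "utm_source=ads&page=1",
    "utm_source=social&page=1",
]
print(fragmenting_params(sample))  # [('utm_source', 3), ('page', 1)]
```

A parameter with nearly as many distinct values as there are requests is a strong candidate for exclusion from the cache key, provided the origin does not vary its response on it.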
Diagnostic pattern 3: origin pressure during traffic spikes
When origin alerts fire during a traffic spike, the question is whether the spike exceeded cache capacity or whether it hit uncacheable URLs:
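    -- Per-minute origin-bound traffic, split by cause, next to edge hits.
    SELECT
      date_trunc('minute', timestamp) AS minute,
      sum(CASE WHEN cache_status IN ('MISS', 'EXPIRED', 'PASS') THEN 1 ELSE 0 END) AS origin_requests,
      sum(CASE WHEN cache_status = 'PASS' THEN 1 ELSE 0 END) AS pass_requests,
      sum(CASE WHEN cache_status IN ('MISS', 'EXPIRED') THEN 1 ELSE 0 END) AS miss_requests,
      sum(CASE WHEN cache_status = 'HIT' THEN 1 ELSE 0 END) AS edge_hits
    FROM cdn_logs
    WHERE day = '2026-05-29'
    GROUP BY 1  -- ordinal form also works in engines that reject select aliases here
    ORDER BY minute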
If the spike in origin_requests is dominated by PASS, the spike traffic was uncacheable; make those URLs cacheable or accept the load. If MISS dominates, cacheable URLs are cold-loading; pre-warming or tiered caching is the fix.
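The PASS-versus-MISS decision can also be made directly from a window of cache_status values. This is a rough sketch under the assumption that the statuses use the HIT/MISS/EXPIRED/PASS vocabulary from the query above; origin_mix and dominant_cause are hypothetical helper names.

```python
from collections import Counter

# Statuses that result in a request to origin, as in the query above.
ORIGIN_BOUND = {"MISS", "EXPIRED", "PASS"}


def origin_mix(statuses):
    """Share of each origin-bound status among origin-bound requests."""
    counts = Counter(s for s in statuses if s in ORIGIN_BOUND)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: n / total for status, n in counts.items()}


def dominant_cause(statuses):
    """Return the origin-bound status with the largest share, or 'none'."""
    mix = origin_mix(statuses)
    if not mix:
        return "none"
    return max(mix, key=mix.get)


spike = ["HIT"] * 10 + ["PASS"] * 6 + ["MISS"] * 3 + ["EXPIRED"]
print(dominant_cause(spike))  # PASS
```

Running it on the spike window and on a quiet baseline window shows whether the mix changed or only the volume did.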
POP-level diagnostics
Anycast routing usually works, but pathological cases exist: a user in Tokyo whose ISP routes through Frankfurt, a POP that just opened with a cold cache, a backbone outage that redirects traffic to a suboptimal POP. The pop field in logs reveals these:
- Per-country POP distribution. Most users in country X should hit POPs near X. If 40% of US users hit a European POP, anycast routing is misconfigured or there's an ISP-level routing issue.
- Per-POP hit rate. Cold POPs lag warm POPs by hours after a new launch or major cache eviction. If hit rate per POP varies wildly, investigate.
- Per-POP latency. If one POP has p95 TTFB 5x the average, the POP itself or its origin path is degraded.
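The per-POP latency check can be sketched as follows. The function name slow_pops, the 5x factor default, and the minimum sample count are illustrative assumptions. The baseline here is the median of per-POP p95 values rather than the mean, a deliberate substitution so that one degraded POP does not inflate the baseline it is compared against.

```python
from collections import defaultdict
from statistics import median, quantiles


def slow_pops(records, factor=5.0, min_samples=20):
    """records: iterable of (pop, ttfb_ms) pairs taken from access logs."""
    by_pop = defaultdict(list)
    for pop, ttfb_ms in records:
        by_pop[pop].append(ttfb_ms)

    # quantiles(..., n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = {
        pop: quantiles(samples, n=20)[-1]
        for pop, samples in by_pop.items()
        if len(samples) >= min_samples  # skip POPs with too little data
    }
    if not p95:
        return []

    baseline = median(p95.values())
    return sorted(pop for pop, value in p95.items() if value > factor * baseline)
```

POPs with fewer than min_samples requests are skipped because a percentile over a handful of requests is mostly noise.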
Sampling strategy
At high traffic, full log volume is unmanageable. Sampling helps if done correctly:
- Random sampling (1 of N requests). Preserves rate distributions; loses rare events.
- Anomaly-only sampling (errors, slow responses, specific status codes). Preserves outliers; loses baseline.
- Hash-based sampling on a stable key (e.g., client IP modulo N). Preserves the experience of any specific subset of users end-to-end.
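For most teams, hash-based sampling plus full capture of any 5xx is the right balance. Hashing on request ID keeps the sampling decision deterministic for a given request but otherwise behaves like random sampling; hash on a client-level key such as client IP when you need a user's requests kept together.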
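A minimal sketch of hash-based sampling with full 5xx capture is shown below. The function name keep_record and the default rate of 1 in 100 are assumptions for illustration; the key argument is whatever stable value you choose to hash.

```python
import hashlib


def keep_record(key: str, status: int, one_in: int = 100) -> bool:
    """Keep every 5xx; otherwise keep records whose key hashes into bucket 0."""
    if 500 <= status <= 599:
        return True
    # A cryptographic hash gives an even spread and the same answer on every
    # machine, unlike Python's built-in hash(), which is salted per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % one_in
    return bucket == 0


# The same key always gets the same decision.
print(keep_record("203.0.113.7", 200) == keep_record("203.0.113.7", 200))  # True
```

Because the decision depends only on the key, every pipeline stage that applies the same function keeps the same subset without coordination.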
Alerting on CDN data
The signals worth paging on:
- Origin 5xx rate above baseline — usually origin trouble, not CDN.
- Edge 5xx rate (CDN's own error responses) — CDN infrastructure issue.
- Cache hit ratio dropping more than X% week-over-week — likely a config regression or content change.
- Per-POP error spikes — localized issue, may be ISP- or routing-related.
- Suddenly long p99 TTFB — may indicate origin slowness on cache misses.
Avoid paging on raw request rate alone; legitimate traffic spikes are not problems if cache absorbs them. Page on origin-impacting events, not edge-absorbed ones.
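The week-over-week hit-ratio signal can be expressed as a small check. This sketch interprets "X%" as percentage points, which is an assumption, and the function names and the 5-point default are hypothetical.

```python
def hit_ratio_pct(hits, total):
    """Cache hit ratio as a percentage, or None when there was no traffic."""
    return 100 * hits / total if total else None


def should_page_on_hit_ratio(this_week, last_week, max_drop_points=5.0):
    """Each argument is a (hits, total_requests) pair for one week."""
    current = hit_ratio_pct(*this_week)
    previous = hit_ratio_pct(*last_week)
    if current is None or previous is None:
        return False  # no traffic in one window, so nothing to compare
    return previous - current > max_drop_points


# 90% last week, 82% this week: an 8-point drop exceeds the 5-point threshold.
print(should_page_on_hit_ratio((8200, 10000), (9000, 10000)))  # True
```

Comparing against the same weekday and hour range in the previous week avoids false alarms from normal daily traffic cycles.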
Frequently Asked Questions
What is in a CDN access log?
A CDN access log records one entry per request that hit the edge. Typical fields: timestamp, client IP, request method, host, path, query string, response status, response size, cache status (HIT/MISS/EXPIRED/PASS), POP identifier, time to first byte, total response time, user agent, referrer, and request headers used in the cache key. Many CDNs let you customize the field set.
What is the standard Cache-Status header?
RFC 9211 defines a standardized Cache-Status response header that any cache in the chain can append. Each cache adds a token with its identity and the disposition for that request — hit, fwd=miss, fwd=stale, etc. Modern CDNs are starting to emit Cache-Status alongside their vendor-specific headers, which lets debugging tools parse cache behavior portably.
How do I tell which POP served my request?
Every major CDN emits a header identifying the POP that handled the request. Cloudflare uses cf-ray (the second segment is the POP code), Fastly uses x-served-by (a list of cache servers in the path), AWS CloudFront uses x-amz-cf-pop. The header tells you which edge location served the response — useful for confirming anycast routing, geo-targeting, and cross-region debugging.
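The three headers above can be read with one helper. This is a sketch with assumptions: the function name pop_from_headers is hypothetical, the example values are made up, and the Fastly branch assumes x-served-by entries end in a hyphen followed by the POP code, with the entry closest to the client listed last.

```python
from typing import Optional


def pop_from_headers(headers: dict) -> Optional[str]:
    """Best-effort POP identifier from Cloudflare, CloudFront, or Fastly headers."""
    h = {k.lower(): v.strip() for k, v in headers.items()}

    # Cloudflare: "<ray id>-<POP code>"
    if "cf-ray" in h and "-" in h["cf-ray"]:
        return h["cf-ray"].rsplit("-", 1)[1]

    # CloudFront: the header value is the POP identifier itself.
    if "x-amz-cf-pop" in h:
        return h["x-amz-cf-pop"]

    # Fastly (assumed format): take the last server, then its suffix.
    if "x-served-by" in h:
        last = h["x-served-by"].split(",")[-1].strip()
        return last.rsplit("-", 1)[-1]

    return None


print(pop_from_headers({"CF-RAY": "7d0e1a2b3c4d5e6f-NRT"}))  # NRT
```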
How quickly do logs become available?
It varies by tier. Real-time analytics aggregates (request counts, hit rates) typically update every 1-60 seconds. Per-request logs may stream within seconds (push to S3, Kafka, HTTP endpoints) or batch every 1-15 minutes depending on the plan. Some CDNs charge separately for raw log delivery; basic aggregated metrics are usually included.
What metrics should I monitor from CDN logs?
At minimum: cache hit ratio (overall and by content type), 5xx error rate from origin, p50/p95/p99 time to first byte at the edge, top-N URLs by request volume, top-N URLs by cache-miss volume, distribution of cache statuses (HIT/MISS/PASS/REVALIDATED), and per-POP request distribution. These together cover performance, reliability, and cost.