07 SEPT 2025 · 6 MIN READ
Caching: Thundering Herd and Request Hedging
Caching: It's Not Just About Memory
Myth-busting time: Caching doesn't mean in-memory. I see this confusion everywhere.
We accept data staleness in exchange for avoiding expensive operations. Every time you cache something, you're saying "I'd rather serve data that might be 5 minutes old than wait 2 seconds for a database query."
What Caching Really Means
Cache = saving expensive operations. That's it.
Expensive operations include:
- Database queries with 14 table joins
- Network calls to external services
- Complex computations
- File system reads
You can cache:
- In memory (Redis)
- On disk (API server's unused SSD)
- In browser (localStorage)
- At CDN edge
The Under-utilized Cache Location
Here's something nobody talks about - your API server's disk is sitting there doing nothing.
You spin up an EC2 instance:
- 4GB RAM (fully utilized)
- 20GB SSD (5% utilized)
Why not cache on that SSD? A local disk read is usually faster than a network round-trip to Redis, it costs nothing extra, and the space is already paid for.
But wait - multiple API servers means cache inconsistency! Which is why we usually centralize with Redis. But for read-heavy, rarely-changing data? Disk cache works beautifully.
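To make the disk tier concrete, here is a minimal sketch of a file-backed cache, assuming JSON-serializable values. The directory name and the helpers `disk_get` and `disk_put` are hypothetical names chosen for illustration, not part of any library.

```python
import hashlib
import json
import os
import tempfile
import time

# Hypothetical cache location on the server's local disk
CACHE_DIR = os.path.join(tempfile.gettempdir(), "api_disk_cache")

def _path_for(key):
    # Hash the key so any string maps to a safe file name
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_DIR, digest + ".json")

def disk_get(key, max_age_seconds):
    """Return the cached value, or None if missing or older than max_age_seconds."""
    path = _path_for(key)
    try:
        if time.time() - os.path.getmtime(path) > max_age_seconds:
            return None
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        return None  # a missing or corrupt file counts as a miss

def disk_put(key, value):
    """Write atomically: temp file first, then rename over the target."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=CACHE_DIR)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(value, f)
    os.replace(tmp_path, _path_for(key))
```

The rename at the end means a concurrent reader sees either the old file or the new one, never a half-written one. Each server still has its own copy, which is exactly the inconsistency catch described above.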
✽ RECALL why is your API server's idle SSD a legitimate cache tier, and what's the catch once you run multiple servers?
caching means saving expensive operations, not "put it in RAM" — and a local disk read beats a network round-trip to redis while using space you've already paid for. the catch: each server's disk cache is its own little world, so multiple API servers drift inconsistent. that's why mutable data centralizes in redis, while read-heavy, rarely-changing data is the sweet spot for disk cache.
what you'll take away
quick pointers so you know what to look for as you read:
- caching ≠ in-memory. a cache is anywhere you save the result of an expensive operation — RAM, disk, browser, CDN edge.
- your API server's idle SSD is a free cache tier. great for read-heavy, rarely-changing data; the catch is per-server inconsistency.
- cache expiry + concurrent traffic = stampede. N identical misses become N identical expensive queries, and the database melts.
- request hedging collapses N misses into 1 query. the first request does the work, everyone else waits on a semaphore.
- waiters read from a temporary result map. re-hitting the cache after the signal would just be a second stampede.
Cache Stampede & Request Hedging
The fundamental question we should always ask is: "What happens when your cache expires and 1000 requests hit simultaneously?"
Answer: Your database dies, your site goes down, and you get paged at 3 AM. Let me show you how to prevent this nightmare scenario.
The Cache Stampede Problem
Picture this: You have a popular blog post cached in Redis. The cache expires. Suddenly, 1000 concurrent requests hit your API at the exact same moment.
What happens?
- All 1000 requests check Redis → cache miss
- All 1000 requests query the database
- Database connection pool gets overwhelmed
- Database melts under load
- Site goes down
- You're now debugging at 3 AM while your users are angry
This is called a cache stampede or thundering herd problem, and it's one of the most common ways high-traffic applications fail.

Why is this so dangerous? Even if you have database connection pooling (which you should), making N identical expensive queries to your database for the same data doesn't make any sense. It's pure waste that can bring down your entire system.
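You can reproduce the failure chain on your laptop. The sketch below is a toy simulation, assuming a dict as a stand-in for Redis and a sleeping function as a stand-in for the database; `slow_db_query`, `naive_get`, and `simulate` are made-up names for illustration.

```python
import threading
import time

db_calls = 0
db_calls_lock = threading.Lock()
cache = {}  # stand-in for Redis

def slow_db_query(key):
    global db_calls
    with db_calls_lock:
        db_calls += 1
    time.sleep(0.05)  # pretend this is the expensive query
    return f"post body for {key}"

def naive_get(key):
    value = cache.get(key)
    if value is None:  # every concurrent miss lands here
        value = slow_db_query(key)
        cache[key] = value
    return value

def simulate(n_requests):
    """Fire n_requests concurrent reads at a cold key; return the DB query count."""
    global db_calls
    db_calls = 0
    cache.clear()
    barrier = threading.Barrier(n_requests)  # release all threads together

    def worker():
        barrier.wait()
        naive_get("blog:1")

    threads = [threading.Thread(target=worker) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db_calls
```

Calling `simulate(1000)` should report a query count far above 1, typically close to the number of requests, even though every request wanted the same row. The exact number depends on thread scheduling.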
✽ RECALL a hot key expires and 1000 concurrent requests arrive at once. walk the failure chain — and why doesn't connection pooling save you?
all 1000 miss the cache, all 1000 fire the same query at the database, the connection pool saturates, the db melts, the site goes down. pooling only caps concurrency — it doesn't change the fact that N identical expensive queries for the same data is pure waste. the fix isn't more database capacity, it's collapsing the N requests into 1.
The Real-World Impact
This isn't some theoretical problem I'm throwing at you. This is literally what CDNs solve every single day.
Think about it: Cloudflare, AWS CloudFront, and every other CDN face this exact problem. When a cached resource expires and thousands of requests come in simultaneously, they can't all hit the origin server. The origin would die instantly.
CDNs use request hedging (you'll also see it called request collapsing or request coalescing) to ensure that only ONE request goes to the origin while everyone else waits for that response. This is production-tested at massive scale.
The Solution: Request Hedging (Smart Debouncing)
Here's the elegant solution - and this is literally the pseudo-code you'd write:
# Runnable Python sketch; `cache` and `db` stand in for your real clients
import threading

lock = threading.Lock()  # Makes check-and-register on sem_map atomic
sem_map = {}  # key -> threading.Event for the in-flight fetch
res_map = {}  # Temporary result storage (TTL eviction omitted for brevity)

def get_blog(k):
    # First, try cache
    v = cache.get(k)
    if v is not None:
        return v
    # Check if someone else is already fetching this
    with lock:
        s = sem_map.get(k)
        leader = s is None
        if leader:
            # I'm the first one - I'll do the work
            # An Event (not a counting semaphore) wakes every waiter at once
            s = sem_map[k] = threading.Event()
    if not leader:
        s.wait()  # Wait for someone else to do the work
        return res_map.get(k)  # Get the result they fetched
    try:
        # Do the expensive work
        v = db.get(k)
        cache.put(k, v)
        res_map[k] = v  # Store temporarily for waiting requests
        return v
    finally:
        # Signal that I'm done, even if the fetch raised, so waiters never hang
        with lock:
            sem_map.pop(k, None)
        s.set()
✽ RECALL in request hedging, what does the first cache-missing request do differently from all the ones behind it?
the first request finds no semaphore for the key, so it creates one, blocks everyone else, does the expensive db fetch, writes the result to the cache and a temporary result map, then signals and removes the semaphore. every later request finds the semaphore, waits, and grabs the value from the result map. one db query, no matter how many concurrent misses pile up.
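One variation worth knowing, offered as a sketch rather than the approach described in this post: a single `concurrent.futures.Future` per key can play both roles, the signal and the temporary result. The `SingleFlight` class and its `do` method are hypothetical names. The Python docs intend `Future` objects to be created by executors, so treat constructing one directly as a convenience for illustration.

```python
import threading
from concurrent.futures import Future

class SingleFlight:
    """Collapse concurrent calls for the same key into one execution."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> Future shared by leader and waiters

    def do(self, key, fn):
        with self._lock:
            fut = self._in_flight.get(key)
            leader = fut is None
            if leader:
                fut = self._in_flight[key] = Future()
        if not leader:
            # Blocks until the leader finishes; re-raises the leader's exception
            return fut.result()
        try:
            fut.set_result(fn())
        except BaseException as exc:
            fut.set_exception(exc)  # waiters see the failure instead of hanging
        finally:
            with self._lock:
                del self._in_flight[key]
        return fut.result()
```

Usage would look like `sf.do(k, lambda: db.get(k))` on a cache miss. Waiters hold their own reference to the Future, so the result stays reachable for them even after the key is removed from the in-flight map.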
Implementation Details That Matter
Why the Temporary Result Map?
You might wonder: "Why not just make waiting requests hit the cache again after the signal?"
Because that creates unnecessary load! If everyone waits and then immediately hits the cache again, you've just created another stampede on your cache layer.
The res_map is temporary local storage (with a 5-minute TTL) that holds the result just long enough for waiting requests to grab it directly. This eliminates the extra cache round-trip. The pseudo-code above leaves the expiry out for brevity.
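A plain dict never forgets, so the TTL needs a little code. Here is one possible sketch of an expiring result map; `ExpiringResultMap` is a hypothetical name, and the injectable `clock` parameter exists only to make the expiry testable.

```python
import threading
import time

class ExpiringResultMap:
    """Local key -> value store whose entries disappear after ttl_seconds."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}  # key -> (expires_at, value)

    def put(self, key, value):
        with self._lock:
            self._purge_expired()
            self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is None or entry[0] <= self._clock():
                self._entries.pop(key, None)
                return None
            return entry[1]

    def _purge_expired(self):
        # Caller holds the lock; sweep so abandoned keys do not pile up
        now = self._clock()
        for key in [k for k, (exp, _) in self._entries.items() if exp <= now]:
            del self._entries[key]

    def __len__(self):
        with self._lock:
            return len(self._entries)
```

Sweeping on every `put` keeps the map bounded by the number of keys fetched in the last TTL window, without needing a background thread.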
✽ RECALL after the leader signals, why do waiters read from res_map instead of just hitting the cache again?
because hundreds of requests simultaneously re-hitting the cache is just a second stampede, aimed at the cache layer this time. the temporary local result map holds the value just long enough for the waiters to grab it directly — zero extra round-trips, no new herd.
When You Actually Need This
"I've been using Redis for years and never needed this!"
Fair point, and plenty of systems never will. But this isn't an academic exercise either. You need request hedging when you have:
- High traffic with shared expensive resources
- Cache expiration happening under concurrent load
- Database queries that take >100ms
- Flash sale scenarios or viral content
CDN Use Case (Real-World Example)
CDNs face this constantly:
- Origin: Your S3 bucket or API server
- Cache: CDN edge servers worldwide
- Problem: Popular resource expires, 10,000 requests hit one edge server
- Solution: Only ONE request goes to origin, others wait
This pattern has prevented countless outages for companies you use every day.
✽ RECALL you've run redis for years without request hedging and nothing broke. what combination of conditions changes that?
high traffic on a shared expensive resource, with cache expiry landing under concurrent load — slow db queries (>100ms), flash sales, viral content. CDNs live this every day: a popular resource expires at an edge server, thousands of requests pile up, and exactly one is let through to the origin while the rest wait. if your traffic never concentrates on one expiring key like that, you genuinely don't need it.