Rate Limiting

Cap request rates to protect your service.

Interactive demo (token bucket): capacity 6 tokens, refilling 1 token every 0.9 s, with a request log.

Each request spends a token. An empty bucket returns HTTP 429 (Too Many Requests). Bursts are allowed up to the bucket's capacity.
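
The token bucket behaviour described above can be sketched in a few lines. This is a minimal, single-threaded illustration, not a production limiter: the class name `TokenBucket`, the `allow` method, and the injectable `clock` parameter are all assumptions made for the example, and it credits tokens continuously (fractionally) rather than in whole ticks.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens; one token is earned every `refill_interval` seconds."""

    def __init__(self, capacity, refill_interval, clock=time.monotonic):
        self.capacity = capacity
        self.refill_interval = refill_interval  # seconds per token
        self.clock = clock                      # injectable so the sketch is testable
        self.tokens = float(capacity)           # the bucket starts full
        self.last = clock()

    def allow(self):
        """Spend one token if available; otherwise reject the request."""
        now = self.clock()
        # Credit the tokens earned since the previous call, capped at capacity.
        earned = (now - self.last) / self.refill_interval
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a server would typically answer with HTTP 429 here
```

With `TokenBucket(capacity=6, refill_interval=0.9)`, matching the demo's settings, a burst of six calls to `allow()` succeeds, and further calls are rejected until enough time has passed to earn another token.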

How it works

Rate limiting caps how many requests a client can make in a given time window, protecting services from abuse and overload. Token bucket, leaky bucket, and sliding-window counters each strike a different balance between burst tolerance and strictness.
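
As a contrast to the token bucket, the leaky bucket named above can be sketched as a bounded queue that releases at most one request per fixed interval. This is a hedged illustration only: `LeakyBucket`, `offer`, `drain`, and the `clock` parameter are hypothetical names, and a real implementation would usually drain from a background worker rather than by polling.

```python
import time
from collections import deque


class LeakyBucket:
    """Bounded queue drained at one request per `drain_interval` seconds."""

    def __init__(self, capacity, drain_interval, clock=time.monotonic):
        self.capacity = capacity
        self.drain_interval = drain_interval
        self.clock = clock
        self.queue = deque()
        self.next_drain = clock()  # earliest time the next request may leave

    def offer(self, request):
        """Queue a request; return False if the bucket is full (overflow)."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # Idle time earns no credit: the next slot is never in the past.
            self.next_drain = max(self.next_drain, self.clock())
        self.queue.append(request)
        return True

    def drain(self):
        """Return the queued requests whose drain slot has arrived, in order."""
        now = self.clock()
        released = []
        while self.queue and now >= self.next_drain:
            released.append(self.queue.popleft())
            self.next_drain += self.drain_interval
        return released
```

Unlike the token bucket, a burst here is not served immediately: it is absorbed by the queue (or rejected once the queue is full) and leaves at the drain rate.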

Mental models

  • Token bucket refills at a fixed rate and allows controlled bursts.
  • Leaky bucket queues requests and drains at a constant rate, smoothing output.
  • Fixed window is cheap, but a burst straddling a window boundary can reach up to 2× the limit in a short span.
  • Sliding window counts the trailing interval: fairer, but it tracks more state.
  • Limits typically live at the edge (for example, an API gateway), keyed by IP, user, or API key.
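
The boundary-burst difference between fixed and sliding windows is easiest to see side by side. The sketch below is illustrative only: `FixedWindow`, `SlidingWindowLog`, and `accepted` are hypothetical names, timestamps are passed in explicitly instead of read from a clock, and the sliding window is implemented as a timestamp log (one of several possible variants).

```python
from collections import deque


class FixedWindow:
    """Counts requests per aligned window; the counter resets at each boundary."""

    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.current = None  # index of the window currently being counted
        self.count = 0

    def allow(self, now):
        index = int(now // self.window)
        if index != self.current:
            self.current, self.count = index, 0  # new window: start from zero
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLog:
    """Keeps the timestamp of every accepted request in the trailing interval."""

    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.stamps = deque()

    def allow(self, now):
        # Forget requests that have aged out of the trailing window.
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False


def accepted(limiter, times):
    """Count how many of the given request times the limiter lets through."""
    return sum(limiter.allow(t) for t in times)


# Limit: 5 requests per 10 s. Five requests arrive just before the
# boundary at t = 10 and five more just after it.
burst = [9.9] * 5 + [10.1] * 5
```

Here `accepted(FixedWindow(5, 10), burst)` is 10, twice the limit within 0.2 s, while `accepted(SlidingWindowLog(5, 10), burst)` is 5. The cost is state: the log stores one timestamp per accepted request, whereas the fixed window stores a single counter. To key limits by IP, user, or API key, one such limiter would be kept per key, for example in a dictionary.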

Reach for it when

  • API abuse prevention
  • Fair multi-tenant usage
  • DDoS mitigation