Rate Limiting with Redis: How We Handle 100,000 Requests/Day Without Falling Over

Rate limiting is one of those features that looks trivial until you ship it. Then you discover race conditions under load, counters that drift, Redis memory that bloats, and edge cases around plan upgrades mid-window. This post walks through the exact rate limiting pattern powering APIndicators' API: Rails middleware backed by Redis, per-plan daily quotas, atomic counters, and a TTL strategy that keeps Redis lean.

We serve 100,000+ requests per day across our user base with this setup. Redis memory for rate limiting sits under 5 MB. Middleware overhead is under 1ms per request.

Requirements

What a production rate limiter needs to handle:

Atomic increments under concurrent requests (no double-counting or lost updates)
Per-user, per-plan quotas that can differ by subscription tier
Daily reset on UTC boundaries without cron jobs
Graceful degradation if Redis is unreachable
Clear 429 responses with Retry-After headers

The Core Pattern: Rails Middleware + Redis INCR

The whole implementation fits in about 60 lines of Ruby. Here is the middleware:

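class RateLimitMiddleware
  # Daily request quotas per subscription plan. Declared at the top because
  # `private` has no effect on constants anyway.
  DAILY_LIMITS = {
    "free" => 1_000,
    "starter" => 10_000,
    "pro" => 100_000,
    "enterprise" => 1_000_000
  }.freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    request = ActionDispatch::Request.new(env)
    user_id = extract_user_id(request)
    # Requests without a valid API key are not rate limited here.
    return @app.call(env) unless user_id

    plan = fetch_plan(user_id)
    # Unknown plan names fall back to the free quota instead of a nil limit.
    limit = DAILY_LIMITS.fetch(plan, DAILY_LIMITS["free"])
    # Date.current follows config.time_zone; the UTC date keeps the reset
    # at UTC midnight whatever that setting is.
    key = "rate_limit:#{user_id}:#{Time.now.utc.to_date}"

    current = Rails.cache.redis.with do |redis|
      count = redis.incr(key)                 # atomic; returns the new value
      redis.expire(key, 86_400) if count == 1 # set the TTL only when the key is new
      count
    end

    return rate_limited_response(limit, current) if current > limit

    status, headers, body = @app.call(env)
    headers["X-RateLimit-Limit"] = limit.to_s
    headers["X-RateLimit-Remaining"] = (limit - current).to_s
    [status, headers, body]
  end

  private

  def extract_user_id(request)
    token = request.headers["Authorization"]&.delete_prefix("Bearer ")
    # Without this guard, a missing header would run find_by(token: nil).
    return nil if token.blank?

    ApiKey.find_by(token:)&.user_id
  end

  def fetch_plan(user_id)
    Rails.cache.fetch("user_plan:#{user_id}", expires_in: 5.minutes) do
      User.find(user_id).subscription&.plan || "free"
    end
  end

  def rate_limited_response(limit, current)
    headers = {
      "Content-Type" => "application/json",
      "X-RateLimit-Limit" => limit.to_s,
      "X-RateLimit-Remaining" => "0",
      "Retry-After" => seconds_until_midnight_utc.to_s
    }
    body = { error: "rate_limit_exceeded", limit:, current: }.to_json
    [429, headers, [body]]
  end

  # Unix time counts exactly 86,400 seconds per day, so UTC midnight is
  # always a multiple of 86,400. Returns a value between 1 and 86,400.
  def seconds_until_midnight_utc
    86_400 - (Time.now.to_i % 86_400)
  end
end

# config/application.rb, inside the Application class. Require the
# middleware file first (e.g. from lib/): app/ constants are not
# autoloadable yet at this point in boot.
config.middleware.insert_before Rack::Head, RateLimitMiddleware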

Why This Pattern Works

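INCR is atomic. Redis executes each command to completion before starting the next, and INCR returns the post-increment value. Two concurrent requests can never receive the same count, so exactly limit requests per day see a value at or below the limit. No locks, no read-modify-write race.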
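A small in-process simulation makes the compare-after-increment argument concrete. AtomicCounter and simulate_burst are hypothetical names for this sketch, not part of the middleware: a Mutex gives incr the same one-at-a-time guarantee Redis gives INCR, and 20 threads race through 100 requests each against a limit of 1,000.

```ruby
# Hypothetical in-memory stand-in for Redis INCR, for illustration only.
# Redis runs each command to completion before the next one, so a Mutex
# gives the same "one increment at a time" guarantee inside one process.
class AtomicCounter
  def initialize
    @counts = Hash.new(0)
    @lock = Mutex.new
  end

  # Increments the key and returns the new value, like Redis INCR.
  def incr(key)
    @lock.synchronize { @counts[key] += 1 }
  end
end

# Fires concurrent requests at one counter; returns how many were allowed.
def simulate_burst(threads:, requests_per_thread:, limit:)
  counter = AtomicCounter.new
  allowed = 0
  allowed_lock = Mutex.new

  workers = Array.new(threads) do
    Thread.new do
      requests_per_thread.times do
        # Same decision rule as the middleware: increment first, then compare.
        count = counter.incr("rate_limit:42:2024-05-01")
        allowed_lock.synchronize { allowed += 1 } if count <= limit
      end
    end
  end
  workers.each(&:join)
  allowed
end

puts simulate_burst(threads: 20, requests_per_thread: 100, limit: 1_000) # prints 1000
```

A GET-then-SET version of incr could hand two threads the same value and lose an update; with an atomic increment, the allowed total is exact.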

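TTL set only on first increment. The redis.expire(key, 86_400) if count == 1 guard sets the 24-hour TTL once, when the key is created; later increments leave it alone. The TTL does not define the rate-limit window (the date in the key does); it only garbage-collects old counters. Since a key is created no earlier than midnight UTC of its own day, a 24-hour TTL always outlives that day, so a counter never expires while it is still in use.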
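INCR and EXPIRE are still two separate commands, so a process that dies between them leaves a key with no TTL that never expires on its own. A common fix is to run both in a short Lua script, which Redis executes atomically. This sketch assumes the redis-rb client, whose eval accepts keys: and argv:; INCR_WITH_TTL and incr_with_ttl are hypothetical names.

```ruby
# Lua scripts run atomically inside Redis: no other command executes
# between the INCR and the EXPIRE, so the key is never visible without a TTL.
INCR_WITH_TTL = <<~LUA
  local count = redis.call('INCR', KEYS[1])
  if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
  end
  return count
LUA

# Hypothetical helper. `redis` is a redis-rb client (or the object yielded
# by Rails.cache.redis.with). Returns the post-increment count as an Integer.
def incr_with_ttl(redis, key, ttl_seconds)
  redis.eval(INCR_WITH_TTL, keys: [key], argv: [ttl_seconds])
end
```

Inside the middleware, this would replace the body of the with block: current = Rails.cache.redis.with { |redis| incr_with_ttl(redis, key, 86_400) }.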

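Keys are date-scoped. With the UTC date in the key, every new UTC day gets a fresh key, and the old key dies via its TTL. No cron job required. Be careful where that date comes from: Date.current follows config.time_zone (UTC by default in Rails), while Time.now.utc.to_date is UTC regardless of configuration.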
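A sketch of the key derivation as a standalone helper (rate_limit_key is a hypothetical name), checked against a timestamp that carries a non-UTC offset:

```ruby
require "date"

# Hypothetical helper: builds the daily counter key from the UTC calendar
# date, independent of the server's or Rails' configured time zone.
def rate_limit_key(user_id, now = Time.now)
  # getutc returns a UTC copy without mutating the caller's Time object.
  "rate_limit:#{user_id}:#{now.getutc.to_date}"
end

# 01:30 at UTC+02:00 is still 23:30 on the previous day in UTC.
late_utc    = Time.utc(2024, 5, 1, 23, 30, 0)
early_plus2 = Time.new(2024, 5, 2, 1, 30, 0, "+02:00")

puts rate_limit_key(42, late_utc)    # prints rate_limit:42:2024-05-01
puts rate_limit_key(42, early_plus2) # prints rate_limit:42:2024-05-01
```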

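Plan lookups are cached. We hit the database at most once per 5 minutes per user for plan info, so the quota check itself runs entirely in Redis for active users. (The API key lookup in extract_user_id still queries the database on every request; cache it the same way if that query shows up in your profiles.)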

Failure Modes and How We Handle Them

Redis is down. If the Redis call raises, we let the request through (fail-open) and report the error to Sentry. The alternatives are failing closed (rejecting every request) or falling back to a per-process in-memory counter, which drifts because each app instance counts separately: across N instances, a user can get up to N times the limit. Fail-open is the right tradeoff for us; uptime matters more than strict enforcement.

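# Rescue only around Redis: a rescue around the whole middleware would also
# catch errors raised downstream and run the request a second time.
def increment_counter(key)
  Rails.cache.redis.with do |redis|
    redis.incr(key).tap { |count| redis.expire(key, 86_400) if count == 1 }
  end
rescue Redis::BaseError => e
  Sentry.capture_exception(e)
  nil # fail-open: #call does `return @app.call(env) if current.nil?`
end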
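Whichever policy you pick, it helps to keep the decision in a plain method that you can flip and unit-test without a Redis server. A sketch in which check_quota, the fail_open: keyword, and the duck-typed store object are hypothetical names, and StandardError stands in for Redis::BaseError so the example runs without the redis gem:

```ruby
# Hypothetical decision helper: returns :allowed or :blocked.
# `store` is anything with #incr(key) returning the post-increment count,
# e.g. a thin wrapper around the Redis call in the middleware.
def check_quota(store, key, limit, fail_open: true)
  count = store.incr(key)
  count <= limit ? :allowed : :blocked
rescue StandardError
  # Stand-in for `rescue Redis::BaseError`. Nothing downstream runs inside
  # this method, so errors from the app itself can never land here.
  fail_open ? :allowed : :blocked
end
```

Passing fail_open: false gives the fail-closed behavior; the in-memory fallback would instead replace the rescue branch with a per-process counter.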

User upgrades mid-window. The plan cache expires after 5 minutes, so upgrades take effect within that time. The existing counter is not reset: the user keeps today's count and simply gets the higher limit for the rest of the day.
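Because the counter is keyed by user and date, not by plan, an upgrade only changes the limit the existing count is compared against. A sketch of that comparison using the same DAILY_LIMITS values (quota_status is a hypothetical helper, and the fallback of unknown plans to the free quota is an assumption of this sketch):

```ruby
DAILY_LIMITS = {
  "free" => 1_000,
  "starter" => 10_000,
  "pro" => 100_000,
  "enterprise" => 1_000_000
}.freeze

# Hypothetical helper mirroring the middleware's comparison: the count is
# plan-independent, only the limit changes. Unknown plans get the free quota.
def quota_status(count, plan)
  limit = DAILY_LIMITS.fetch(plan, DAILY_LIMITS["free"])
  { allowed: count <= limit, remaining: [limit - count, 0].max }
end

# 1,200 requests so far today: blocked on free, allowed after upgrading.
puts quota_status(1_200, "free")[:allowed]       # prints false
puts quota_status(1_200, "starter")[:remaining]  # prints 8800
```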

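Hot keys. Each user gets one counter key per day, so a heavy user funnels all of their traffic into a single key. That sounds worse than it is: 100,000 requests/day averages about 1.2 INCRs per second (100,000 / 86,400), and even the 1,000,000/day enterprise cap averages under 12 per second. Bursts run higher than these averages but stay far below what a single Redis node serves, which is typically well over 100,000 simple commands per second. Redis Cluster can spread many users' keys across shards if you scale further, but it cannot split one key, so sharding does not relieve a single hot counter.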

Benchmarks

We benchmarked the middleware with wrk against a local Rails + Redis setup:

Running 30s test @ http://localhost:3000/v1/pairs/BTCUSDT/indicators
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    42.15ms   18.23ms  187ms   77.32%
    Req/Sec   292.18     47.52    401     68.91%
  69,847 requests in 30.00s
Requests/sec: 2,328.23
Transfer/sec: 1.42MB

Middleware overhead, measured by A/B comparison with rate limiting on and off: 0.7-0.9ms per request, about 2% of the 42ms average latency above. Negligible.

Observability

Every 429 response logs to Sentry with the user ID, plan, and overage amount. We alert if any single user hits 429 more than 10 times in an hour: either they have a broken client loop, or they need a nudge to upgrade.

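# In #call, just before returning rate_limited_response, where user_id,
# plan, limit, and current are all in scope.
Sentry.capture_message("Rate limit exceeded", extra: {
  user_id:, plan:, limit:, current:, overage: current - limit
})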
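If you would rather evaluate the more-than-10-per-hour rule in the app than in an alerting tool, it reduces to a counter per user per UTC hour. OverageTracker is a hypothetical, in-process sketch using fixed hour buckets, so a burst that straddles the top of the hour is split across two buckets; a version shared across app instances would keep the counts in Redis with a one-hour TTL.

```ruby
# Hypothetical in-process tracker for the "more than 10 429s in an hour"
# rule, using fixed UTC-hour buckets. Not thread-safe and never pruned:
# a sketch of the logic, not a production store.
class OverageTracker
  THRESHOLD = 10

  def initialize
    @counts = Hash.new(0)
  end

  # Records one 429 and returns true exactly once per user per hour:
  # on the hit that pushes the user past THRESHOLD.
  def record(user_id, now = Time.now)
    bucket = "#{user_id}:#{now.getutc.strftime('%Y-%m-%dT%H')}"
    @counts[bucket] += 1
    @counts[bucket] == THRESHOLD + 1
  end
end

tracker = OverageTracker.new
start = Time.utc(2024, 5, 1, 14, 0, 0)
alerts = (0...12).map { |i| tracker.record(42, start + i * 60) }
puts alerts.count(true) # prints 1 (fired on the 11th 429 of the hour)
```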

Operationally we run a daily Sidekiq job that summarizes 429 hits per plan and emails the ops team if overages spike.

Testing the Middleware

RSpec with a real Redis instance (not a mock) gives you the highest confidence:

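RSpec.describe RateLimitMiddleware do
  let(:app) { ->(_env) { [200, {}, ["ok"]] } }
  let(:middleware) { described_class.new(app) }
  let(:user) { create(:user, :with_free_plan) }
  let(:api_key) { create(:api_key, user:) }

  # Assumes the test env uses :redis_cache_store pointed at a disposable
  # Redis database; flushing it resets both counters and cached plans.
  before { Rails.cache.redis.with(&:flushdb) }

  # Builds a bare Rack env carrying the API key as a Bearer token.
  def env_with_token(token)
    Rack::MockRequest.env_for("/v1/pairs/BTCUSDT/indicators",
                              "HTTP_AUTHORIZATION" => "Bearer #{token}")
  end

  it "blocks requests over the daily limit" do
    1_000.times do
      status, _headers, _body = middleware.call(env_with_token(api_key.token))
      expect(status).to eq(200)
    end

    status, _headers, _body = middleware.call(env_with_token(api_key.token))
    expect(status).to eq(429)
  end
end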

Mocking Redis here would defeat the purpose: you want to verify the real INCR semantics.

Practical Takeaways

Use Redis INCR with EXPIRE. Simple, atomic, battle-tested.
Scope keys by date for daily windows without cron jobs.
Set TTL on first increment only, not every time.
Cache plan lookups to avoid hitting Postgres on every request.
Fail-open on Redis errors. Uptime usually matters more than enforcement.
Return useful 429 responses with X-RateLimit-* and Retry-After headers.

This pattern scales comfortably to hundreds of thousands of daily requests per instance. If you are building a trading API (or any SaaS API), this is a solid starting point. Check out the APIndicators rate-limit headers in action at apindicators.com/docs.