Rate limiting is one of those features that looks trivial until you ship it. Then you discover race conditions under load, counters that drift, Redis memory that bloats, and edge cases around plan upgrades mid-window. This post walks through the exact rate limiting pattern powering APIndicators' API: Rails middleware backed by Redis, per-plan daily quotas, atomic counters, and a TTL strategy that keeps Redis lean.
We serve 100,000+ requests per day across our user base with this setup. Redis memory for rate limiting sits under 5 MB. Middleware overhead is under 1ms per request.
Requirements
What a production rate limiter needs to handle:
- Atomic increments under concurrent requests (no double-counting or lost updates)
- Per-user, per-plan quotas that can differ by subscription tier
- Daily reset on UTC boundaries without cron jobs
- Graceful degradation if Redis is unreachable
- Clear 429 responses with retry-after headers
The Core Pattern: Rails Middleware + Redis INCR
The whole implementation fits in about 60 lines of Ruby. Here is the middleware:
class RateLimitMiddleware
  # Daily request quota for each subscription plan.
  DAILY_LIMITS = {
    "free" => 1_000,
    "starter" => 10_000,
    "pro" => 100_000,
    "enterprise" => 1_000_000
  }.freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    request = ActionDispatch::Request.new(env)
    user_id = extract_user_id(request)
    # Requests without a valid API key pass through uncounted.
    return @app.call(env) unless user_id

    plan = fetch_plan(user_id)
    # Unknown plan names fall back to the free quota rather than a nil limit.
    limit = DAILY_LIMITS.fetch(plan, DAILY_LIMITS["free"])
    # Explicit UTC date; replaces Date.current, which follows Time.zone.
    key = "rate_limit:#{user_id}:#{Time.now.utc.to_date}"

    current = Rails.cache.redis.with do |redis|
      count = redis.incr(key)
      # First request of the day: attach a 24-hour TTL so the key cleans itself up.
      redis.expire(key, 86_400) if count == 1
      count
    end

    return rate_limited_response(limit, current) if current > limit

    status, headers, body = @app.call(env)
    headers["X-RateLimit-Limit"] = limit.to_s
    headers["X-RateLimit-Remaining"] = (limit - current).to_s
    [status, headers, body]
  end

  private

  def extract_user_id(request)
    token = request.headers["Authorization"]&.delete_prefix("Bearer ")
    # Skip the lookup when no token was sent.
    return if token.blank?

    ApiKey.find_by(token:)&.user_id
  end

  def fetch_plan(user_id)
    Rails.cache.fetch("user_plan:#{user_id}", expires_in: 5.minutes) do
      User.find(user_id).subscription&.plan || "free"
    end
  end

  def rate_limited_response(limit, current)
    headers = {
      "Content-Type" => "application/json",
      "X-RateLimit-Limit" => limit.to_s,
      "X-RateLimit-Remaining" => "0",
      "Retry-After" => seconds_until_midnight_utc.to_s
    }
    body = { error: "rate_limit_exceeded", limit:, current: }.to_json
    [429, headers, [body]]
  end

  # Seconds left in the current UTC day. Date#to_time would give local midnight.
  def seconds_until_midnight_utc
    now = Time.now.utc
    86_400 - (now.hour * 3600 + now.min * 60 + now.sec)
  end
end
Register it in config/application.rb:
config.middleware.insert_before Rack::Head, RateLimitMiddleware
Why This Pattern Works
INCR is atomic. Redis' INCR is a single atomic operation. Two concurrent requests hitting it will always produce the right count. No need for locks, no race conditions.
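TTL set only on first increment. The redis.expire(key, 86_400) if count == 1 pattern sets the 24-hour TTL exactly once per key. Subsequent increments do not reset the TTL, so the key expires 24 hours after the day's first request. The TTL is only garbage collection: the daily reset comes from the date in the key, not from the expiry.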
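One thing to watch with INCR followed by EXPIRE is that they are two separate commands, so a process that dies between them would leave a counter with no TTL. Below is a minimal sketch of one way to heal such keys, assuming a client that responds to incr, expire, and ttl the way redis-rb does. FakeRedis is a hypothetical in-memory stand-in so the snippet runs without a server, and increment_with_ttl is an illustrative helper, not part of the middleware above.

```ruby
# Hypothetical in-memory stand-in for the three Redis commands used below.
class FakeRedis
  attr_reader :expire_calls

  def initialize
    @counts = Hash.new(0)
    @ttls = {}
    @expire_calls = 0
  end

  def incr(key)
    @counts[key] += 1
  end

  def expire(key, seconds)
    @expire_calls += 1
    @ttls[key] = seconds
    true
  end

  # Mirrors Redis TTL: -2 if the key is missing, -1 if it has no expiry.
  def ttl(key)
    return -2 unless @counts.key?(key)

    @ttls.fetch(key, -1)
  end
end

# Increment the counter and make sure it carries a TTL.
# The TTL is set on the first increment, or re-applied if an earlier
# EXPIRE was lost (TTL reports -1 for a key with no expiry).
def increment_with_ttl(redis, key, ttl_seconds)
  count = redis.incr(key)
  redis.expire(key, ttl_seconds) if count == 1 || redis.ttl(key) == -1
  count
end
```

The extra TTL check costs one more round trip per request. Running both commands inside a single Lua script is the usual alternative when that matters.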
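Keys are date-scoped. Putting the date in the key means every new day gets a fresh key, and the old key dies via TTL. No cron job required. For the reset to land on UTC midnight, the date has to be computed in UTC: Date.current follows the app's Time.zone, which is UTC by default in Rails but is often changed.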
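A minimal sketch of computing the key and the time to the next reset explicitly in UTC, using only the Ruby standard library. The helper names daily_key and seconds_until_midnight_utc and the injectable now parameter are illustrative assumptions; the parameter is there to make the day boundary easy to test.

```ruby
require "date"

# Build the per-user counter key for the UTC calendar day containing `now`.
# getutc returns a new Time, so the caller's object is not modified.
def daily_key(user_id, now = Time.now)
  "rate_limit:#{user_id}:#{now.getutc.to_date.iso8601}"
end

# Whole seconds from `now` until the next UTC midnight (86_400 at exactly midnight).
def seconds_until_midnight_utc(now = Time.now)
  utc = now.getutc
  86_400 - (utc.hour * 3600 + utc.min * 60 + utc.sec)
end
```

For example, a request made at 20:30 on 2024-03-10 in a UTC-5 zone is 01:30 UTC on the 11th, so it counts against the 2024-03-11 key and the reset is 81,000 seconds away.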
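Plan lookups are cached. We hit the database at most once every 5 minutes per user for plan info, so for active users the quota check itself runs entirely in Redis. Note that the ApiKey.find_by lookup in extract_user_id still queries the database on each request unless it is cached the same way.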
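Rails.cache.fetch with expires_in is what bounds the database lookups. To show that behaviour in isolation, here is a tiny in-process stand-in. TtlCache and its injectable clock are hypothetical; they only mimic the fetch-with-expiry contract and are not how Rails' cache store is implemented.

```ruby
# Hypothetical in-process cache with per-entry expiry.
class TtlCache
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @entries = {} # key => [value, expires_at]
  end

  # Return the cached value while it is fresh; otherwise run the block,
  # store its result for `expires_in` seconds, and return it.
  def fetch(key, expires_in:)
    value, expires_at = @entries[key]
    return value if expires_at && @clock.call < expires_at

    value = yield
    @entries[key] = [value, @clock.call + expires_in]
    value
  end
end
```

With expires_in set to 300, the block (the database query in the middleware) runs once, and again only after 300 seconds have passed, no matter how many requests arrive in between.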
Failure Modes and How We Handle Them
Redis is down. If Rails.cache.redis.with raises, we let the request through (fail-open). The alternatives are to fail closed (reject all requests) or to fall back to a local in-memory counter, at the risk of counts drifting across app instances. Fail-open is the right tradeoff for us; rate limiting matters less than uptime.
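def call(env)
  # rate_check is assumed to wrap the counting and limit logic from call shown earlier.
  rate_check(env)
rescue Redis::BaseError => e
  # Fail open: report the error, then serve the request without counting it.
  Sentry.capture_exception(e)
  @app.call(env)
end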
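For a version that can be run without Rails or Redis, here is a self-contained sketch of the same fail-open idea. Everything in it is an assumption for illustration: CounterUnavailable stands in for Redis::BaseError, the counter is any callable that returns the current count, and on_error stands in for the Sentry call.

```ruby
# Hypothetical error class standing in for Redis::BaseError.
class CounterUnavailable < StandardError; end

# Minimal Rack-style middleware that fails open when the counter backend errors.
class FailOpenLimiter
  def initialize(app, counter:, limit:, on_error: ->(_error) {})
    @app = app
    @counter = counter   # callable: env -> current request count
    @limit = limit
    @on_error = on_error # e.g. report to an error tracker
  end

  def call(env)
    count = begin
      @counter.call(env)
    rescue CounterUnavailable => e
      @on_error.call(e)
      nil # nil means "could not count", which is treated as allowed
    end

    return too_many_requests if count && count > @limit

    @app.call(env)
  end

  private

  def too_many_requests
    [429, { "Content-Type" => "application/json" }, ['{"error":"rate_limit_exceeded"}']]
  end
end
```

The rescue wraps only the counting step here. That way an exception raised by the downstream app is never mistaken for a counter failure, and the app is never invoked twice for one request.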
User upgrades mid-window. The plan cache expires after 5 minutes, so an upgrade takes effect within 5 minutes. The existing counter is not reset; the user simply gets the higher limit for the rest of the day.
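To make the mid-window behaviour concrete, here is a small worked example. It assumes the same plan limits as DAILY_LIMITS in the middleware; remaining_today is a hypothetical helper, not part of the middleware.

```ruby
# Same quotas as the middleware's DAILY_LIMITS, repeated so the snippet stands alone.
DAILY_LIMITS = {
  "free" => 1_000,
  "starter" => 10_000,
  "pro" => 100_000,
  "enterprise" => 1_000_000
}.freeze

# Requests left today for a plan, given the day's counter so far (never negative).
def remaining_today(plan, used)
  [DAILY_LIMITS.fetch(plan) - used, 0].max
end

remaining_today("free", 1_000)    # quota exhausted on the free plan
remaining_today("starter", 1_000) # same counter after upgrading: 9,000 left
```

Because the middleware runs INCR before the limit check, rejected requests also advance the counter. A client that keeps retrying while blocked therefore eats into the quota it gets after upgrading.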
Hot keys. Pro users hitting 100k/day create hot keys. Redis Cluster sharding can help if you scale further, but 100k INCRs per day averages just over one per second, far below what a single Redis node can sustain (typically on the order of 100,000 simple operations per second, and more with pipelining).
Benchmarks
We benchmarked the middleware with wrk against a local Rails + Redis setup:
Running 30s test @ http://localhost:3000/v1/pairs/BTCUSDT/indicators
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 42.15ms 18.23ms 187ms 77.32%
Req/Sec 292.18 47.52 401 68.91%
69,847 requests in 30.00s
Requests/sec: 2,328.23
Transfer/sec: 1.42MB
Middleware overhead, measured by comparing runs with rate limiting enabled and disabled, was 0.7-0.9 ms per request. Against an average latency of about 42 ms, that is negligible.
Observability
Every 429 response is logged to Sentry with the user ID, plan, and overage amount. We alert if any single user hits 429 more than 10 times in an hour: either they have a broken client loop, or they need a nudge to upgrade.
Sentry.capture_message("Rate limit exceeded", extra: {
  user_id:, plan:, limit:, current:
})
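The alert threshold can be tracked with the same counter-per-bucket idea used for the quota. The sketch below is an assumption about how that might look, not our production code: OverageTracker is hypothetical, it keeps counts in a plain Hash where a real deployment would likely use Redis, and it uses fixed UTC hour buckets as an approximation of "in an hour".

```ruby
# Hypothetical tracker: counts 429s per user per UTC hour.
class OverageTracker
  def initialize(threshold: 10)
    @threshold = threshold
    @counts = Hash.new(0) # stand-in for Redis counters keyed by user and hour
  end

  # Record one 429. Returns true exactly once per user and hour:
  # on the response that first takes the count past the threshold.
  def record(user_id, now = Time.now)
    bucket = now.getutc.strftime("%Y-%m-%dT%H")
    count = (@counts["rate_limit_429:#{user_id}:#{bucket}"] += 1)
    count == @threshold + 1
  end
end
```

Returning true only on the crossing keeps a misbehaving client from triggering a new alert on every subsequent 429 in the same hour.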
Operationally we run a daily Sidekiq job that summarizes 429 hits per plan and emails the ops team if overages spike.
Testing the Middleware
RSpec with a real Redis instance (not a mock) gives you the highest confidence:
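RSpec.describe RateLimitMiddleware do
  let(:app) { ->(_env) { [200, {}, ["ok"]] } }
  let(:middleware) { described_class.new(app) }
  let(:user) { create(:user, :with_free_plan) }
  let(:api_key) { create(:api_key, user:) }

  # One possible definition of the env_with_token helper:
  # a Rack env carrying the bearer token the middleware reads.
  def env_with_token(token)
    Rack::MockRequest.env_for("/", "HTTP_AUTHORIZATION" => "Bearer #{token}")
  end

  # Start each example from clean counters; assumes the test Redis is safe to flush.
  before { Rails.cache.redis.with(&:flushdb) }

  it "blocks requests over the daily limit" do
    1_000.times do
      status, _, _ = middleware.call(env_with_token(api_key.token))
      expect(status).to eq(200)
    end

    status, _, _ = middleware.call(env_with_token(api_key.token))
    expect(status).to eq(429)
  end
end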
Mocking Redis here would defeat the purpose — you want to verify the real INCR semantics.
Practical Takeaways
- Use Redis INCR with EXPIRE. Simple, atomic, battle-tested.
- Scope keys by date for daily windows without cron jobs.
- Set TTL on first increment only, not every time.
- Cache plan lookups to avoid hitting Postgres on every request.
- Fail-open on Redis errors. Uptime usually matters more than enforcement.
- Return useful 429 responses with X-RateLimit-* and Retry-After headers.
This pattern scales comfortably to hundreds of thousands of daily requests per instance. If you are building a trading API (or any SaaS API), this is a solid starting point. Check out the APIndicators rate-limit headers in action at apindicators.com/docs.