Rails API Performance: Eliminating N+1 Queries and Strategic Caching

If you have ever watched a Rails API endpoint go from 50ms to 3 seconds as your dataset grows, you have almost certainly been bitten by an N+1 query. It is one of the most common performance problems in Rails applications, and it gets worse in proportion to your data. An endpoint that returns 10 records with 3 associations makes 31 database queries instead of 4. Scale that to 100 records and you are making 301 queries per request.

The good news is that Rails provides excellent tools for fixing this. The bad news is that most developers only learn about them after their production database starts melting. This article covers the patterns that keep API response times fast: detecting and eliminating N+1 queries, applying multi-layer caching, and making your database do less work per request.

These are not theoretical exercises. At APIndicators, our API serves trading indicator predictions across hundreds of cryptocurrency pairs with sub-100ms response times. Every optimization described here comes from real production experience.

Understanding N+1 Queries

An N+1 query happens when your code loads a collection of records (1 query) and then accesses an association on each record individually (N queries). Here is a typical example in a trading platform context:

class Api::V1::OrdersController < ApplicationController
  def index
    orders = current_user.orders.limit(50)
    render json: orders.map { |order|
      {
        id: order.id,
        pair: order.pair.symbol,
        strategy: order.strategy.name,
        signals: order.signals.count
      }
    }
  end
end

This looks innocent, but it generates:

1 query for orders
50 queries for order.pair
50 queries for order.strategy
50 queries for order.signals.count

That is 151 queries for a single API call. Each query has overhead: network round-trip to the database, query parsing, execution planning, and result serialization. Even if each query takes 1ms, you are spending about 150ms just on database round trips.

Fixing N+1 with Eager Loading

Rails provides three methods for eager loading: includes, preload, and eager_load. Each has different behavior and use cases.

includes

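The most common fix. By default Rails loads each association with a separate query, and it switches to a single JOIN when the query's conditions reference the associated table (for example through a hash where condition or references):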

orders = current_user.orders
  .includes(:pair, :strategy, :signals)
  .limit(50)

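This reduces 151 queries to 4 (one per table), provided the response code calls order.signals.size instead of order.signals.count: count always issues a fresh COUNT query, even when the association is already loaded. Rails loads the associated records in batch queries of the form WHERE id IN (...) for belongs_to associations and WHERE order_id IN (...) for has_many associations.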
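
To make the query arithmetic concrete, here is a minimal plain-Ruby sketch that does not use ActiveRecord at all. FakeDB, build_db, lazy_load, and batched_load are hypothetical names invented for this illustration; the only goal is to count round trips for per-record lookups versus one batched IN-style lookup per table.

```ruby
# Toy in-memory "database" that counts every query it receives.
# FakeDB and its tables are illustrative stand-ins, not ActiveRecord.
class FakeDB
  attr_reader :query_count

  def initialize(tables)
    @tables = tables
    @query_count = 0
  end

  # Simulates: SELECT * FROM table WHERE column IN (values)
  def where_in(table, column, values)
    @query_count += 1
    @tables.fetch(table).select { |row| values.include?(row[column]) }
  end
end

# Builds sample data: every order has one pair, one strategy, two signals.
def build_db(order_count)
  FakeDB.new(
    orders: (1..order_count).map { |i| { id: i, pair_id: i % 5, strategy_id: i % 3 } },
    pairs: (0..4).map { |i| { id: i, symbol: "PAIR#{i}" } },
    strategies: (0..2).map { |i| { id: i, name: "strategy-#{i}" } },
    signals: (1..order_count * 2).map { |i| { id: i, order_id: (i % order_count) + 1 } }
  )
end

# Lazy loading: one query for the orders, then three more per order.
def lazy_load(db, order_ids)
  db.where_in(:orders, :id, order_ids).map do |order|
    {
      id: order[:id],
      pair: db.where_in(:pairs, :id, [order[:pair_id]]).first[:symbol],
      strategy: db.where_in(:strategies, :id, [order[:strategy_id]]).first[:name],
      signals: db.where_in(:signals, :order_id, [order[:id]]).size
    }
  end
end

# Batched loading: one query per table, then in-memory lookups.
def batched_load(db, order_ids)
  orders = db.where_in(:orders, :id, order_ids)
  pair_ids = orders.map { |o| o[:pair_id] }.uniq
  strategy_ids = orders.map { |o| o[:strategy_id] }.uniq

  pairs = db.where_in(:pairs, :id, pair_ids).to_h { |p| [p[:id], p] }
  strategies = db.where_in(:strategies, :id, strategy_ids).to_h { |s| [s[:id], s] }
  signals = db.where_in(:signals, :order_id, orders.map { |o| o[:id] })
              .group_by { |s| s[:order_id] }

  orders.map do |order|
    {
      id: order[:id],
      pair: pairs.fetch(order[:pair_id])[:symbol],
      strategy: strategies.fetch(order[:strategy_id])[:name],
      signals: signals.fetch(order[:id], []).size
    }
  end
end
```

With 50 orders, lazy_load issues 151 queries against the toy database and batched_load issues 4, while both return identical payloads. The batched version trades a little memory (the lookup hashes) for a query count that no longer depends on the number of orders.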

preload

Forces separate queries. Use this when you want predictable query behavior or when joining several has_many associations would multiply rows into a Cartesian product:

orders = current_user.orders
  .preload(:pair, :strategy)
  .limit(50)

eager_load

Forces a LEFT OUTER JOIN. Use this when you need to filter or sort by association attributes:

orders = current_user.orders
  .eager_load(:pair)
  .where(pairs: { quote_asset: "USDT" })
  .order("pairs.symbol ASC")
  .limit(50)

Nested Eager Loading

For deeply nested associations, use hash syntax:

orders = current_user.orders
  .includes(pair: :market_type, strategy: :indicators)
  .limit(50)

Detecting N+1 Queries Automatically

Relying on manual code review to catch N+1 queries is unreliable. Use the bullet gem to detect them automatically during development and testing.

group :development, :test do
  gem "bullet"
end

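Configure it to raise errors in test and log warnings in development. The settings usually live in the environment files, inside a config.after_initialize block:

config.after_initialize do
  Bullet.enable = true
  Bullet.raise = Rails.env.test?   # fail the test run on a detected N+1
  Bullet.bullet_logger = true      # write to log/bullet.log
  Bullet.rails_logger = true       # also write to the Rails log
end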

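With this setup, any test that exercises a code path containing an N+1 query fails. This is more reliable than code review because it can catch N+1 queries that only appear with specific data patterns, as long as the test data reproduces those patterns (a collection with a single record never triggers one).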

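For production monitoring, prosopite is an alternative that some teams run in log-only mode. It only inspects code that runs between Prosopite.scan and Prosopite.finish, so wrap each request in those calls, for example from an around_action. A log-only configuration looks like this: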

gem "prosopite"

Prosopite.raise = false
Prosopite.custom_logger = Rails.logger
Prosopite.min_n_queries = 3
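
To illustrate the general idea behind automatic detection, the following plain-Ruby sketch fingerprints SQL strings and flags any fingerprint that repeats within one request. It is a simplified illustration and not the actual implementation of bullet or prosopite, which also take associations or call stacks into account; NPlusOneDetector and its methods are hypothetical names.

```ruby
# Simplified N+1 detector: normalize each SQL string into a fingerprint and
# flag fingerprints that repeat at least `min_n_queries` times in one request.
# NPlusOneDetector is a hypothetical class, not part of bullet or prosopite.
class NPlusOneDetector
  def initialize(min_n_queries: 3)
    @min_n_queries = min_n_queries
    @counts = Hash.new(0)
  end

  # Replace literals with placeholders so queries that differ only in
  # their values share one fingerprint.
  def self.fingerprint(sql)
    sql.gsub(/'(?:[^']|'')*'/, "?") # quoted string literals
       .gsub(/\b\d+\b/, "?")        # standalone integer literals
       .gsub(/\s+/, " ")            # collapse whitespace
       .strip
  end

  # Call once per executed query.
  def record(sql)
    @counts[self.class.fingerprint(sql)] += 1
  end

  # Fingerprints that look like N+1 candidates, with their repeat counts.
  def offenders
    @counts.select { |_fingerprint, count| count >= @min_n_queries }
  end
end
```

Feeding it one orders query followed by three pairs lookups that differ only in their id reports the pairs query as an offender with a count of 3, which mirrors the role the threshold plays in the configuration above.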

Strategic Caching for API Responses

After fixing N+1 queries, the next performance lever is caching. Rails supports multiple caching layers, and using the right one for each situation is critical.

Fragment Caching with Russian Doll Strategy

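For JSON API responses, cache at the serializer level. Because the key includes updated_at, a changed record is read under a new key and its stale entry simply ages out. Note that the payload also embeds data from the associated pair, which does not change the order's key, so expires_in bounds how long that part can be stale: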

class OrderSerializer
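  def self.cache_key(order)
    # cache_key_with_version embeds updated_at at microsecond precision,
    # so two writes within the same second still produce distinct keys.
    "serialized/#{order.cache_key_with_version}"
  end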

  def self.serialize(order)
    Rails.cache.fetch(cache_key(order), expires_in: 5.minutes) do
      {
        id: order.id,
        pair: order.pair.symbol,
        side: order.side,
        entry_price: order.entry_price.to_f,
        current_pnl: order.current_pnl.to_f
      }
    end
  end
end
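
The invalidation-by-key idea can be seen in isolation with a small plain-Ruby sketch. TinyCache is a hypothetical stand-in for Rails.cache (without expiry), and Order, order_cache_key, and serialize_order are names invented here; the sketch assumes a microsecond-precision timestamp in the key so that writes within the same second also change it.

```ruby
# Minimal stand-in for Rails.cache.fetch: return the cached value for a key,
# or run the block, store its result, and return it.
# TinyCache is hypothetical; expiry is omitted for brevity.
class TinyCache
  attr_reader :misses

  def initialize
    @store = {}
    @misses = 0
  end

  def fetch(key)
    return @store[key] if @store.key?(key)

    @misses += 1
    @store[key] = yield
  end
end

# Hypothetical record type standing in for an ActiveRecord model.
Order = Struct.new(:id, :side, :updated_at)

# The key embeds updated_at at microsecond precision, so any write that
# bumps the timestamp makes readers look under a new key.
def order_cache_key(order)
  "order/#{order.id}/#{order.updated_at.strftime('%Y%m%d%H%M%S%6N')}"
end

# Serializes through the cache; the block only runs on a miss.
def serialize_order(cache, order)
  cache.fetch(order_cache_key(order)) do
    { id: order.id, side: order.side }
  end
end
```

Serializing the same unchanged order twice runs the block once. Changing the order and bumping updated_at, even by a few microseconds, produces a new key and therefore a fresh payload, with no explicit delete of the old entry.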

Collection Caching

For list endpoints, cache the entire collection keyed on the most recently updated record:

class Api::V1::PairsController < ApplicationController
  def index
    pairs = Pair.active.order(:rank)

    cache_key = [
      "pairs/active",
      pairs.maximum(:updated_at).to_i,
      pairs.count
    ].join("/")

    result = Rails.cache.fetch(cache_key, expires_in: 1.minute) do
      pairs.map { |pair|
        {
          id: pair.id,
          symbol: pair.symbol,
          rank: pair.rank,
          volume_24h: pair.volume_24h.to_f
        }
      }
    end

    render json: result
  end
end

This pattern works well for data that changes infrequently relative to how often it is read. Pairs data might update every hour but be read thousands of times per minute.
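
The reason the key combines the newest timestamp with the row count can be shown with a short plain-Ruby sketch. PairRow and collection_cache_key are hypothetical names used only for this illustration, and the rows are plain structs rather than ActiveRecord objects.

```ruby
# Builds a collection cache key from the newest updated_at and the row count.
# PairRow and collection_cache_key are hypothetical names for illustration.
PairRow = Struct.new(:id, :updated_at)

def collection_cache_key(prefix, rows)
  newest = rows.map(&:updated_at).max # nil for an empty collection
  [prefix, newest.to_i, rows.size].join("/")
end
```

Deleting a row that is not the most recently updated one leaves the maximum timestamp unchanged, so without the count the key would stay the same and the cached list would still contain the deleted row. Including the count makes that deletion visible in the key.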

Counter Caches for Aggregations

If your API returns counts (number of signals, number of trades), avoid COUNT(*) queries by using counter caches:

class Signal < ApplicationRecord
  belongs_to :pair, counter_cache: true
end

add_column :pairs, :signals_count, :integer, default: 0, null: false

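Now pair.signals_count reads a column instead of executing a query. For existing data, backfill the column once with Pair.reset_counters(pair.id, :signals), because the counter only tracks records created or destroyed through ActiveRecord after the column is added. For high-traffic endpoints this eliminates thousands of aggregate queries per minute.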

HTTP Caching Headers

Do not forget about client-side caching. Proper HTTP headers let clients skip the response body, or skip the request entirely:

class Api::V1::IndicatorsController < ApplicationController
  def show
    indicator = Indicator.find(params[:id])

    if stale?(indicator, public: true)
      render json: IndicatorSerializer.serialize(indicator)
    end
  end
end

The stale? method sets ETag and Last-Modified headers. Subsequent requests with matching If-None-Match or If-Modified-Since headers get a 304 Not Modified response with no body, saving bandwidth and serialization time.
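
The conditional GET exchange can be sketched in plain Ruby without Rails. The ETag scheme below (an MD5 of id and updated_at) is an assumption made for this illustration, and Rails derives its own ETag differently; etag_for and conditional_get are hypothetical helpers, and real clients may also send weak validators or several ETags, which this sketch ignores.

```ruby
require "digest"
require "json"

# Rack-style sketch of a conditional GET. The ETag scheme here (MD5 of id and
# updated_at) is an assumption for illustration; Rails derives its own ETag.
def etag_for(record)
  digest = Digest::MD5.hexdigest("#{record[:id]}-#{record[:updated_at].to_f}")
  %("#{digest}")
end

# Returns [status, headers, body_parts]. `request_headers` is a plain Hash.
def conditional_get(record, request_headers)
  etag = etag_for(record)
  headers = { "ETag" => etag, "Cache-Control" => "public" }

  if request_headers["If-None-Match"] == etag
    # The client's copy is current: skip serialization and send no body.
    [304, headers, []]
  else
    body = JSON.generate(record[:payload])
    [200, headers.merge("Content-Type" => "application/json"), [body]]
  end
end
```

The first request gets a 200 with a body and an ETag. Replaying the request with that ETag in If-None-Match gets a bodiless 304 until updated_at changes, at which point the ETag no longer matches and a full 200 is returned again.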

For trading data with known update intervals, use explicit cache control:

expires_in 30.seconds, public: true, stale_while_revalidate: 10.seconds

Database-Level Optimizations

Caching and eager loading operate at the application level. For the queries that do hit the database, these patterns reduce execution time.

Covering Indexes

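If an endpoint filters and sorts by the same columns, a composite index lets the database find and order the matching rows from the index alone. When the index also contains every column the query selects, PostgreSQL can use an index-only scan and skip most table reads:

add_index :bot_orders, [:user_id, :created_at, :side],
          name: "idx_bot_orders_user_created_side"

This serves a query like WHERE user_id = ? ORDER BY created_at DESC without a sequential scan or a separate sort step: the database reads the index in reverse order and returns rows already sorted. A query that selects columns outside the index still fetches the matching rows from the table.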

Partial Indexes

If you only query a subset of rows, a partial index is smaller and faster:

add_index :bot_orders, [:pair_id, :created_at],
          where: "status = 'open'",
          name: "idx_bot_orders_open_by_pair"

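If open orders are around 1% of all orders, a partial index covering only those rows is roughly 100x smaller than a full index, so it is far more likely to stay in memory and is faster to scan. The planner uses it only for queries whose WHERE clause includes status = 'open'.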

select() to Reduce Memory

If you only need three columns from a 30-column table, tell ActiveRecord:

pairs = Pair.active
  .select(:id, :symbol, :rank)
  .order(:rank)

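This reduces memory allocation, speeds up object instantiation, and transfers less data from the database. The savings grow with the size of the result set and the width of the skipped columns. Reading an attribute that was not selected raises ActiveModel::MissingAttributeError, so keep the select list in sync with the serializer.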

find_each for Background Processing

Never load large collections into memory at once. Use find_each for batch processing:

Pair.active.find_each(batch_size: 100) do |pair|
  PairAnalysisJob.perform_later(pair.id)
end

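This loads 100 records at a time instead of all records at once, keeping memory usage roughly constant regardless of collection size. Note that find_each iterates in primary key order and ignores any other ordering applied to the relation.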
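
The batching strategy behind find_each can be sketched in plain Ruby: order by primary key and fetch each next batch with a condition of the form id > last_seen_id LIMIT batch_size. The rows array, fetch_batch, and each_in_batches below are hypothetical stand-ins for a table, a query, and the iterator.

```ruby
# Stand-in for: SELECT * FROM rows WHERE id > after_id ORDER BY id LIMIT limit
# (the id condition is skipped for the first batch).
def fetch_batch(rows, after_id:, limit:)
  candidates = after_id ? rows.select { |row| row[:id] > after_id } : rows
  candidates.sort_by { |row| row[:id] }.first(limit)
end

# Yields every row while holding at most `batch_size` rows at a time.
# Returns the number of non-empty batches that were fetched.
def each_in_batches(rows, batch_size: 100)
  last_id = nil
  batches = 0

  loop do
    batch = fetch_batch(rows, after_id: last_id, limit: batch_size)
    break if batch.empty?

    batches += 1
    batch.each { |row| yield row }

    # A short batch means the table is exhausted; skip the extra query.
    break if batch.size < batch_size

    last_id = batch.last[:id]
  end

  batches
end
```

For 250 rows and a batch size of 100 this fetches three batches (100, 100, 50) and yields the rows in id order regardless of how the source array is ordered. Because each batch is located by id rather than by OFFSET, later batches cost about the same as the first.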

Measuring Performance

Optimization without measurement is guessing. Use these tools to understand where time is actually spent.

ActiveSupport::Notifications

Subscribe to SQL events to log slow queries:

ActiveSupport::Notifications.subscribe("sql.active_record") do |_name, start, finish, _id, payload|
  duration = (finish - start) * 1000
  if duration > 100
    Rails.logger.warn("[SLOW QUERY] #{duration.round(1)}ms: #{payload[:sql]}")
  end
end

rack-mini-profiler

For development, rack-mini-profiler shows query count and timing directly in your browser or API response headers:

gem "rack-mini-profiler"

It adds an X-MiniProfiler-Ids header to API responses that you can use to retrieve detailed timing breakdowns.

EXPLAIN ANALYZE

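For specific slow queries, use PostgreSQL's EXPLAIN ANALYZE through ActiveRecord (passing options to explain requires Rails 7.1 or later). Keep in mind that EXPLAIN ANALYZE actually executes the query: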

Pair.active.where(quote_asset: "USDT").order(:rank).explain(:analyze)

This shows the actual execution plan, including which indexes were used, how many rows were scanned, and where time was spent.

A Real-World Optimization Checklist

When a Rails API endpoint is slow, work through this checklist in order:

Check query count first. If you see more than 5-10 queries for a single endpoint, you likely have N+1 issues. Add includes calls.
Add database indexes. Run EXPLAIN ANALYZE on your slowest queries and add indexes for any sequential scans on large tables.
Use select() to limit columns. Especially important for tables with text or jsonb columns that you do not need in the response.
Add caching. Start with HTTP caching headers, then add application-level caching for expensive computations.
Use counter caches. Replace COUNT(*) queries with cached counters for any count that appears in list endpoints.
Consider materialized views. For complex aggregations that power dashboard endpoints, a materialized view refreshed on a schedule can be orders of magnitude faster than computing the aggregation on every request.

Conclusion

Rails API performance is not about exotic techniques or rewriting critical paths in C. It is about eliminating waste: unnecessary queries, redundant computations, and data you fetch but never use. The three pillars are eager loading (fix N+1), caching (avoid redundant work), and database optimization (make necessary work faster).

Start with the bullet gem to catch N+1 queries automatically. Add strategic caching for endpoints with high read-to-write ratios. Use partial indexes and select() to reduce database work. These changes are incremental and low-risk, and on N+1-heavy endpoints they can deliver order-of-magnitude improvements in response time.

The difference between a 50ms API response and a 500ms one is the difference between an app that feels instant and one that feels sluggish. For trading applications where milliseconds translate to real money, that difference matters even more.