A Visual Guide to Using :includes in Rails
If you’re new to developing Rails applications, you’ve probably come across the term N + 1 queries. You probably also know that this is something you want to avoid.
As an engineer who joined Gusto straight out of undergrad with no Rails or full-time industry experience, I was initially overwhelmed by this concept. But, when broken down into bite size chunks, the solution to fixing N + 1 queries is actually quite simple.
N + 1 queries occur when a group of records are loaded from the database in an inefficient way, along with any records associated with them. The examples below dive into how we can solve this issue with :includes
and will help demystify how this method works under the hood. Note that the code snippets are using Ruby 2.3.4 with Rails 4.2.11.
Outlining the Problem
Let’s say we have a model called Employee
who has many Forms
.

# == Schema Information
# Table name: employees
# id
# nameclass Employee < ApplicationRecord
has_many: :forms
end# == Schema Information
# Table name: forms
# id
# employee_id
# kindclass Form < ApplicationRecord
belongs_to: :employee
end
And we have 5 total employee records and forms that we want to load because we want to do some sort of mutation.

Employee.all.map { |employee| employee.forms }.flatten
The SQL for this command looks like:
> SELECT `employees`.* FROM `employees` ORDER BY `employees`.`id`
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` = 1
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` = 2
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` = 3
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` = 4
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` = 5
There are 6 total hits to the database being made since we load employees in the first query and then make 5 additional queries to grab each employee’s forms. In other words, N + 1 SQL selects occur where N = 5.

Making queries ⚡️ with :includes
Rails provides an ActiveRecord method called :includes
which loads associated records in advance and limits the number of SQL queries made to the database. This technique is known as "eager loading" and in many cases will improve performance by a significant amount.
Depending on what your query is, :includes
will use either the ActiveRecord method :preload
or :eager_load
.
When does :includes use :preload?
In most cases :includes
will default to use the method :preload
which will fire 2 queries:
- Load all records tied to the leading model
- Load records associated with the leading model based off the foreign key on the associated model or the leading model
So, if we introduced :preload
to our query we would produce only 2 SQL selects where forms would be loaded, based off the foreign key Form#employee_id
.
Employee.preload(:forms).map { |employee| employee.forms }.flatten> SELECT `employees`.* FROM `employees`
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` IN (1, 2, 3, 4, 5)

The SQL for this example would look the exact same if we were to replace :preload
with :includes
.
Employee.includes(:forms).map { |employee| employee.forms }.flatten> SELECT `employees`.* FROM `employees`
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` IN (1, 2, 3, 4, 5)
When does :includes use :eager_load?
:includes
will default to use :preload
unless you reference the association being loaded in a subsequent clause, such as :where
or :order
. When constructing a query this way, you also need to explicitly reference the eager loaded model.
Employee.includes(:forms).where('forms.kind = "health"').references(:forms)
In this case, :includes
will use the method :eager_load
which will produce 1 query that uses left outer joins to build an intermediary table which is then used to construct the model output.
> SELECT `employees`.`id` AS t0_r0, `employees`.`name` AS t0_r1, `forms`.`id` AS t1_r0, `forms`.`employee_id` AS t1_r1, `forms`.`kind` AS t1_r2 LEFT OUTER JOIN `forms` ON `forms`.`employee_id` = `employees`.`id` WHERE (forms.kind = "health")

The SQL for this example would look the same if we were to replace :eager_load
with :includes
. We can remove :references
in this case too.
Employee.eager_load(:forms).where('forms.kind = "health"')> SELECT `employees`.`id` AS t0_r0, `employees`.`name` AS t0_r1, `forms`.`id` AS t1_r0, `forms`.`employee_id` AS t1_r1, `forms`.`kind` AS t1_r2 LEFT OUTER JOIN `forms` ON `forms`.`employee_id` = `employees`.`id` WHERE (forms.kind = "health")
If you replaced :preload
with :includes
, however, the query would fail to execute.
Can I use :includes with Form as the leading model instead?
If the query were reversed, where forms were loaded first and we wanted to efficiently load the employee association, we can still use :includes
. The query will load employees based off the collection of Employee#ids
, which is referenced from Form#employee_id
.
Form.includes(:employee)> SELECT `forms`.* FROM `forms`
> SELECT `employees`.* FROM `employees` WHERE `employees`.`id` IN (1, 2, 3, 4, 5)

Can I eager load nested associations with :includes?
Yes. Here are a few examples of different scenarios you could use if the Form
model was modified to have additional associations such as has_one: :signer
and has_one: issuer
, which has_one: :address
.
Employee.includes(forms: :issuer)Employee.includes(forms: { issuer: :address })Employee.includes(forms: [{ issuer: :address }, :signer])
Benchmarking — is using :includes always faster?
How big is the performance gain by using :includes
in your code? And since :includes
invokes either :preload
or :eager_load
, which method ends up being faster?
I tested the performance of the same queries listed in the examples above in a local instance of Gusto’s database to see the difference. Each query was run 3 times with caching disabled and then averaged in the table below.
Employee.first(n).map { |employee| employee.forms }.flattenEmployee.preload(:forms).first(n).map { |employee| employee.forms }.flattenEmployee.eager_load(:forms).first(n).map { |employee| employee.forms }.flatten

The data shows that using :includes
will improve the performance significantly when a :preload
is invoked, but has the opposite effect for :eager_load
in most cases. :eager_load
constructs a complex query to gather information about the associations being loaded, thus it makes sense that this option would be slower.
Measuring query performance in your own app is the best way to ensure you are using the most efficient methods available for your use cases. As a general rule, though, getting in the habit of using :includes
will lead to performant code that creates the best experience for other developers in your codebase—and of course, customers interacting with your apps.
Thank you to Quan Nguyen, Tori Huang, Jim Liu, and the Gusto Engineering Blog team for editing and feedback on earlier drafts.