Advanced Queries with ActiveRecord’s From Method

For those of us that write database-backed Rails apps,
ActiveRecord is a
familiar beast. We know its various query building methods well: where,
order, limit, and even group. The most important SQL clause is
conspicuously missing from our repertoire: the from clause.

Chances are you can't think of many, if any, times that you've used
ActiveRecord's from method. This is to be expected. The from clause in
SQL specifies the target table of a query. In the context of ActiveRecord, the
target table is explicitly expressed by the model from which you query.

Consider the times you use one-liners such as Developer.first,
Developer.all, and Developer.count. With each of these, the developers
table as the target of the query is expressed by the use of the Developer
model.

So, where does the from method come into the picture?

The
docs
describe the

method as such.

Specifies table from which the records will be fetched.

Specifying the table of a query against an ActiveRecord model is going to feel
a bit redundant.

> Developer.from('developers').to_sql
=> "SELECT \"developers\".* FROM developers"

As you've probably noticed we get this same behavior with nothing more than the
all method.

> Developer.all.to_sql
=> "SELECT \"developers\".* FROM \"developers\""

So, again, where does the from method come into the picture?

Our from clauses aren't always going to look as simple as from developers.
We sometimes need the "table" from which we are fetching to be a collection of
computed values. The key insight here is that the result of a query is itself a
table. That means that we can query against the result of another query. This
is ostensibly how subqueries work.

The from method comes into play when we are building up a complex query
in SQL and need to transition back to the land of ActiveRecord and Rails.

Let's work with the
developer_scores.sql
query from the hr-til codebase.

select
 developers.id,
 username,
 posts,
 likes,
 round(likes::numeric / posts, 2) as avg_likes,
 round(log(2, (1022 * ((score - min(score) over ()) / ((max(score) over ()) - (min(score) over ()))) + 2)::numeric), 1) as hotness
from developers
join (
  select developer_id as id, sum(score) as score
  from hot_posts
  group by developer_id
) developer_scores using(id)
join (
  select
  developer_id as id,
  count(*) as posts,
  sum(likes) as likes
  from posts
  group by developer_id
) stats using(id)

This query is used to compute both the average number of likes and the
hotness score for each developer's posts. Because of the complexity of the
query, we are better off writing it in SQL rather than using ActiveRecord's
DSL. This query does a complex computation that results in a bunch of data. It
would be nice if we could further refine the results as needed with
ActiveRecord's DSL. And herein lies the challenge. How do we transition from
this SQL query to something compatible with ActiveRecord?

The from method — the topic of this post — is, of course, the answer. We
can treat the above query as a "table" that we can query against (i.e. a
subquery). This allows us to then use order, limit, group, etc. to
further refine the result set.

We can create a class (app/models/developer_ranker.rb) to see how this can work.

class DeveloperRanker
  RANKED_DEV_SQL = <<-SQL
      (select
       developers.id,
       username,
       posts,
       likes,
       round(likes::numeric / posts, 2) as avg_likes,
       round(log(2, (1022 * ((score - min(score) over ()) / ((max(score) over ()) - (min(score) over ()))) + 2)::numeric), 1) as hotness
      from developers
      join (
        select developer_id as id, sum(score) as score
        from hot_posts
        group by developer_id
      ) developer_scores using(id)
      join (
        select
        developer_id as id,
        count(*) as posts,
        sum(likes) as likes
        from posts
        group by developer_id
      ) stats using(id)
      ) as developers
    SQL

  def self.top_developers(count=1)
    Developer.from(RANKED_DEV_SQL)
      .order(hotness: :desc)
      .limit(count)
  end

  def self.bottom_developers(count=1)
    Developer.from(RANKED_DEV_SQL)
      .order(hotness: :asc)
      .limit(count)
  end
end

We treat the complex query as a subquery by wrapping it in (...) as
developers
assigning it as a string to a constant. This constant is then
available to any methods that want to build on this query without duplicating
all the core query logic.

As you can see in DeveloperRanker.top_developers, we hydrate an
ActiveRecord::Relation with the results of the query. We can then chain those
familiar ActiveRecord query methods on the end. In this case, we order the
results with the developers most on fire at the top and then limiting by the
given count. Note that because we are querying against the Developer
model, it is essential that the subquery's name matches the name that
ActiveRecord uses for that model's table.

We create DeveloperRanker.bottom_developers to perform a similar query with
very little duplication.

An additional benefit of this approach is that the Developer objects that are
being hydrated are decorated with the additional values returned from the
query. Not only do we know the id and username of each developer, we can
also access the avg_likes and hotness attributes. These hydrated
Developer objects can be passed to the view layer like any other decorated
object would be.

This technique is great for generating reports and stats that require lots of
computation or complex joins, but are ultimately tied to particular domain
concepts (e.g. a developer).

Ultimately, this frees you to write some SQL without completely cutting
yourself off from the world of ActiveRecord. Enjoy!


Cover photo by Samuel Zeller on Unsplash.com

Leave a Reply

Your email address will not be published. Required fields are marked *