Beware of ActiveRecord::Calculations.count

Today, whilst engaged in some performance debugging, I discovered what I would assume is, to most, an easily missed intricacy of ActiveRecord. I am referring to the difference between using count and size on an ActiveRecord collection.

Consider this example:

Whereas calling size on an ActiveRecord collection simply counts the elements in the ruby array, the count method actually generates a SQL query. No doubt, this behavior is intended, though it is a bit confusing, given that calling size and count on a Ruby array are nearly identical, at least when count is not passed anything.

The differences here may seem trivial, but when iterating over a large collection of ActiveRecord objects, simply substituting size for count can result in a noticeable performance boost.

Thoughts and Observations on Test Suite Performance

Lately, we’ve been paying a lot of attention to our test suite’s performance, or rather, lack thereof. With 1000+ tests, we’re running at about 160 seconds. Yikes.

Sure, I’ve heard a lot of horror stories about test suites taking 10+ minutes to run, but we’ve been noticing a measurable deterioration in our suite’s performance lately, especially when I look at our test suite’s performance in pieces, rather than as a whole.

A recent post from Martin Emde at Engine Yard reminded me that a key idea behind unit tests is that they should be really, really fast. A common metric thrown around is that if a test takes over 1/10th of a second to run, it’s too slow. Agreed.

A Quick Comparison

For this example, I’m going to compare our users controller spec to a rough equivalent in another app of mine. I’ve picked the users controller because it’s easily comprehendible, has few dependencies, and is a resource that exists in nearly every web app. Thus, it’s a ripe example for comparison.

In our app, which I’ll refer to a “App A”, we have 26 examples, which take about 4.5 seconds to run, and that’s without integrating the views. A little arithmetic for ya: that’s approximately .17 seconds per example. Not awful, but it sure feels slow.

Now, for comparison, I ran a users controller spec for another app, BtownLiving, which I’ll refer to as “App B”. Surprisingly, on this app, my 37 examples ran in about 1.2 seconds. At about .03 seconds per example, that’s more like it.

App # Examples Total Run Time Time/example
A 26 4.5 secs .17 secs
B 37 1.2 secs .03 secs

To summarize, my test file with 37 examples takes about a quarter of the time to run than my other with fewer examples. So, what gives!?

Identifying the Bottlenecks

Like most performance issues in web application development, my first assumption is that we’re probably creating significantly more records in the test database in the slower app. Without comparing the two files in detail, I also know that in the slower app, we are stubbing AuthLogic to simulate a logged in user, which is a requirement for nearly every action in App A. So we’ve got three potential pain points:

  1. AuthLogic stubbing magic
  2. Associated records created in test database
  3. Our routes file in the slower suite is much, much larger

Bloaty Routes

That last one kinda jumped out at me once I realized it: we have a giant routes file. Like 1220 routes, giant. On the other hand, App B (BtownLiving) has only 116 routes.

Even though it’s a pointless exercise, I decided to comment out all routes except those needed to run this test file. This brought the test run time down to about 3.4 seconds. Not exactly the major speed boost I’m looking for, though a roughly 25% speed boost is certainly nothing to scoff at.

On the other hand, it’s a completely pointless exercise, because it’s not as if I can rip all the routes out of our app (a girl can dream, though). So on that note, I suppose it is somewhat of a relief that there are still major performance gains to be had elsewhere. On to diagnose the database.

Database Overload

After comparing the two test files side-by-side, the main difference seems to be that in App A (the slower suite) we are injecting a user into the database on every request, as well as stubbing out the requisite UserSession cruft to make AuthLogic happy and tell our controller filters that we’re logged in. In contrast, in App B, it appears I’ve done a pretty good job of stubbing out the calls to the database. Still, I want to take a look at the log output and see how often we’re actually hitting the database in a given test run in each app.

App Inserts Updates Loads Destroys
A 198 48 405 15
B 35 42 132 7

Whoa! More than 5 times as many Inserts and triple the number of Loads in App A. That’s some serious database action occurring in only 26 examples. Consider the following:

App # Examples Inserts/example Loads/example
A 26 7.6 15.5
B 37 .94 3.5

Yikes! The numbers speak for themselves.

Conclusion

After doing some rough data collection, I’m comfortable proclaiming that, as many who have come before me have concluded, the database is the bottleneck. I plan to refactor the database stuff in App A and report back here the performance results. Similarly, I would be interested to learn what kind of performance other people are experiencing with their UsersController tests/specs.

For reference, here’s a gist of the two test files compared.