Thoughts and Observations on Test Suite Performance
Lately, we’ve been paying a lot of attention to our test suite’s performance, or rather, lack thereof. With 1000+ tests, we’re running at about 160 seconds. Yikes.
Sure, I’ve heard a lot of horror stories about test suites taking 10+ minutes to run, but we’ve been noticing a measurable deterioration in our suite’s performance lately, especially when I look at our test suite’s performance in pieces, rather than as a whole.
A recent post from Martin Emde at Engine Yard reminded me that a key idea behind unit tests is that they should be really, really fast. A common metric thrown around is that if a test takes over 1/10th of a second to run, it’s too slow. Agreed.
A Quick Comparison
For this example, I’m going to compare our users controller spec to a rough equivalent in another app of mine. I’ve picked the users controller because it’s easily comprehendible, has few dependencies, and is a resource that exists in nearly every web app. Thus, it’s a ripe example for comparison.
In our app, which I’ll refer to a “App A”, we have 26 examples, which take about 4.5 seconds to run, and that’s without integrating the views. A little arithmetic for ya: that’s approximately .17 seconds per example. Not awful, but it sure feels slow.
Now, for comparison, I ran a users controller spec for another app, BtownLiving, which I’ll refer to as “App B”. Surprisingly, on this app, my 37 examples ran in about 1.2 seconds. At about .03 seconds per example, that’s more like it.
|App||# Examples||Total Run Time||Time/example|
|A||26||4.5 secs||.17 secs|
|B||37||1.2 secs||.03 secs|
To summarize, my test file with 37 examples takes about a quarter of the time to run than my other with fewer examples. So, what gives!?
Identifying the Bottlenecks
Like most performance issues in web application development, my first assumption is that we’re probably creating significantly more records in the test database in the slower app. Without comparing the two files in detail, I also know that in the slower app, we are stubbing AuthLogic to simulate a logged in user, which is a requirement for nearly every action in App A. So we’ve got three potential pain points:
- AuthLogic stubbing magic
- Associated records created in test database
- Our routes file in the slower suite is much, much larger
That last one kinda jumped out at me once I realized it: we have a giant routes file. Like 1220 routes, giant. On the other hand, App B (BtownLiving) has only 116 routes.
Even though it’s a pointless exercise, I decided to comment out all routes except those needed to run this test file. This brought the test run time down to about 3.4 seconds. Not exactly the major speed boost I’m looking for, though a roughly 25% speed boost is certainly nothing to scoff at.
On the other hand, it’s a completely pointless exercise, because it’s not as if I can rip all the routes out of our app (a girl can dream, though). So on that note, I suppose it is somewhat of a relief that there are still major performance gains to be had elsewhere. On to diagnose the database.
After comparing the two test files side-by-side, the main difference seems to be that in App A (the slower suite) we are injecting a user into the database on every request, as well as stubbing out the requisite
UserSession cruft to make AuthLogic happy and tell our controller filters that we’re logged in. In contrast, in App B, it appears I’ve done a pretty good job of stubbing out the calls to the database. Still, I want to take a look at the log output and see how often we’re actually hitting the database in a given test run in each app.
Whoa! More than 5 times as many Inserts and triple the number of Loads in App A. That’s some serious database action occurring in only 26 examples. Consider the following:
Yikes! The numbers speak for themselves.
After doing some rough data collection, I’m comfortable proclaiming that, as many who have come before me have concluded, the database is the bottleneck. I plan to refactor the database stuff in App A and report back here the performance results. Similarly, I would be interested to learn what kind of performance other people are experiencing with their
For reference, here’s a gist of the two test files compared.