It has been about five weeks since we elevated the first release of our new platform, coded in Ruby on Rails. Looking back, my teammates and I couldn’t be happier with making the switch to Rails. For the most part, it has continued to amaze us with how quickly we can add new functionality. Additionally, the open source community has continued to provide valuable plugins for our application. Between the technology stack and the community, we have been able to accomplish a lot in little time.
However, as with every technology stack, not everything is perfect. We also learned as we went, so naturally we made mistakes. Now, as we get ready to complete development on the second major release of our product, we have begun to incorporate some lessons learned into both our design patterns and our coding practices. As we do this, I thought it would be nice to publish them here. This will allow other development shops to learn from our mistakes and to provide feedback on how we might solve them differently.
Since there are several areas for me to talk about within this topic (Active Record, RJS, Plugins, Models, etc.), I am going to do this over several posts. This will allow me to go into detail on each specific area, rather than cram it all into one post.
That being said, I am going to first discuss what we have found out about Active Record. This has been our biggest pain point, and we have found some great, simple solutions and used some amazing open source tools to help us improve performance.
Active Record, and ORMs in general, are amazing to code against. The simplicity they provide by encapsulating the relationships, the mapping, and the CRUD operations for the data types in your domain makes development easy. However, they also tend to employ a load-on-demand and a load-everything paradigm when interacting with the database. If you are not careful, these can have serious performance implications in your application. For us, these were the two biggest offenders.
Load on Demand
The load on demand problem typically rears its ugly head when used in conjunction with an enumeration. This is typically referred to within the Rails community as the “N + 1” problem. The code snippet below shows an example of how you could easily create it:
<% @registered_users.each do |registered_user| %>
<%= registered_user.degrees.map(&:title).join(', ') %>
<% end %>
In this example, you will see that for each user in the system, we are making a call to retrieve the degrees they have associated. As the number of users in the system grows, so will the number of requests to the database. Over time, this will begin to slow your page down to a crawl. Thankfully, Rails and Active Record provide a solution for this problem.
For requests like the one listed above, you need to also include the associated degrees when initially retrieving the users from the system. This way, a join will occur in your database to bring back all the necessary data in a single call, rather than in multiple calls.
@registered_users = User.find(:all, :conditions => "registered = true", :include => [:degrees])
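To make the difference in query counts concrete, here is a minimal plain-Ruby sketch that needs no Rails at all. FakeDatabase, its methods, and its data are hypothetical stand-ins invented for illustration, not Active Record API. It models eager loading as one extra batched lookup rather than a true SQL join, but the effect on the number of round trips is the same idea.

```ruby
# Plain-Ruby simulation; FakeDatabase and its sample data are hypothetical.
class FakeDatabase
  attr_reader :query_count

  def initialize
    @users = [{ id: 1, name: "Ann" }, { id: 2, name: "Bob" }, { id: 3, name: "Cy" }]
    @degrees = [
      { user_id: 1, title: "BS" }, { user_id: 1, title: "MS" },
      { user_id: 2, title: "BA" }
    ]
    @query_count = 0
  end

  # One simulated query: all users.
  def all_users
    @query_count += 1
    @users
  end

  # One simulated query per call: degrees for a single user.
  def degrees_for(user_id)
    @query_count += 1
    @degrees.select { |d| d[:user_id] == user_id }
  end

  # One simulated query: degrees for many users, grouped by user id.
  def degrees_for_all(user_ids)
    @query_count += 1
    @degrees.select { |d| user_ids.include?(d[:user_id]) }.group_by { |d| d[:user_id] }
  end
end

# Load on demand: 1 query for the users plus 1 per user.
lazy_db = FakeDatabase.new
lazy_titles = lazy_db.all_users.map do |user|
  lazy_db.degrees_for(user[:id]).map { |d| d[:title] }.join(", ")
end

# Eager loading: 1 query for the users plus 1 for all of their degrees.
eager_db = FakeDatabase.new
users = eager_db.all_users
degrees_by_user = eager_db.degrees_for_all(users.map { |u| u[:id] })
eager_titles = users.map do |user|
  (degrees_by_user[user[:id]] || []).map { |d| d[:title] }.join(", ")
end

puts "lazy:  #{lazy_db.query_count} queries"
puts "eager: #{eager_db.query_count} queries"
```

With three users, the on-demand version issues four simulated queries and the eager version issues two. The gap widens by one query for every additional user.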
Keep in mind that you need to monitor whether the join generated by the :include option on your model’s find method actually improves performance. Joins carried a performance cost before ORMs, and it is no different with one. Complicated nested includes might not actually yield a performance benefit.
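One rough way to check is to time both approaches side by side. The sketch below is an assumption-laden illustration: load_on_demand and load_with_include are hypothetical stand-ins that work on in-memory rows, and in a real application they would wrap the find calls without and with :include. Only the timing harness is the point.

```ruby
# Hypothetical in-memory rows standing in for a degrees table.
DEGREE_ROWS = (1..200).map { |id| { user_id: id, title: "Degree #{id}" } }.freeze

# One scan of the rows per user, like issuing a query for each user.
def load_on_demand
  (1..200).map do |id|
    DEGREE_ROWS.select { |row| row[:user_id] == id }.map { |row| row[:title] }
  end
end

# One pass over the rows, like fetching everything up front.
def load_with_include
  grouped = DEGREE_ROWS.group_by { |row| row[:user_id] }
  (1..200).map { |id| (grouped[id] || []).map { |row| row[:title] } }
end

# Run the given block `runs` times and return the average time in seconds.
def average_seconds(runs)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { yield }
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) / runs
end

on_demand_time = average_seconds(20) { load_on_demand }
include_time   = average_seconds(20) { load_with_include }

puts format("on demand:    %.6f s per run", on_demand_time)
puts format("with include: %.6f s per run", include_time)
```

Averaging over several runs smooths out noise, but numbers from a sketch like this only mean something when the two methods hit a real database with realistic data.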
Retrieve everything, all the time
Active Record defaults to a retrieve everything paradigm until told otherwise. Again, like load on demand, this makes development a snap. A simple find statement retrieves a fully hydrated object. However, this can lead to unnecessary data being retrieved from the database. We found that the built-in :select option within Active Record usually solves this problem:
@registered_users = User.find(:first, :select => "first_name, last_name")
Using the :select option will retrieve only the data you specify. This allows you to keep the data retrieval as light as possible. We have since moved to always specifying exactly what data we are retrieving through the :select option. This speeds up the retrieval and adds a little extra documentation to the find.
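As a rough illustration of why this matters, the plain-Ruby sketch below compares the size of a full row with a row limited to two columns. USERS_TABLE, find_first, and payload_size are hypothetical names made up for this example, and the bio and registered columns are assumptions. It mimics the effect of :select without touching Active Record.

```ruby
# Hypothetical in-memory "users" table; columns other than first_name and
# last_name are invented for illustration.
USERS_TABLE = [
  { id: 1, first_name: "Ann", last_name: "Lee", bio: "x" * 5_000, registered: true },
  { id: 2, first_name: "Bob", last_name: "Ray", bio: "y" * 5_000, registered: true }
].freeze

# Return the first row, limited to the requested columns (all columns if none given).
def find_first(table, columns: nil)
  row = table.first
  return row if columns.nil?
  row.select { |column, _value| columns.include?(column) }
end

# Rough size of the data that would have to be transferred for a row.
def payload_size(row)
  row.values.sum { |value| value.to_s.bytesize }
end

full_row = find_first(USERS_TABLE)
slim_row = find_first(USERS_TABLE, columns: [:first_name, :last_name])

puts "full row: #{payload_size(full_row)} bytes"
puts "slim row: #{payload_size(slim_row)} bytes"
```

The large bio column dominates the full row, which is the kind of column you rarely need on a listing page.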
Performance Monitoring
Through the course of our development we have found several tools that help monitor our application’s performance. We typically do this in two ways. The first is in our local development and test environments, on a single page request basis. The second is with aggregate data pulled from our production boxes.
We monitor single page performance in the development and testing environments using the FiveRuns Tuneup plugin. This plugin installs a div at the top of your page that, on mouse over, provides diagnostic information about the page you are currently viewing. We use this as our first line of defense for monitoring performance.
Additionally, we use rails-log-analyzer (http://github.com/wvanbergen/rails-log-analyzer/wikis) to analyze our log files. It provides aggregate data and identifies your ten worst-performing pages. Both tools have proven invaluable.
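The general idea behind aggregate log analysis can be sketched in a few lines of plain Ruby. This is not how rails-log-analyzer works internally. The sample lines use a made-up, simplified "path seconds" format rather than the real Rails log format, and summarize and slowest_pages are hypothetical helper names.

```ruby
# Made-up sample lines in a simplified "<path> <seconds>" format; real Rails
# logs look different and would need their own parsing.
SAMPLE_LOG = <<~LOG
  /users 0.120
  /users 0.180
  /degrees 0.040
  /reports 1.500
  /reports 2.500
  /degrees 0.060
LOG

# Group durations by path and compute the request count and mean time for each.
def summarize(log_text)
  durations = Hash.new { |hash, path| hash[path] = [] }
  log_text.each_line do |line|
    path, seconds = line.split
    next if path.nil? || seconds.nil?
    durations[path] << seconds.to_f
  end
  durations.map do |path, times|
    { path: path, requests: times.size, mean: times.sum / times.size }
  end
end

# Slowest pages first, limited to the top `limit` entries.
def slowest_pages(log_text, limit = 10)
  summarize(log_text).sort_by { |entry| -entry[:mean] }.first(limit)
end

slowest_pages(SAMPLE_LOG).each do |entry|
  puts format("%-10s %d requests, mean %.3f s", entry[:path], entry[:requests], entry[:mean])
end
```

Sorting by mean time is only one choice. Sorting by total time spent per page can be more useful when a moderately slow page is requested very often.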
Don’t forget the database
At the end of the day, if you don’t let Active Record lull you into forgetting that there is a database out there, you should do fine. We simply retrieve only what we need with selects and minimize database queries with includes, and we have seen a tremendous reduction in time spent at the database per page request.
It seems obvious. However, when we were juggling learning a new technology stack and sitting behind the beautifully simple APIs provided by Active Record, it was easy to let these obvious mistakes enter our code base. Fortunately, they were easy to fix.
If you have any additional Active Record improvements, suggestions, or points of interest, please post them in a comment.

Joel 6:25 am on August 26, 2008
What format is the logfile you’re using the analyzer script with? I’ve just now installed that script and ran it against my stock development.log file but keep getting errors that look similar to:
ERROR: Output report timespan not found!
Don’t see any particular detail as to how the logfiles should be formatted in order for this to work either. Thanks Anthony
Joel 6:28 am on August 26, 2008
aahhhhh you know what – nevermind. delete the previous comment as well as this one. The script assumes your CWD is the same as where the analyze.rb script lives … not very happy when you’re calling it absolutely from ~ or your rails root.