Bypassing the ORM for performance: is it worth it?

The following method in my Rails app is called very, very frequently on
incoming requests and results in model objects being rendered as JSON.
All is working fine, but this thing has to be as efficient and fast as
possible. So far I've added an index on updated_at and I've enabled
mod_deflate on the server so the JSON responses are compressed.

I've heard that it's possible to rewrite Rails queries as raw SQL to
eke extra performance out of the app. Is this true, and is it a good
idea (if the gain is marginal I'd stay as I am)? Given the code in my
method, is it worth doing? I'm not very familiar with SQL syntax, so if
it is something I should do I'd appreciate any tips on the syntax.

Here's the method as it stands, working fine against circa 5,000
records, hopefully growing to 50,000 in a few weeks and maybe 100,000
records ultimately.

def self.time_query(date_from, date_to, limit = 40)
  self.all(:conditions => { :updated_at => date_from..date_to },
           :order => "updated_at DESC", :limit => limit)
end

Here's some output from the logs; I guess the SQL I could use directly
would be based on this:

SELECT * FROM apps WHERE (apps.updated_at BETWEEN '2011-02-15T18:25:53Z'
AND '2011-02-15T18:35:53Z') ORDER BY updated_at DESC, updated_at ASC
LIMIT 40

Is this worth doing? Are there significant performance gains to be had
by bypassing the AR layer?
If it matters, this is a Rails 2.3.8 app.

Processing AppsController#index to json (for 127.0.0.1 at 2011-02-15
18:35:53) [GET]
Parameters: {"date_to"=>"2011-02-15T18:35:53Z",
"date_from"=>"2011-02-15T18:25:53Z"}
App Load (27.5ms) SELECT * FROM apps WHERE (apps.updated_at
BETWEEN '2011-02-15T18:25:53Z' AND '2011-02-15T18:35:53Z') ORDER BY
updated_at DESC, updated_at ASC LIMIT 40
Completed in 34ms (View: 1, DB: 28) | 200 OK
SQL (0.2ms) SET NAMES 'utf8'
SQL (0.1ms) SET SQL_AUTO_IS_NULL=0

On Feb 15, 8:28pm, bingo bob [email protected] wrote:

> Here's the method as it stands, working fine against circa 5,000
> records, hopefully growing to 50,000 in a few weeks and maybe 100,000
> records ultimately.

You might try connection.select_all (which will return an array of
hashes); the default to_json might be sufficient.
It's also worth selecting only the columns you need in your output, and
(if applicable) creating an index that includes those columns (known as
a covering index).
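
Something along these lines, for example (untested; the column list is
just an illustration, pick whatever your JSON actually needs):

def self.time_query(date_from, date_to, limit = 40)
  # Fetch only the needed columns as plain hashes, skipping AR
  # instantiation entirely.
  sql = sanitize_sql_array([
    "SELECT id, name, updated_at FROM apps
     WHERE updated_at BETWEEN ? AND ?
     ORDER BY updated_at DESC LIMIT ?",
    date_from, date_to, limit
  ])
  connection.select_all(sql) # => e.g. [{"id" => "1", "name" => "foo", ...}]
end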

Fred

I've done it before in a very specific case where we wanted to load a
very big graph of objects and performance was the #1 priority. Using an
explicit "select" (i.e. not loading all the columns) and not
instantiating AR objects (through "connection.select_all") does make a
huge difference, but it's only worth the noise in the code when all
other optimizations are already in place (indexes, no N+1 queries,
etc.) and the bottleneck really is Ruby and ActiveRecord.
And in general, don't optimize prematurely: let your records grow to
the numbers you expect and then make it faster :wink:

P.S.: Another thing you might want to check is ActiveSupport's
#to_json, since that was also a big bottleneck for us (with all the
magic it has). If what you have in memory for serialization is a plain
hash (and not an object that needs special handling), you can try the
original ::JSON.generate instead and see if that makes any difference
for you.
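
For instance (an untested sketch; the column list is just an example):

require 'json'

rows = App.connection.select_all(
  "SELECT id, name, updated_at FROM apps ORDER BY updated_at DESC LIMIT 40"
)

# Both lines produce JSON, but the stock generator skips ActiveSupport's
# per-object magic since these are plain hashes, not AR objects.
rows.to_json          # goes through ActiveSupport's to_json
::JSON.generate(rows) # plain generator from the json gem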

Thanks for your replies, I hear your ideas.

I'm not hearing anything major that I haven't already done or that I
need to change. I think I'll stay where I am for now and see how it
goes - the updated_at column is indexed.

Thanks for the tips on the JSON; I'd already tweaked that in the model
by overriding the default to_json method to include only the attributes
I need to serve.
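
Roughly like this, in case it's useful to anyone (simplified, and the
attribute list is obviously specific to my app):

class App < ActiveRecord::Base
  # Serialize only the attributes the API actually needs instead of
  # dumping every column.
  def to_json(options = {})
    super((options || {}).merge(:only => [:id, :name, :updated_at]))
  end
end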

I figure I'm in reasonable shape, so I'm now looking at how to
benchmark and monitor performance. I got hold of the rawk.rb script
yesterday, which looks helpful...

Anyone know anything that will monitor a production log live and report
requests/sec?
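
Failing that, I suppose I could hack something together myself;
something like this crude sketch maybe (untested, it just tails the log
and counts "Processing" lines every few seconds):

# reqs_per_sec.rb - crude requests/sec from a Rails log
# Usage: ruby reqs_per_sec.rb log/production.log
interval = 5 # seconds between reports
count = 0
last_report = Time.now

File.open(ARGV[0] || "log/production.log") do |log|
  log.seek(0, IO::SEEK_END) # start at the end of the file, like tail -f
  loop do
    line = log.gets
    if line
      # every request in a Rails 2.3 log starts with a "Processing" line
      count += 1 if line =~ /^Processing /
    else
      sleep 0.2 # no new lines yet, wait a bit
    end
    if Time.now - last_report >= interval
      puts "%.2f req/sec" % (count.to_f / interval)
      count = 0
      last_report = Time.now
    end
  end
end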