Association collection find vs ordinary array find

jrochkind · July 18, 2007, 11:24pm

So on an ordinary Array, #find filters through the elements (in memory,
naturally), for what you’re looking for.

But ActiveRecord to-many associations redefine #find to behave like the
ordinary ActiveRecord find, going to the db, but limited to the objects
in that association (i.e., probably the same objects that were in memory
in the array, unless the db has changed since they were cached!).

There are a couple reasons why I might prefer to filter in memory
instead:

1. Maybe the trip to the db is more expensive than filtering in memory.
Is it assumed that this will never be the case?
2. If I find an object through the existing AR find (by going to the
db), make changes to it, and call save!, my cached version in
ar_obj.some_has_many is still unchanged, even though it was changed in
the db, because it’s a different object in memory. To fix this, I’ve
got to call ar_obj.some_has_many(true) to reload the cache from the db.
Now I’ve taken TWO extra trips to the db, compared to the other option:
just filtering the cached ar_obj.some_has_many array in memory, finding
the already loaded destination object that meets my conditions in there,
modifying it, and calling save!. I skipped the initial select to the db
for the ‘find’, and I skipped the reloading of the cached has_many in
case I want the ar_obj to have the updated value in its cached array.
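
To make the second point concrete, here is a minimal plain-Ruby sketch of the object-identity problem. It does not use ActiveRecord at all: FAKE_DB, Row, and load_row are hypothetical stand-ins, assumed only to mimic "every fetch builds a new object".

```ruby
# Hypothetical stand-in for a table: id => name. Not ActiveRecord.
FAKE_DB = { 1 => "old name" }

Row = Struct.new(:id, :name) do
  # Write this object's state back to the fake table.
  def save!
    FAKE_DB[id] = name
  end
end

# Each "query" builds a brand-new object, as a fresh fetch would.
def load_row(id)
  Row.new(id, FAKE_DB.fetch(id))
end

cached = [load_row(1)]   # plays the role of the cached association array

fresh = load_row(1)      # plays the role of a second, db-backed find
fresh.name = "renamed via fresh copy"
fresh.save!

stale_name = cached.first.name   # the cached copy never saw the change

# Filtering the cached array instead hands back the very same object,
# so the cached array reflects the change without a reload.
same = cached.detect { |row| row.id == 1 }
same.name = "renamed in place"
same.save!
```

In this sketch the cached element stays stale after the first save and is only up to date once the change is made through the object found in the array itself.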

But there’s no pretty way to do that filtering in memory. I guess I can
use ‘collect’ instead, and then call compact! on the subsequent array to
get rid of all the nils that represented things I wanted to filter out.
Hmm, I guess that is a pretty decent way after all. I just thought of
that.
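
A small sketch of that collect-then-compact idea on a plain Array, next to the one-step select form. Task and its fields are made up for illustration.

```ruby
Task = Struct.new(:id, :done)
tasks = [Task.new(1, true), Task.new(2, false), Task.new(3, false)]

# collect leaves a nil wherever the block returns nothing; compact drops them.
pending_via_collect = tasks.collect { |t| t unless t.done }.compact

# select does the same filtering in one step.
pending_via_select = tasks.select { |t| !t.done }

# Caveat: compact! returns nil when there was nothing to remove,
# so chaining on its return value is risky.
no_nils = [1, 2, 3]
bang_result = no_nils.compact!
```

Using the non-bang compact avoids the nil return, which is why the sketch chains it rather than compact!.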

But are other Rails developers running into this sort of thing, and how
do you deal with it? It surprised me to see the normal array #find
completely unavailable, since it seemed like it would still be useful in
an AR has_many, for the reasons outlined above.

Jonathan

jrochkind · July 18, 2007, 11:44pm

On 7/18/07, Jonathan R. wrote:

But are other Rails developers running into this sort of thing, and how
do you deal with it? It surprised me to see the normal array #find
completely unavailable, since it seemed like it would still be useful in
an AR has_many, for the reasons outlined above.

Array#find is aliased as #detect, and #find_all is aliased as #select.
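
For example, on a plain Array (Thingy is a hypothetical record type used only for this sketch):

```ruby
Thingy = Struct.new(:id, :color)
thingies = [Thingy.new(1, "red"), Thingy.new(2, "blue"), Thingy.new(3, "red")]

first_red = thingies.detect { |t| t.color == "red" }   # same result as find
all_red   = thingies.select { |t| t.color == "red" }   # same result as find_all
```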

Pat

jrochkind · July 19, 2007, 1:44am

On Jul 18, 2007, at 16:24 , Jonathan R. wrote:

Maybe the trip to the db is more expensive than filtering in
memory.
Is it assumed that this will never be the case?

Most likely, this is the case. The database server is specialized
to be able to do these kinds of operations, so the database server is
probably going to be able to do the restriction faster (and in
compiled code, no less), than Ruby. Also, you have the overhead of
sending all of the data – including that which you’re just going to
throw away–over the wire to the middleware. Taking just those two
into account, I think it’s a safe bet that using the database server
to handle the filtering is going to be a win.

But there’s no pretty way to do that filtering in memory.

I haven’t looked at the ActiveRecord source, but I’d think it’s
likely that Array#find has been aliased rather than just left
hanging. If you’re motivated to do so, you might be able to work
something to your liking using the alias.

Michael G.
grzm seespotcode net

jrochkind · July 19, 2007, 3:22am

On Jul 18, 7:44 pm, Michael G. wrote:

On Jul 18, 2007, at 16:24 , Jonathan R. wrote:

Maybe the trip to the db is more expensive than filtering in
memory.
Is it assumed that this will never be the case?

Most likely, this is the case. The database server is specialized
to be able to do these kinds of operations, so the database server is
probably going to be able to do the restriction faster (and in
compiled code, no less), than Ruby.

For simple fields, yes.
But a lot of times the filter is based on object methods, algorithms,
Ruby code, and you need to do it in Ruby. (Otherwise, we’d ditch
Rails, and just put the entire app/models in db stored procedures)
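
A sketch of the kind of filter meant here, where the condition is a Ruby method rather than a stored column. Invoice, total, and large? are hypothetical names invented for the example.

```ruby
Invoice = Struct.new(:id, :line_amounts, :discount) do
  # Computed in Ruby; there is no "total" column to query on.
  def total
    subtotal = line_amounts.inject(0) { |sum, amount| sum + amount }
    subtotal - discount
  end

  def large?
    total > 100
  end
end

invoices = [
  Invoice.new(1, [40, 50], 0),
  Invoice.new(2, [80, 60], 10),
  Invoice.new(3, [120], 30)
]

# The predicate lives in the model, so the filter runs in memory.
large = invoices.select { |inv| inv.large? }
```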

But there’s no pretty way to do that filtering in memory.

Array#detect

Yes, I do think that AR snatching the #find method can be very
confusing (it caused some weirdness till I figured out what happened).

jrochkind · July 19, 2007, 5:41am

On 7/19/07, Pat M. wrote:

On 7/18/07, Jonathan R. wrote:

But are other Rails developers running into this sort of thing, and how
do you deal with it? It surprised me to see the normal array #find
completely unavailable, since it seemed like it would still be useful in
an AR has_many, for the reasons outlined above.

Array#find is aliased as #detect, and #find_all is aliased as #select.

Just to offer another alternative, you can also #to_a it first:

user.thingies.to_a.find{|thingy| … }
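
The effect can be sketched without Rails. FakeAssociation below is a hypothetical stand-in, not how ActiveRecord implements associations; it only assumes a collection whose own find shadows the Enumerable one.

```ruby
Widget = Struct.new(:id, :name)

class FakeAssociation
  include Enumerable
  attr_reader :query_count

  def initialize(records)
    @records = records
    @query_count = 0
  end

  def each(&block)
    @records.each(&block)
  end

  # Stand-in for a find that goes to the database; it shadows Enumerable#find.
  def find(id)
    @query_count += 1
    @records.detect { |r| r.id == id }
  end
end

widgets = FakeAssociation.new([Widget.new(1, "a"), Widget.new(2, "b")])

via_to_a   = widgets.to_a.find { |w| w.name == "b" }   # Array#find, block form
via_detect = widgets.detect { |w| w.name == "b" }      # the alias is not shadowed
via_query  = widgets.find(2)                           # the overridden find
```

In this sketch only the last call counts as a "query"; the first two filter the already-loaded records.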

jrochkind · July 19, 2007, 1:26pm

On Jul 18, 11:40 pm, George wrote:

Just to offer another alternative, you can also #to_a it first:

user.thingies.to_a.find{|thingy| … }

I think that will have the disadvantage of loading all of the objects
from the database, even if you end up stopping before you hit the
end. (Or perhaps this will actually be faster, due to eager loading?
Anyone know? Anyone benchmark?)

jrochkind · July 19, 2007, 6:41pm

Robert J. wrote:

On Jul 18, 11:40 pm, George wrote:

Just to offer another alternative, you can also #to_a it first:

user.thingies.to_a.find{|thingy| … }

I think that will have the disadvantage of loading all of the objects
from the database, even if you end up stopping before you hit the
end. (Or perhaps this will actually be faster, due to eager loading?
Anyone know? Anyone benchmark?)

Well, this will load the whole association into memory first
(unless it was already loaded, either eagerly or lazily, previously in
the request), yes! But that was in fact what I wanted to do.

To answer the person who suggested that a db query is always faster than
an in-memory filter, I’m not sure. Here’s a situation:

1. There are tens of thousands (or more) of rows in the db, but only
dozens (at most) in the particular association. This is a fairly typical
situation: imagine looking at objects that belong to a particular user
account (the logged in user, often) in a system with thousands of user
accounts. There’s really no memory issue with loading dozens of objects
into memory.
2. In a given request, I need to slice and dice these ‘thingies’ perhaps
dozens of times. This is also something I run into with some
regularity. Occasionally, to make matters worse, the attributes I need
to slice/dice on are not indexed efficiently in the db (perhaps they are
‘text’ type).

My intuition in this case is that loading them into memory once and
filtering in memory is going to be cheaper than dozens of db trips. But
I could be wrong, it should be tested I guess.
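
As a rough sketch of the load-once, slice-many-times pattern (it says nothing about actual timings, which would still need measuring). Ticket and its fields are hypothetical, and the array stands in for an association that has already been loaded.

```ruby
Ticket = Struct.new(:id, :status, :notes)

# Stand-in for the dozens of records loaded once from the association.
loaded = [
  Ticket.new(1, "open", "urgent: login broken"),
  Ticket.new(2, "closed", "typo on home page"),
  Ticket.new(3, "open", "feature request")
]

# Index once, then reuse the buckets for repeated lookups.
by_status = loaded.group_by { |t| t.status }
open_tickets = by_status.fetch("open", [])

# Free-text style conditions are ordinary Ruby on the loaded objects.
urgent = loaded.select { |t| t.notes.include?("urgent") }
```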

In many cases the entire association had to be loaded into memory
anyway for some other operation that acted on the entire association.

The final straw though, is that sometimes I need to spend the entire
request operating on a given ‘snapshot’ of the db. There might be other
things going on that would add objects to that association, but I don’t
want to know about them (until next request)–I need to spend this
request operating on the same snapshot. Loading the association into
memory and acting on the in-memory objects is one very easy way to do
that.
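
The snapshot idea, reduced to plain Ruby. Here live_rows is a hypothetical stand-in for whatever the association would return on a fresh query, and the copy plays the role of the array loaded at the start of the request.

```ruby
live_rows = ["a", "b"]            # hypothetical current contents of the association

snapshot = live_rows.dup.freeze   # taken once, at the start of the request

live_rows << "c"                  # something else adds a row mid-request

snapshot_size = snapshot.size     # the snapshot does not see the new row
```

Note that dup is a shallow copy: the snapshot fixes which objects are in the list, not the state of each object.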

And, another case: occasionally I need to calculate based on things that
aren’t simple SQL, and for complicated logic I’d rather do it in Ruby
too. (Isn’t that the Rails way?)

So anyway, this is definitely sometimes necessary. But thanks to all who
gave me several ways to do it: #to_a, #detect and #select, etc. I still
think it was a bit odd for AR associations to hijack #find and #find_all;
I would have preferred new method names, but oh well, what can you
do.

Jonathan