Ruby Forum Ruby on Rails > association collection find vs ordinary array find

Posted by Jonathan Rochkind (jrochkind)
on 18.07.2007 23:24
So on an ordinary Array, #find filters through the elements (in memory,
naturally), for what you're looking for.

But ActiveRecord to-many associations redefine #find to behave like the
ordinary ActiveRecord find, going to the db, but limited to the objects
in that association (ie, probably the same objects that were in memory
in the array---unless the db has changed since cache!).

There are a couple reasons why I might prefer to filter in memory
instead:

1) Maybe the trip to the db is more expensive than filtering in memory.
Is it assumed that this will never be the case?

2) If I find an object through the existing AR find (by going to the
db), and I make changes to it, and save!---my cached version in
ar_obj.some_has_many is still unchanged, even though it was changed in
the db, because it's a different object in memory.  To fix this, I've
got to call ar_obj.some_has_many(true) to reload the cache from the db.
Now I've taken TWO extra trips to the db, compared to the other option:
just filtering the cached ar_obj.some_has_many array in memory, finding
the already loaded destination object that meets my conditions in there,
modifying it, and calling save!.  I skipped the initial select to the db
for the 'find', and I skipped the reloading of the cached has_many in
case I want the ar_obj to have the updated value in its cached array.
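
The identity problem in point 2 can be sketched in plain Ruby, without ActiveRecord. This is only an illustration under stated assumptions: `Record` and `fresh_copy` are hypothetical stand-ins (a Struct for a model, and a `dup` for the new object a database find would build), not Rails APIs.

```ruby
# Record is a hypothetical stand-in for an ActiveRecord model.
Record = Struct.new(:id, :name)

# Stand-in for a database find: it hands back a NEW object for the same row,
# which is the situation described above. Returns nil if no row matches.
def fresh_copy(records, id)
  row = records.detect { |r| r.id == id }
  row && row.dup
end

# Stand-in for the cached association array.
cached = [Record.new(1, "old"), Record.new(2, "other")]

# Going "through the db": the change lands on a different object.
from_db = fresh_copy(cached, 1)
from_db.name = "new"
puts cached.first.name  # prints old

# Filtering the cached array: the change lands on the cached object itself.
in_memory = cached.detect { |r| r.id == 1 }
in_memory.name = "new"
puts cached.first.name  # prints new
```

In the second case nothing needs to be reloaded, because the object that was modified is the same one the cached array already holds.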

But there's no pretty way to do that filtering in memory. I guess I can
use 'collect' instead, and then call compact! on the resulting array to
get rid of all the nils that represented things I wanted to filter out.
Hmm, I guess that is a pretty decent way after all. I just thought of
that.
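
A minimal sketch of the collect-and-compact idea, on a plain Array rather than a real association. `Thingy`, its fields, and `filter_with_collect` are hypothetical names used only for illustration. The sketch uses `compact` rather than `compact!`, since `compact!` returns nil when there is nothing to remove.

```ruby
# Thingy is a hypothetical stand-in for a model loaded through an association.
Thingy = Struct.new(:name, :color)

# Filter an already loaded array in memory: collect maps every non-matching
# element to nil, and compact drops those nils.
def filter_with_collect(thingies, color)
  # compact (no bang) always returns an array; compact! returns nil when
  # there was nothing to remove, which is easy to trip over in a chain.
  thingies.collect { |t| t if t.color == color }.compact
end

thingies = [
  Thingy.new("a", "red"),
  Thingy.new("b", "blue"),
  Thingy.new("c", "red"),
]

puts filter_with_collect(thingies, "red").map(&:name).inspect  # prints ["a", "c"]
```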

But are other Rails developers running into this sort of thing, and how
do you deal with it?  It surprised me to see the normal array #find
completely unavailable, since it seemed like it would still be useful in
an AR has_many, for the reasons outlined above.

Jonathan
Posted by Pat Maddox (pergesu)
on 18.07.2007 23:44
(Received via mailing list)
On 7/18/07, Jonathan Rochkind <rails-mailing-list@andreas-s.net> wrote:
> But are other Rails developers running into this sort of thing, and how
> do you deal with it?  It surprised me to see the normal array #find
> completely unavailable, since it seemed like it would still be useful in
> an AR has_many, for the reasons outlined above.

Array#find is aliased as #detect, and #find_all is aliased as #select.
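
As a rough illustration on a plain Array (not an actual association), `detect` behaves like `find` and `select` behaves like `find_all`. `Thingy` and the two helper methods are hypothetical names for this sketch only.

```ruby
# Thingy is a hypothetical stand-in for a model loaded through an association.
Thingy = Struct.new(:name, :color)

# detect behaves like find on a plain Array: first match, or nil if none.
def first_with_color(thingies, color)
  thingies.detect { |t| t.color == color }
end

# select behaves like find_all: every match, returned as a new array.
def all_with_color(thingies, color)
  thingies.select { |t| t.color == color }
end

thingies = [
  Thingy.new("a", "red"),
  Thingy.new("b", "blue"),
  Thingy.new("c", "red"),
]

puts first_with_color(thingies, "red").name                # prints a
puts all_with_color(thingies, "red").map(&:name).inspect   # prints ["a", "c"]
puts first_with_color(thingies, "green").inspect           # prints nil
```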

Pat
Posted by Michael Glaesemann (Guest)
on 19.07.2007 01:44
(Received via mailing list)
On Jul 18, 2007, at 16:24 , Jonathan Rochkind wrote:

> 1) Maybe the trip to the db is more expensive than filtering in  
> memory.
> Is it assumed that this will never be the case?

Most likely, this *is* the case. The database server is specialized
to be able to do these kinds of operations, so the database server is
probably going to be able to do the restriction faster (and in
compiled code, no less), than Ruby. Also, you have the overhead of
sending all of the data -- including that which you're just going to
throw away--over the wire to the middleware. Taking just those two
into account, I think it's a safe bet that using the database server
to handle the filtering is going to be a win.

> But there's no pretty way to do that filtering in memory.

I haven't looked at the ActiveRecord source, but I'd think it's
likely that Array#find has been aliased rather than just left
hanging. If you're motivated to do so, you might be able to work
something to your liking using the alias.

Michael Glaesemann
grzm seespotcode net
Posted by Robert James (robertjames)
on 19.07.2007 03:22
(Received via mailing list)
On Jul 18, 7:44 pm, Michael Glaesemann <g...@seespotcode.net> wrote:
> On Jul 18, 2007, at 16:24 , Jonathan Rochkind wrote:
>
> > 1) Maybe the trip to the db is more expensive than filtering in  
> > memory.
> > Is it assumed that this will never be the case?
>
> Most likely, this *is* the case. The database server is specialized  
> to be able to do these kinds of operations, so the database server is  
> probably going to be able to do the restriction faster (and in  
> compiled code, no less), than Ruby.

For simple fields, yes.
But a lot of times the filter is based on object methods, algorithms,
Ruby code, and you need to do it in Ruby.  (Otherwise, we'd ditch
Rails, and just put the entire app/models in db stored procedures)


> > But there's no pretty way to do that filtering in memory.

Array#detect

Yes, I do think that AR snatching the #find method can be very
confusing (caused some weirdness till I figured out what happened).
Posted by George (Guest)
on 19.07.2007 05:41
(Received via mailing list)
On 7/19/07, Pat Maddox <pergesu@gmail.com> wrote:
>
> On 7/18/07, Jonathan Rochkind <rails-mailing-list@andreas-s.net> wrote:
> > But are other Rails developers running into this sort of thing, and how
> > do you deal with it?  It surprised me to see the normal array #find
> > completely unavailable, since it seemed like it would still be useful in
> > an AR has_many, for the reasons outlined above.
>
> Array#find is aliased as #detect, and #find_all is aliased as #select.

Just to offer another alternative, you can also #to_a it first:

  user.thingies.to_a.find{|thingy| ... }
Posted by Robert James (robertjames)
on 19.07.2007 13:26
(Received via mailing list)
On Jul 18, 11:40 pm, George <george.og...@gmail.com> wrote:
> Just to offer another alternative, you can also #to_a it first:
>
>   user.thingies.to_a.find{|thingy| ... }

I think that will have the disadvantage of loading all of the objects
from the database, even if you end up stopping before you hit the
end.  (Or perhaps this will actually be faster, due to eager loading?
Anyone know? Anyone benchmark?)
Posted by Jonathan Rochkind (jrochkind)
on 19.07.2007 18:41
Robert James wrote:
> On Jul 18, 11:40 pm, George <george.og...@gmail.com> wrote:
>> Just to offer another alternative, you can also #to_a it first:
>>
>>   user.thingies.to_a.find{|thingy| ... }
> 
> I think that will have the disadvantage of loading all of the objects
> from the database, even if you end up stopping before you hit the
> end.  (Or perhaps this will actually be faster, due to eager loading?
> Anyone know? Anyone benchmark?)

Well, this will load the whole relationship into memory first 
(unless it was already loaded--either eagerly or lazily--previously in 
the request), yes! But that was in fact what I wanted to do.

To answer the person who suggested that a db query is always faster than 
an in-memory filter, I'm not sure. Here's a situation:

1. There are tens of thousands (or more) of rows in the db, but only 
dozens (at most) in the particular association. This is a fairly typical 
situation--imagine looking at objects that belong to a particular user 
account (the logged in user, often) in a system with thousands of user 
accounts.  There's really no memory issue with loading dozens of objects 
into memory.

2. In a given request, I need to slice and dice these 'thingies' perhaps 
dozens of times.   This is also something I run into with some 
regularity.  Occasionally, to make matters worse, the attributes I need 
to slice/dice on are not indexed efficiently in the db (perhaps they are 
'text' type).

My intuition in this case is that loading them into memory _once_ and 
filtering in memory is going to be cheaper than dozens of db trips. But 
I could be wrong, it should be tested I guess.
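
The trade-off can be sketched by counting simulated round trips. This says nothing about real timings, which would need a benchmark against an actual database. `FakeAssociation`, `query_by_color`, and `Thingy` are hypothetical stand-ins, not ActiveRecord APIs.

```ruby
# Thingy is a hypothetical stand-in for a model.
Thingy = Struct.new(:name, :color)

# Hypothetical stand-in for an association that counts simulated db trips.
class FakeAssociation
  attr_reader :trips

  def initialize(rows)
    @rows = rows
    @trips = 0
  end

  # Simulates a filtered find against the db: one trip per call.
  def query_by_color(color)
    @trips += 1
    @rows.select { |t| t.color == color }
  end

  # Simulates loading the whole association: one trip, and the returned
  # array is a copy that does not see rows added afterwards.
  def to_a
    @trips += 1
    @rows.dup
  end
end

rows = [
  Thingy.new("a", "red"),
  Thingy.new("b", "blue"),
  Thingy.new("c", "red"),
]
colors = %w[red blue green]

# One simulated trip per slice.
per_query = FakeAssociation.new(rows)
colors.each { |c| per_query.query_by_color(c) }
puts per_query.trips    # prints 3

# One simulated trip in total, then every slice runs in memory.
loaded_once = FakeAssociation.new(rows)
snapshot = loaded_once.to_a
colors.each { |c| snapshot.select { |t| t.color == c } }
puts loaded_once.trips  # prints 1
```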

In many cases the entire association had to be loaded into memory 
_anyway_ for some other operation that acted on the entire association.

The final straw though, is that sometimes I need to spend the entire 
request operating on a given 'snapshot' of the db. There might be other 
things going on that would add objects to that association, but I don't 
want to know about them (until next request)--I need to spend this 
request operating on the same snapshot. Loading the association into 
memory and acting on the in-memory objects is one very easy way to do 
that.

And, another case, occasionally I need to calculate based on things that 
aren't simple SQL, and for complicated logic I'd rather do it in Ruby 
too. (Isn't that the Rails way?).

So anyway, this is definitely sometimes necessary. But thanks to all who 
gave me several ways to do it. #to_a, #detect and #select, etc.  I still 
think it was a bit odd for AR Associations to hijack #find and #find_all, 
would have preferred using new method names, but oh well, what can you 
do.

Jonathan