What to do with HUGE instance variables in Rails?


#1

I’m learning Rails and I can successfully use the following things in the
controller:

@var1 = Var.find :all
@var2 = Var2.find :all

Problem is that the DB has about 260,000 rows, which considerably slows
everything down if I load everything into @var1.

Isn’t there a way to load those items progressively? I treat them
separately (e.g. no interactions between them) in the program, so it
should be possible.

Thanks guys.

-T


#2

On Wednesday, May 10, 2006, at 8:51 PM, Tony McMahon wrote:


Why would you need to load all of your records at once?
Usually you can use the database to select just the relevant subset to
operate on.

_Kevin


#3

Aye, it’s called pagination, yo. :slight_smile:

On 10 May 2006 18:55:03 -0000, Kevin O. wrote:


#4

Why would you need to load all of your records at once?
Usually you can use the database to select just the relevant subset to
operate on.

Because my program is some sort of iterative loop that builds small
files for each of the items in @var. So I’m loading everything into @var
and then doing a for…each over it in the program.

It works perfectly with a small number of rows, but not when I load the
huge production DB. Maybe I should do something else, but I’m just
learning Rails and most of the examples load everything into a @var.
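(For anyone reading along, the shape of the job is roughly this, sketched in plain Ruby with an array standing in for the table — the `Record` struct and `build_file` name are illustrative, not from Tony's actual program:)

```ruby
require 'tmpdir'

# Hypothetical stand-in for the per-row work described above: given a
# batch of records, write one small file per record.
Record = Struct.new(:id, :body)

def build_file(record, dir)
  path = File.join(dir, "item_#{record.id}.txt")
  File.write(path, record.body)
  path
end

records = (1..5).map { |i| Record.new(i, "contents of item #{i}") }

Dir.mktmpdir do |dir|
  # Process in small slices rather than holding everything at once
  records.each_slice(2) do |batch|
    batch.each { |r| build_file(r, dir) }
  end
  puts Dir[File.join(dir, 'item_*.txt')].length  # => 5
end
```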

-T


#5

Colin wrote the following on 10.05.2006 23:06:

Do it once and cache the output?! Or do the thing you do before you
put the data in the database, and the next time, just show the result.

If you really need to modify 260k items for every view… Well… Except
for using in-db functions or maybe a custom db system (and even this
might be slow), I don’t see a way to this in a performant way.

You could use something like the following (I had to in another context
where I deleted huge numbers of rows from a table, PostgreSQL simply
barfed when being sent a DELETE FROM table WHERE id in <huge_list>).

Var.transaction do
  count = Var.count
  batchsize = 1000 # or whatever suits your case
  offset = 0
  while offset < count
    vars = Var.find :all, :offset => offset, :limit => batchsize
    # process this batch of vars here
    offset += batchsize
  end
end

Note that if entries can be added/deleted while you execute this, you’ll
most probably need the transaction (if you can’t afford to process the
same entry twice and/or miss entries)…
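Here is the same control flow in plain Ruby, with an array standing in for the table, just to make the batching logic visible in isolation (in the real version each slice would be one SELECT with :offset/:limit):

```ruby
rows = (1..2_600).to_a    # stand-in for the 260,000 records
batchsize = 1000
offset = 0
processed = []

while offset < rows.length
  batch = rows[offset, batchsize]   # plays the role of :offset/:limit
  batch.each { |row| processed << row }
  offset += batchsize
end

puts processed.length    # => 2600, every row seen exactly once
```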

Lionel.




#7

Tony McMahon wrote:

separately (e.g. no interactions between them) in the program so it
should be possible.

Thanks guys.

-T

My first answer is: don’t do it.
Second: Never ever.
Third: are you serious?

:wink:

That said: if you need to process 260,000 objects, drop down into SQL,
write a stored procedure, and retrieve the results.

This isn’t a Rails or Ruby problem. Your approach would be slow in any
language.

– stefan


Rails performance tuning: http://railsexpress.de/blog
Subscription: http://railsexpress.de/blog/xml/rss20/feed.xml


#8

Stefan K. wrote:

My first answer is: don’t do it.
Second: Never ever.
Third: are you serious?

:wink:

That said: if you need to process 260,000 objects, drop down into SQL,
write a stored procedure and retrieve the results.

I can think of a few cases where that’s just not going to work…
particularly where the processing involves some sort of meaningful
interaction with things outside the database.

This isn’t a Rails or Ruby problem. Your approach would be slow in any
language.

It should be slow, but manageable - not impossible. Isn’t this sort of
thing why we have cursors?

How would this look as syntax:

@var = Foo.find(:first)
while @var
  do_stuff_with @var
  @var = Foo.next(@var)
end

And as a generic solution, I think something like this would do it:

class Foo < ActiveRecord::Base
  def Foo.next(foo)
    Foo.find(:first, :conditions => ['id > ?', foo.id])
  end
end

Untested, but that’s the general idea…
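Here's the same cursor idea simulated in plain Ruby, so the iteration can be checked without a database — the `Item` struct is illustrative, and the sort mirrors adding :order => 'id' to the find:

```ruby
Item = Struct.new(:id)
items = [3, 1, 4, 2, 5].map { |i| Item.new(i) }

# Smallest id greater than the current one: what the SQL would return
# for :conditions => ['id > ?', current.id] with ORDER BY id.
def next_item(items, current)
  items.select { |i| i.id > current.id }.min_by(&:id)
end

seen = []
cursor = items.min_by(&:id)
while cursor
  seen << cursor.id
  cursor = next_item(items, cursor)
end

puts seen.inspect   # => [1, 2, 3, 4, 5]
```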


#9

Lionel B. wrote:

give the find “:order => id” and it should work.

Oh yes - best have that for safety. Although I think I’m right in
saying that under AR’s semantics, ids are guaranteed to be
monotonically increasing, so the :first with the condition should be
enough. Belt and braces can’t hurt, though.

Note that while the interface is elegant, from a performance point of
view this is horrible.

Well, yes - it’s a time/space trade-off. I like your batching, though -
quite neat, and it gives more control over the trade-off ratio.


#10

Alex Y. wrote the following on 11.05.2006 09:21 :

SQL, write a stored procedure and retrieve the results.

class Foo < ActiveRecord::Base
def Foo.next(foo)
Foo.find(:first, :conditions => [‘id > ?’, foo.id])
end
end

Untested, but that’s the general idea…

give the find “:order => id” and it should work.
Note that while the interface is elegant, from a performance point of
view this is horrible.

I wonder if we shouldn’t use Ruby blocks to make it cleaner and
more scalable.

What about (reusing the code I proposed earlier):

class Foo < ActiveRecord::Base
  def self.apply(conditions = nil)
    Foo.transaction do
      count = Foo.count
      batchsize = 1000 # or whatever suits your case
      offset = 0
      while offset < count
        foos = Foo.find :all, :conditions => conditions,
                        :offset => offset, :limit => batchsize
        for foo in foos
          yield foo
        end
        offset += batchsize
      end
    end
  end
end

Then all you have to do is:

Foo.apply { |foo| ... }                   # process each row
Foo.apply('attr = value') { |foo| ... }   # process a subset

You can then modify the apply method if you find a better way to get
your Foo instances (if ActiveRecord implements a nice Cursor interface
in the future for example). You won’t need to modify the Foo.apply
calls.
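The same hiding-the-batching-behind-a-block idea can be seen in plain Ruby, with an array in place of the table (class and method names here are illustrative): the caller only ever passes a block, so the internals are free to change.

```ruby
class BatchedList
  def initialize(rows, batchsize = 1000)
    @rows, @batchsize = rows, batchsize
  end

  # Yields each row to the block, fetching @batchsize rows at a time
  # internally; call sites never see the batching.
  def apply
    offset = 0
    while offset < @rows.length
      @rows[offset, @batchsize].each { |row| yield row }
      offset += @batchsize
    end
  end
end

list = BatchedList.new((1..25).to_a, 10)
total = 0
list.apply { |row| total += row }
puts total    # => 325
```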

Lionel.


#11

Hi Tony,

Could you describe your expectations a little more? For example:

  1. Do you display this data?
  2. Why does it need to be fast?
  3. Is it best to spend overall more time (e.g. loading and processing
     data one by one) but with less memory use, than overall less time
     (load all), but with no “progress status” and huge memory use?

Maybe that could help us find a trade-off solution :wink:

Regards,

Nicolas


#12

On 5/11/06, Alex Y. removed_email_address@domain.invalid wrote:

Lionel B. wrote:

give the find “:order => id” and it should work.
Oh yes - best have that for safety. Although I think I’m right in
saying that under AR’s semantics, id’s are guaranteed to be monotonic
increasing, so the :first with the condition should be enough. Belt and
braces can’t hurt, though.

ids are monotonically increasing, but if you select without ordering, you
can’t be sure to retrieve the next id, only one with an id greater
than the current record’s. The :first condition is not enough, as the
ordering of rows without a specific ORDER BY clause is arbitrary (and
so the first element returned would not necessarily be the next
element). The only situation I can think of where this is likely to
work without :order is if you are only using SELECT and INSERT, and
not UPDATE or DELETE, and even then it is probably database dependent.
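A small plain-Ruby demonstration of that point (the `Row` struct is illustrative): with no ordering, “the first row with id > x” depends on whatever order the rows happen to come back in, so records can be skipped.

```ruby
Row = Struct.new(:id)
unordered = [4, 1, 3, 2].map { |i| Row.new(i) }

current = Row.new(1)

# No ordering: just take the first match in storage order
first_greater = unordered.find { |r| r.id > current.id }
puts first_greater.id     # => 4, silently skipping 2 and 3

# With ordering: the true next id
ordered_next = unordered.select { |r| r.id > current.id }.min_by(&:id)
puts ordered_next.id      # => 2
```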

Jeremy