Forum: Ruby on Rails What to do with HUGE instance variables in Rails?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Tony McMahon (Guest)
on 2006-05-10 22:51
I'm learning rails and I can succesfully use the following things in the
controller:

@var1 = Var.find :all
@var2 = Var2.find :all

Problem is that the DB has about 260,000 lines which considerably slows
everything down if I load everything in @var1.

Isn't there a way to load those items progressively? I treat them
separately (e.g. no interactions between them) in the program so it
should be possible.

Thanks guys.

-T
Kevin O. (Guest)
on 2006-05-10 22:57
(Received via mailing list)
On Wednesday, May 10, 2006, at 8:51 PM, Tony McMahon wrote:
>separately (e.g. no interactions between them) in the program so it
>removed_email_address@domain.invalid
>http://lists.rubyonrails.org/mailman/listinfo/rails

Why would you need to load all of your records at once?
Usually you can use the database to select just the relevant subset to
operate on.

_Kevin
Daniel W. (Guest)
on 2006-05-10 22:57
(Received via mailing list)
Aye, it's call pagination, yo. :)

On 10 May 2006 18:55:03 -0000, Kevin O. <
Tony McMahon (Guest)
on 2006-05-10 23:43
> Why would you need to load all of your records at once?
> Usually you can use the database to select just the relevant subset to
> operate on.

Because my program is some sort of an iterative loop that builds small
files for each of the items in @ var. So I'm loading everything in @var
and then do a for... each in the program.

It works perfectly with a small number of lines but not when I load the
huge production DB. Maybe I should do something else but I'm just
learning Rails and most of the examples load everything in a @var...

-T
Colin (Guest)
on 2006-05-11 01:06
Do it once and cache the output?! Or do the thing you do _before_ you
put the data in the database, and the next time, just show the result.

If you really need to modify 260k items for every view.. Well.. Except
for using in-db functions or maybe a custom db system (and even this
might be slow), I don't see a way to this in a performant way.
Lionel B. (Guest)
on 2006-05-11 02:51
(Received via mailing list)
Colin wrote the following on 10.05.2006 23:06 :
> Do it once and cache the output?! Or do the thing you do _before_ you
> put the data in the database, and the next time, just show the result.
>
> If you really need to modify 260k items for every view.. Well.. Except
> for using in-db functions or maybe a custom db system (and even this
> might be slow), I don't see a way to this in a performant way.
>
>

You could use something like the following (I had to in another context
where I deleted huge numbers of rows from a table, PostgreSQL simply
barfed when being sent a DELETE FROM table WHERE id in <huge_list>).

Var.transaction {
count = Var.count
batchsize = 1000 # or whatever suits your case
offset = 0
while offset < count
    vars = Var.find :all, :offset => offset, :limit => batchsize
    <do whatever you need to do with vars>
    offset += batchsize
end
}

Note that if entries can be added/deleted while you execute this, you'll
most probably need the transaction (if you can't afford to process the
same entry twice and/or miss entries)...

Lionel.
Stefan K. (Guest)
on 2006-05-11 08:51
(Received via mailing list)
Tony McMahon wrote:
> separately (e.g. no interactions between them) in the program so it
> should be possible.
>
> Thanks guys.
>
> -T
>
>
My first answer is: don't do it.
Second: Never ever.
Third: are you serious?

;-)

That said: if you need to process 260.000 objects, drop down into SQL,
write a stored procedure and retrieve the results.

This isn't a Rails or Ruby problem. Your approach would be slow in any
language.


-- stefan

--
Rails performance tuning: http://railsexpress.de/blog
Subscription: http://railsexpress.de/blog/xml/rss20/feed.xml
Alex Y. (Guest)
on 2006-05-11 11:23
(Received via mailing list)
Stefan K. wrote:
>>
> My first answer is: don't do it.
> Second: Never ever.
> Third: are you serious?
>
> ;-)
>
> That said: if you need to process 260.000 objects, drop down into SQL,
> write a stored procedure and retrieve the results.
I can think of a few cases where that's just not going to work...
particularly where the processing involves some sort of meaningful
interaction with things outside the database.

> This isn't a Rails or Ruby problem. Your approach would be slow in any
> language.
It should be slow, but manageable - not impossible.  Isn't this sort of
thing why we have cursors?

How would this look as syntax:

@var = Foo.find(:first)
while @var:
   do_stuff_with @var
   @var = Foo.next(@var)
end

And as a generic solution, I think something like this would do it:

class Foo < ActiveRecord::Base
   def Foo.next(foo)
     Foo.find(:first, :conditions => ['id > ?', foo.id])
   end
end

Untested, but that's the general idea...
Lionel B. (Guest)
on 2006-05-11 16:13
(Received via mailing list)
Alex Y. wrote the following on 11.05.2006 09:21 :

>>>
>>>
>> SQL, write a stored procedure and retrieve the results.
>
> class Foo < ActiveRecord::Base
>   def Foo.next(foo)
>     Foo.find(:first, :conditions => ['id > ?', foo.id])
>   end
> end
>
> Untested, but that's the general idea...
>

give the find  ":order => id" and it should work.
Note that while the interface is elegant from a performance point of
view this is horrible.

I wonder if we shouldn't use ruby blocks to make it more clean and
scalable.

What about (reusing the code I proposed earlier):

class Foo < ActiveRecord::Base
    def apply(conditions = nil)
        Foo.transaction {
            count = Foo.count
            batchsize = 1000 # or whatever suits your case
            offset = 0
            while offset < count
                foos = Foo.find :all, :conditions => conditions,
                             :offset => offset, :limit => batchsize
                for foo in foos
                    yield foo
                end
                offset += batchsize
            end
        }
    end
end

Then all you have to do is
Foo.apply {|foo| <do whatever you want to do with foo>} # process each
row
Foo.apply('attr = value') {|foo| <do whatever you want to do with foo>}
# process a subset

You can then modify the apply method if you find a better way to get
your Foo instances (if ActiveRecord implements a nice Cursor interface
in the future for example). You won't need to modify the Foo.apply
calls.

Lionel.
Alex Y. (Guest)
on 2006-05-11 16:43
(Received via mailing list)
Lionel B. wrote:
> give the find  ":order => id" and it should work.
Oh yes - best have that for safety.  Although I *think* I'm right in
saying that under AR's semantics, id's are guaranteed to be monotonic
increasing, so the :first with the condition should be enough.  Belt and
braces can't hurt, though.

> Note that while the interface is elegant from a performance point of
> view this is horrible.
Well, yes - it's a time/space trade-off.  I like your batching, though -
quite neat, and gives more control over the trade-off ratio.
Nicolas B. (Guest)
on 2006-05-11 17:02
(Received via mailing list)
Hi Tony,

Could you describe your expectations a little more? For example:
1) Do you display this data?
2) Why does it need to be fast?
3) Is it best to spend overall more time (ex: loading and processing
data 1
by 1) but with less memory use, than overall less time (load all), but
with
no "progress status" and huge memeory use?
4) ....

Maybe that could help us finding a trade-off solution ;-)

Regards,

Nicolas
Jeremy E. (Guest)
on 2006-05-11 19:54
(Received via mailing list)
On 5/11/06, Alex Y. <removed_email_address@domain.invalid> wrote:
> Lionel B. wrote:
> > give the find  ":order => id" and it should work.
> Oh yes - best have that for safety.  Although I *think* I'm right in
> saying that under AR's semantics, id's are guaranteed to be monotonic
> increasing, so the :first with the condition should be enough.  Belt and
> braces can't hurt, though.

ids are monotonic increasing, but if you select without ordering, you
can't be sure to retreive the next id, only one with an id greater
than the current record.  The :first condition is not enough, as the
ordering of rows without a specific ORDER BY clause is arbitrary (and
so the first element returned would not necessarily be the next
element).  The only situation I can think of where this is likely to
work without :order is if you are only using SELECT and INSERT, and
not UPDATE or DELETE, and even then it is probably database dependent.

Jeremy
This topic is locked and can not be replied to.