Massive data output

Hi.
I’ve got a huge amount of data to show in the client browser.
It’s actually text rows from a database: about 200,000 rows and 10 columns.

When I tried loading it into instance variables and passing them to the
rhtml template, it froze the server, since the data is huge and needs a
very large amount of memory.

I’m wondering if there is any other way to accomplish this.
I’m thinking of FLUSHING the output in chunks depending on the data size.

Any ideas are welcome…
Thanks

On 27 Aug 2006, at 12:43, eskim wrote:

I’m wondering if there is any other way to accomplish this.
I’m thinking of FLUSHING the output in chunks depending on the data size.

Any ideas are welcome…

I personally don’t think the server is necessarily the culprit.
Generally speaking, it’s not a good idea to render 200,000 rows in a
web app; that’s what pagination is for. You could look into
OpenRico’s LiveGrid, which will dynamically load rows (only the rows
visible on screen). While you’re at it, you could wrap it in a
plugin and release it for everyone to use :wink:
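
Even plain offset/limit pagination goes a long way. A minimal sketch, assuming a model called Row; the controller, action and page size are just placeholders (LiveGrid would fetch its slices from an action shaped much like this):

```
class RowsController < ApplicationController
  PER_PAGE = 100

  def list
    @page      = (params[:page] || 1).to_i
    @row_count = Row.count
    # Only ever load one page worth of records into memory.
    @rows      = Row.find(:all,
                          :order  => 'id',
                          :limit  => PER_PAGE,
                          :offset => (@page - 1) * PER_PAGE)
  end
end
```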

Best regards

Peter De Berdt

Note that Gecko-based browsers will choke on this unless you break it up
into multiple tables. Gecko’s rendering time seems to grow exponentially
with the number of rows in a single table. IE-based browsers do not have
this problem, nor do KHTML-based ones.
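
A rough idea of what the breaking-up could look like in the rhtml, assuming @rows is an array of row arrays and 500 rows per table is an arbitrary chunk size:

```
<%# Emit one <table> per 500-row chunk so Gecko never has to lay out
    one enormous table. @rows and the chunk size are assumptions. %>
<% i = 0 %>
<% while i < @rows.size %>
  <table>
    <% @rows[i, 500].each do |row| %>
      <tr>
        <% row.each do |value| %>
          <td><%= h(value) %></td>
        <% end %>
      </tr>
    <% end %>
  </table>
  <% i += 500 %>
<% end %>
```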

eskim wrote:

I’m wondering if there is any other way to accomplish this.
I’m thinking of FLUSHING the output in chunks depending on the data size.

I’ve had a similar problem when manipulating big sets of records
(1,000+). It basically keeps all of the ActiveRecord objects around in
memory, and it just blows up.

Maybe some kind of scheme where you retrieve the records in batches (so
that you don’t have one big array referencing them all) and then force
garbage collection on them? I don’t know enough about Ruby internals to
say whether this is a good idea or even possible, though.
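
Something along these lines is what I mean; Row, the batch size and the process call are all placeholders, and whether GC.start actually helps is exactly the part I’m unsure about:

```
# Walk the table in fixed-size batches so only one batch of
# ActiveRecord objects is referenced at any time.
batch_size = 1000
offset     = 0
loop do
  rows = Row.find(:all, :order => 'id',
                  :limit => batch_size, :offset => offset)
  break if rows.empty?
  rows.each { |row| process(row) }  # 'process' stands in for your own logic
  offset += batch_size
  rows = nil                        # drop the reference to the batch
  GC.start                          # nudge Ruby to reclaim it
end
```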

Or, if you’re just doing a kind of data dump, perhaps you could
shortcut part of ActiveRecord and access the database adapter to retrieve
the data directly (somehow). Again, I don’t know ActiveRecord well enough
off the top of my head to say whether this is easily achievable or
advisable, but it’s another idea.
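
For the second idea, a sketch that goes through the connection adapter (table and column names made up); select_all hands back plain hashes of strings rather than full ActiveRecord objects, which is much lighter:

```
# Fetch raw rows without instantiating 200,000 ActiveRecord objects.
sql  = 'SELECT col1, col2, col3 FROM big_table ORDER BY id'
rows = ActiveRecord::Base.connection.select_all(sql)
rows.each do |row|
  # Each row is a Hash of column name => string value.
  puts row['col1']
end
```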

Chris

As a left field solution, do you have the option to send the data as a
ZIP file to the user, then have them look at it in a non-browser tool?
Excel wouldn’t work for this quantity of data (too many rows), but a
plain text editor may suffice.

As others have said, rendering this quantity of data in IE or Firefox
is probably going to be a problem.
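
For the download itself, something like the following might do. It uses gzip from Ruby’s standard library as a stand-in for ZIP (a real .zip would need an extra library such as rubyzip), and the SQL, column names and paths are placeholders; the comma-joining also skips proper CSV escaping:

```
require 'zlib'

class ExportsController < ApplicationController
  def download
    path = File.join(RAILS_ROOT, 'tmp', "export_#{Time.now.to_i}.csv.gz")
    # Write the rows straight into a compressed file on disk.
    Zlib::GzipWriter.open(path) do |gz|
      rows = ActiveRecord::Base.connection.select_all(
        'SELECT col1, col2, col3 FROM big_table')
      rows.each do |row|
        gz.write(row.values_at('col1', 'col2', 'col3').join(',') + "\n")
      end
    end
    # Hand the finished file to the browser as a download.
    send_file path, :filename => 'export.csv.gz',
                    :type     => 'application/x-gzip'
  end
end
```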

Regards

Dave M.

On Aug 27, 2006, at 6:43 AM, eskim wrote:

It’s about 200,000 rows and 10 columns.

ugh.

i can’t speak for rails specifically, but in general there are two
places this is going to choke.

  1. the server. if the server tries to load the whole query before
    rendering it and passing it off to the client, it will have to build
    up a huge buffer. in environments where the user has more control
    over rendering, the usual solution is to render the header, then
    start reading from the database and writing rows out at the same
    time, and finally tack the footer on at the end (see the sketch
    after this list). i do not believe this is possible under rails,
    although there may be a way i’m not aware of.

  2. the client. browsers really don’t like rendering big tables,
    since they have to hold the whole table in memory before they can
    draw it. the solution to this might be to get rid of the big table:
    move to a css-based layout, or replace the big table with a lot of
    smaller tables (e.g. one single-row table per item). that’s still
    going to leave you with a several-megabyte file, though, so the
    browser may hate you anyway.
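
here’s a rough sketch of the read-and-write-at-the-same-time idea outside of rails, using the plain ruby/mysql bindings (connection details, table and column names are placeholders):

```
require 'mysql'

out = $stdout
out.write("<html><body><table>\n")                # header first

db  = Mysql.real_connect('localhost', 'user', 'password', 'mydb')
res = db.query('SELECT col1, col2 FROM big_table')
res.each do |row|                                 # each row is an array of strings
  out.write("<tr><td>#{row[0]}</td><td>#{row[1]}</td></tr>\n")
  out.flush                                       # push it out instead of buffering
end
res.free
db.close

out.write("</table></body></html>\n")             # footer last
```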

-faisal

I’ve decided to do it by making temporary files and then sending them to
the clients using the function ‘render_file’.

This seems reasonable. It takes a little time, but I can stand that, and
hopefully so can the clients :slight_smile:
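
Roughly what I have in mind (the SQL, column names and filename are placeholders; the sketch uses send_file to push the finished file out, but sending the generated file with ‘render_file’ is the same idea):

```
require 'tempfile'

def export
  # Dump raw rows to a temporary file, then hand that file to the client.
  tmp  = Tempfile.new('export')
  rows = ActiveRecord::Base.connection.select_all(
           'SELECT col1, col2, col3 FROM big_table')
  rows.each do |row|
    tmp.puts(row.values_at('col1', 'col2', 'col3').join("\t"))
  end
  tmp.close
  send_file tmp.path, :filename => 'export.txt', :type => 'text/plain'
end
```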

Do you guys agree on this?

Many thanks for the advice.

eskim wrote:

I’ve decided to do it by making temporary files and then sending them to
the clients using the function ‘render_file’.

This seems reasonable. It takes a little time, but I can stand that, and
hopefully so can the clients :slight_smile:

Do you guys agree on this?

So you know, there is a side effect in the mysql/ruby bindings that
causes massive memory utilization when you iterate over thousands and
thousands of MySQL results. This happens when “fetch_hash”
is called (because that is what turns all of your C strings into Ruby
Strings).

I would recommend that you consider NOT processing this many results in
one of your dispatchers, and instead use another process that you can
start/stop/kill outside of your Rails code. Otherwise your dispatchers
will grow huge (unless you don’t mind them getting killed).
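
For example, something along these lines kept out of the dispatchers entirely; the Exporter class, its SQL and the output path are made up, and it can be kicked off from cron or the command line with script/runner so it lives in its own process:

```
#   ruby script/runner 'Exporter.run("/tmp/export.txt")'

class Exporter
  # Dumps the rows to a file in a standalone process, so any memory
  # growth never touches the Rails dispatchers.
  def self.run(path)
    File.open(path, 'w') do |f|
      rows = ActiveRecord::Base.connection.select_all(
               'SELECT col1, col2 FROM big_table')
      rows.each { |row| f.puts(row.values_at('col1', 'col2').join("\t")) }
    end
  end
end
```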

Also, if you constantly process large sets of different data (say you
have 10,000,000 rows and you are constantly processing different sets of
200,000 at a time), you are NOT going to maintain consistent memory
utilization just because you’re consistently doing 200,000 records at a
time. Instead, memory utilization will go up with each iteration,
although it will eventually taper off. I posted about this earlier this
year on ruby-core. Here are some of my results:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/7463

Given the amount of memory you have available, this may not be a problem
for you.

Be sure to benchmark and profile when you are working with this much
data. Ruby and Rails both make things so easy for developers that it is
easy to fall into the trap of doing things the same way whether you are
dealing with 10 records or with 100,000. The difference is that 100,000
records will have a much larger impact on resource utilization, and you
need to be aware of how your code behaves with that much data being
processed. It will save you headaches in the future if someday your
system runs out of physical memory and resorts to using too much swap,
etc.
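
For instance, the standard Benchmark library makes it cheap to compare two approaches before committing to one (the Row model, table and queries below are placeholders):

```
require 'benchmark'

# Compare loading full ActiveRecord objects against raw adapter rows.
Benchmark.bm(12) do |b|
  b.report('AR objects') { Row.find(:all, :limit => 10_000) }
  b.report('select_all') do
    ActiveRecord::Base.connection.select_all('SELECT * FROM rows LIMIT 10000')
  end
end
```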

With a large data set you will also NOT get away with logic bugs or poor
algorithms that manipulate that data, as you might with a small set.

You may already be aware of all this; if so, keep on trucking…

Zach