Is there a way to avoid having the library slurp-read the wh

Hi,

I’m currently implementing a database application in Ruby, and I use
PostgreSQL for the data store. At first I tried using DBI to make the
data store as platform/database independent as possible, but when I came
back to work this morning I found that my computer was starved for
virtual memory, and that Ruby was using about 700MBs of RAM and rising.

What I found out is that DBI slurp reads the whole result set into
memory before it sends it back from the library. I poked around in the
code, but found no way to instead return one-and-one result serially as
the server sends them back.

So I poked around in the documentation for the postgres-library, but
sadly with no luck there either. No method there accepts a block so that
you can just receive the results serially. I also had a quick poke at
postgres-pr, but the situation seems to be the same there.

Am I missing something, or is the really now way of receiving the
results serially without the libraries slurping everything up in RAM
first? The table I’m having trouble with currently has about 2 million
rows, so reading everything into RAM is absolutely impossible. I’m
looking for something like this:

db.execute(“select * from huge_table”).each_row do |row|
# process row here
end

PS: Reading these 2 million rows is not a normal day-to-day operation,
but I do this when I normalize some data from a CSV file. Basically I
restructure the data into several tables, and for now I’ve mostly done
this with “CREATE TABLE t AS SELECT”. I’ve found that I need to compute
some of the columns with a Ruby-function though, so I basically need to
loop through them all.

Regards,
Helge E.

On 2006-08-04, Helge E. [email protected] wrote:

Am I missing something, or is the really now way of receiving the
results serially without the libraries slurping everything up in RAM
first? The table I’m having trouble with currently has about 2 million

use cursor’s. (in posrtgresql docs look for:
declare
fetch
move
close

depesz

On Fri, 4 Aug 2006, Helge E. wrote:

code, but found no way to instead return one-and-one result serially as
rows, so reading everything into RAM is absolutely impossible. I’m
some of the columns with a Ruby-function though, so I basically need to
loop through them all.

Regards,
Helge E.

it’s been a while since i used the ruby postgres bindings but, when i
did,
both ‘query’ and ‘exec’ took blocks to iterate over results sets.

have you tried?

-a

On 8/4/06, [email protected] [email protected] wrote:

it’s been a while since i used the ruby postgres bindings but, when i did,
both ‘query’ and ‘exec’ took blocks to iterate over results sets.

have you tried?

Not sure that solves his problem, Ara- he wants to keep the library
from yanking the whole result set into RAM before it starts iterating
over his code.

On Fri, 4 Aug 2006, Francis C. wrote:

On 8/4/06, [email protected] [email protected] wrote:

it’s been a while since i used the ruby postgres bindings but, when i did,
both ‘query’ and ‘exec’ took blocks to iterate over results sets.

have you tried?

Not sure that solves his problem, Ara- he wants to keep the library
from yanking the whole result set into RAM before it starts iterating
over his code.

yes - that’s what it does (used to do). looking at the source and
samples/ it
looks like it works this way now

from samples/test1.rb

res = conn.exec(“FETCH ALL in myportal”)

for fld in res.fields
printf(“%-15s”,fld)
end
printf(“\n\n”)

res.result.each do |tupl| # we iterating here
tupl.each do |fld|
printf(“%-15s”,fld)
end
printf(“\n”)
end

dunno how to do that from dbi - never used it.

regards.

-a

On Fri, Aug 04, 2006 at 04:59:36PM +0900, Helge E. wrote:

code, but found no way to instead return one-and-one result serially as
rows, so reading everything into RAM is absolutely impossible. I’m
some of the columns with a Ruby-function though, so I basically need to
loop through them all.

Regards,
Helge E.

There might be another solution: go back to the DB server, and do your
computation DB-side with an embedded procedure. This would prevent
slurping any data at all to the client and back, and if your
computation is relatively “light” might even reduce load on your
server since it doesn’t have to stream 2M rows back and forth.

Postgres has excellent support for embedded procedures in multiple
languages, even in Ruby I believe, although I’ve never done that
myself yet.

YMMV depending on your problem.

-Jürgen