ActiveRecord, PostgreSQL question

I’m using ActiveRecord’s connection to execute a custom query (which
runs fine in psql) like this:

result = ActiveRecord::Base.connection.execute("COPY (select * from users) TO STDOUT WITH CSV;")
SQL (0.8ms) COPY (select * from users) TO STDOUT WITH CSV;
=> #<PGresult:0x2589ad8>

result.nfields
=> 39

result.ntuples
=> 0

An instance of PGresult is returned. How can I get data out of it?

Thanks!

Load up the rdocs for the postgres gem… it will tell you… but why
do it this way? Why not use AR to get your records and fastercsv to
convert it to CSV? Much more portable…

Thanks Philip,

The Postgres gem rdoc hasn’t been much help. All of the PGresult
instance methods that retrieve values require a tuple number. My
PGresult has a number of fields, but no tuples (rows).

This query (when run in psql) returns a large block of text. For
example:
email,fname,lname,created_at
[email protected],Foo,Fooster,2009-07-07 17:00:41.929865
[email protected],Bar,Barbie,2009-07-01 20:31:08.659965
[email protected],User,User,2009-07-07 20:33:53.293606
[email protected],Admin,Admin,2009-07-07 20:33:53.760538

I agree 100% that using FasterCSV and ActiveRecord to pull the data is
much more portable (and elegant). In fact, that's how I'm doing it
now. However, this is a huge dataset that is causing server timeouts
and hogging memory. I'm investigating CSV generation in Postgres, since
it takes a fraction of the time and resources because each object isn't
getting instantiated.
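
For reference, here's roughly what the version I'm replacing looks like
(just a sketch; the User model and column names simply match the sample
output above):

require 'fastercsv'

csv_string = FasterCSV.generate do |csv|
  csv << %w[email fname lname created_at]
  # every row becomes a full ActiveRecord object -- this is what eats the memory
  User.all.each do |user|
    csv << [user.email, user.fname, user.lname, user.created_at]
  end
end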

Any thoughts?

I found a temporary workaround where I ask PostgreSQL to save the
output to a file instead of STDOUT:

sql = "COPY (select * from users) TO '/tmp/file.csv' WITH CSV HEADER;"
=> "COPY (select * from users) TO '/tmp/file.csv' WITH CSV HEADER;"

ActiveRecord::Base.connection.execute(sql)
SQL (0.2ms) SET client_min_messages TO 'panic'
SQL (0.1ms) SET client_min_messages TO 'notice'
SQL (2.6ms) COPY (select * from users) TO '/tmp/file.csv' WITH CSV HEADER;
=> #<PGresult:0x2650188>

csv_string = File.read '/tmp/file.csv'
=> "email,fname,lname,created_at\n[email protected],Foo,Fooster,
2009-07-07 17:00:41.929865\n[email protected],Bar,Barbie,2009-07-01
20:31:08.659965\n[email protected],User,User,2009-07-07
20:33:53.293606\n[email protected],Admin,Admin,2009-07-07
20:33:53.760538\n"

This is not ideal (COPY ... TO 'file' writes the file on the database
server's filesystem and requires superuser privileges, so it only works
when the app and database share a host), but it's a start.

Any other ideas?

On Jul 10, 2009, at 1:27 PM, gsterndale wrote:

Any thoughts?

Skip AR and use the select_values() call perhaps. That will skip any
object instantiation at least.
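
Something like this (untested here; select_values returns just the
first column of each row as plain Ruby values, with no AR objects
built):

emails = ActiveRecord::Base.connection.select_values("SELECT email FROM users")
# => ["[email protected]", "[email protected]", ...]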

Hi, can you try something like this? The following can be moved to lib:

module ActiveRecord
  class Base
    def self.to_csv(sql_statement)
      # grab the raw PGconn that the adapter wraps
      pg_conn = self.connection.instance_variable_get(:@connection)
      pg_conn.exec("COPY (#{sql_statement}) TO STDOUT WITH CSV")
      csv = []
      # COPY output ends with a literal \. line, i.e. "\\." in Ruby
      while (line = pg_conn.getline) != "\\."
        csv << line
      end
      pg_conn.endcopy
      csv.join("\n")
    end
  end
end

And you can call it like ActiveRecord::Base.to_csv(sql_here).

Sijo

On Friday 10 July 2009, gsterndale wrote:

However, this is a huge dataset that is causing server timeouts and
hogging memory.

If your dataset truly is huge, an approach that relies on first
extracting the entire dataset and then serving it is not going to work
reliably. If it doesn't fail outright due to a timeout, it is still
going to strain your memory considerably.

As far as I can see, there are two ways around this.

If the export doesn't have to be current to this very instant, set up a
background job that exports it regularly to a file and leave the serving
to the web server.
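
For instance, a cron-driven rake task along these lines (just a sketch;
the task name is made up, and the COPY statement is the one from earlier
in this thread):

# lib/tasks/export.rake
namespace :export do
  desc "Dump users to CSV for the web server to serve"
  task :users => :environment do
    sql = "COPY (select * from users) TO '/tmp/file.csv' WITH CSV HEADER;"
    ActiveRecord::Base.connection.execute(sql)
  end
end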

If that doesn't meet your needs, consider streaming the data. Have a
look at the send_data (controller) method. It may even be sensible to
handle this in Metal, i.e. as a Rack module. I haven't tried any of
this myself, but the following link might be helpful:

http://amberbit.com/blog/2009/04/15/ruby-flv-pseudostreaming-implemented-using-sinatra-and-rack-evil-useful-for-rails-too/
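
Untested, but the Rack side could be as simple as a body object that
responds to #each and yields the file in chunks, so the whole CSV never
sits in memory (the class name and path here are only placeholders):

class CsvStream
  def initialize(path)
    @path = path
  end

  # Rack calls #each and writes every yielded chunk straight to the client
  def each
    File.open(@path, 'rb') do |f|
      while (chunk = f.read(16 * 1024))
        yield chunk
      end
    end
  end
end

# a bare Rack endpoint; a Rails Metal class would wrap the same thing
app = lambda do |env|
  [200,
   { 'Content-Type' => 'text/csv',
     'Content-Disposition' => 'attachment; filename="users.csv"' },
   CsvStream.new('/tmp/file.csv')]
end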

Michael


Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/

Thanks Sijo,

I ended up using something very similar:

pg_conn = self.connection.instance_variable_get(:@connection)
pg_conn.exec("COPY (#{sql}) TO STDOUT WITH CSV HEADER")
csv = []
# getline hands back the literal \. end-of-data marker ("\\." in Ruby) when done
while (line = pg_conn.getline) != "\\."
  csv << line
end
pg_conn.endcopy
csv.join("\n")

It's a hack (reaching into the adapter's @connection instance variable
isn't public API), but it works for now.