Hi
I’m trying to write a tool that generates a really long SQL script
and passes it to psql, but I’ve hit a problem because my script is
too long to be sent to the stdin of psql. This is the first time
I’ve run into the limit so it had me scratching my head for a while.
I’ve tried a load of tricks, even putting the lines in an array, e.g.:

  MAX_SUCCESSFUL_TIMES = 3047
  query = ["BEGIN WORK;"]
  (MAX_SUCCESSFUL_TIMES + 1).times do
    query << "INSERT INTO people (name) VALUES ('Fred');"
  end
  query << "COMMIT;"
  IO.popen("psql -U test test", "r+") do |pipe|
    query.each { |statement| pipe.puts statement }
  end
but it still fails when the total length of commands exceeds the
limit (which by experiment I’ve found to be 128K on Mac OS X, hence
the specific number of times above).
What’s the best solution to this? I would like to stick to
inter-process communication, and avoid temporary files and rewriting
it to use DBD, if possible. Or are they my only options?
Thanks
Ashley
> too long to be sent to the stdin of psql.
> query << "COMMIT;"
> process communication, and avoid temporary files and rewriting it to
> use DBD, if possible. Or are they my only options?
Make sure the command you spawn in IO.popen is actually reading
stuff from its stdin in parallel. The limit a pipe has at the system
level is the amount of unread data it can hold. When the data is
read out, it makes room for more data. However, if your reader is stuck
for some reason, the writer will wait too.
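A minimal sketch of that system-level limit (the 1024-byte chunk size is arbitrary): fill a pipe without reading from the other end and see how much unread data it holds before a write would block.

```ruby
# Fill a pipe with nobody reading the other end; count how many bytes
# it accepts before a plain (blocking) write would have to wait.
r, w = IO.pipe
bytes = 0
begin
  loop { bytes += w.write_nonblock("x" * 1024) }
rescue IO::WaitWritable, Errno::EAGAIN
  # The pipe's buffer is full: the writer would now block until the
  # reader drains some data.
end
puts "pipe held #{bytes} bytes of unread data"
w.close
r.close
```

The exact number is OS-dependent (64K is common on Linux), which matches the experimentally-found 128K figure above being platform-specific.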
If you say that psql accepts only a limited amount of data from its
stdin, that is more its problem than the pipe’s. To check it, try
feeding the big file to psql via redirection, like in:

  psql -U test test < "your big file"
Hope I am not too off from what you meant,
Gennady.
> If you say that psql accepts only limited amount of data from its stdin,
> it is more of its problem, rather than pipe’s. To check it, try to feed
> the big file to psql via redirection, like in:
It’s not; psql is commonly used to restore whole databases by feeding
it data from stdin.
On Wed, 30 Aug 2006 04:16:35 +0900, Ashley M. wrote:
> query = ["BEGIN WORK;"]
> limit (which by experiment I’ve found to be 128K on Mac OS X, hence
> the specific number of times above).
> What’s the best solution to this? I would like to stick to
> inter-process communication, and avoid temporary files and rewriting
> it to use DBD, if possible. Or are they my only options?
Rewriting to DBI is not very difficult, so unless you have a reason
other than not wanting DBI calls to clutter your code or not wanting
to massively restructure your code, the following should work:
  class DBI::DatabaseHandle
    # Takes a whole SQL script in a string and executes it on the
    # database, one statement at a time. (Note: split(";") already
    # drops a trailing empty string, and assigning the result of
    # delete_at(-1) would give you the deleted element, not the array.)
    def batch(sql)
      statements = sql.split(";").map { |s| s.strip }.reject { |s| s.empty? }
      statements.each { |statement| self.do(statement) }
    end
  end
In the sample code you gave above,

  query.each { |statement| dbh.do(statement) }

should also work just fine.
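As a standalone check of the split-on-semicolon idea (a sketch; the SQL strings below are made up):

```ruby
# Split a script into statements, dropping whitespace-only leftovers.
sql = "BEGIN WORK; INSERT INTO people (name) VALUES ('Fred'); COMMIT;"
statements = sql.split(";").map { |s| s.strip }.reject { |s| s.empty? }
statements.each { |s| puts s }
```

Note this naive split breaks on semicolons inside string literals, so it is only safe for scripts you generate yourself.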
Is it possible with the pipe solution that you need to read back the
output data, so that psql doesn’t back up its pipe buffer, as Gennady
Bystritsky has suggested?
On Tuesday 29 August 2006 21:06, Gennady B. wrote:
> psql -U test test < "your big file"
> Hope I am not too off from what you meant,
> Gennady.
Hi Gennady
I was not very clear in my e-mail: I didn’t think psql was limited in
the size of file it can read. You were right, it was just the buffer
filling up. I had actually tried reading from the buffer, but not in
parallel. This seems to work:
  MAX_SUCCESSFUL_TIMES = 3047
  query = ["BEGIN WORK;"]
  (MAX_SUCCESSFUL_TIMES + 1).times do
    query << "INSERT INTO people (name) VALUES ('Fred');"
  end
  query << "COMMIT;"
  IO.popen("psql -U test test", "r+") do |pipe|
    Thread.new { loop { pipe.read } } # fixes it
    query.each { |statement| pipe.puts statement }
  end
I assume pipe.read blocks, so the thread doesn’t turn the CPU into a
heating element.
Cheers
Ashley
On Wednesday 30 August 2006 08:34, Ashley M. wrote:
> IO.popen("psql -U test test", "r+") do |pipe|
>   Thread.new { loop { pipe.read } } # fixes it
>   query.each { |statement| pipe.puts statement }
> end
Actually I lied… I need a "sleep 2" at the end of the IO.popen block
for this to work. Is there a less magical way to sync it? (I took a
stab at Process.wait but it doesn’t work.)
Ashley
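For what it’s worth, there is a less magical sync (a sketch, with cat standing in for psql so it runs without a database): close the pipe’s write side so the child sees EOF, then join the reader thread, which returns once the output is drained.

```ruby
output = nil
IO.popen("cat", "r+") do |pipe|
  reader = Thread.new { pipe.read }   # drain the child's stdout in parallel
  3.times { |i| pipe.puts "statement #{i}" }
  pipe.close_write                    # child sees EOF and exits when done
  output = reader.value               # joins the thread and returns its result
end
puts output
```

No sleep needed: reader.value cannot return before the child has finished writing, because pipe.read only returns at EOF.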
On Wednesday 30 August 2006 14:10, Ken B. wrote:
> Since you’re throwing away the output, instead of using a thread, why
> not try one of the following:
> - popen with the mode "w":
>     IO.popen("psql -U test test", "w")
> - redirect the results from psql to /dev/null inside the popen call:
>     IO.popen("psql -U test test > /dev/null", "r+")
> –Ken
Thanks Ken, the last one was what I was looking for. I forgot to say
that IO.popen("psql -U test test", "w") was actually one of the first
things I tried, but it dumps the (several thousand lines of) psql
results back into the shell, which renders the output of the script
useless.
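The redirect approach can be sketched with a stand-in child too (cat instead of psql, so there is no database dependency): the child’s stdout goes to /dev/null, so nothing backs up the pipe and nothing leaks into the script’s own output.

```ruby
# Write far more than a pipe buffer's worth of statements; the child
# consumes them and discards its own output, so the writer never blocks.
IO.popen("cat > /dev/null", "w") do |pipe|
  10_000.times { pipe.puts "INSERT INTO people (name) VALUES ('Fred');" }
end
puts "finished without filling the pipe"
```

When the block ends, IO.popen closes the pipe and waits for the child, so the child’s exit status is available in $? afterwards.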