Pipe binary data

Hi there,
I have a little program that mines the web. It consists of two Ruby
programs. The first is responsible for getting the data from the web; the
second is responsible for mining that data. I connect the two via
command-line redirection/pipe (the first executes stdout.puts data; the
second calls data = stdin.gets).

This works just fine when I access HTML pages. However, when I access
binary data, the pipe sometimes misbehaves. I encountered one JPG file
that causes a crash of the consuming data mining program on Linux, and
an infinite loop (stdin.gets just returns nil after a while without
having read the entire data) on Windows.
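
For comparison, here is a minimal sketch of moving raw bytes through a
pipe without line-oriented calls. gets stops at each newline byte and
puts may append one, and on Windows a text-mode stream may also translate
line endings or treat certain bytes as end of file. The helper names
write_raw and read_raw are hypothetical, not taken from the scripts
described above.

```ruby
# Hypothetical helpers for illustration; names are not from the original scripts.

# Producer side: write raw bytes with no newline or text-mode translation.
def write_raw(io, data)
  io.binmode       # switch the stream to binary mode (mainly matters on Windows)
  io.write(data)   # write emits exactly the given bytes; puts may add a newline
  io.flush
end

# Consumer side: read everything up to end of stream as raw bytes.
def read_raw(io)
  io.binmode
  io.read          # read with no argument reads until EOF; gets stops at a newline
end
```

In the two programs this would presumably be called as
write_raw($stdout, data) in the first and data = read_raw($stdin) in the
second.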

I suppose what I have to do is encode the data before I place it on the
pipe and decode it after I consume it from the pipe. Any suggestions?
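
If encoding turns out to be the way to go, one possible sketch is to
Base64-encode each payload onto a single line so that the existing
puts/gets calls keep working. The helper names encode_for_pipe and
decode_from_pipe are made up for illustration, and the code assumes one
payload per line.

```ruby
# Hypothetical helpers; assumes one Base64-encoded payload per line on the pipe.

# Encode arbitrary bytes as Base64 text. The "m0" directive produces
# a single line with no embedded newlines.
def encode_for_pipe(data)
  [data].pack("m0")
end

# Decode one line read from the pipe back into the original bytes.
# chomp removes the trailing newline that puts added.
def decode_from_pipe(line)
  line.chomp.unpack1("m0")
end
```

The first program would then call $stdout.puts encode_for_pipe(data) and
the second data = decode_from_pipe($stdin.gets). The cost is roughly a
third more bytes on the pipe.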



Hi Christian,

I’m no expert by any means, but this sounds odd to me – more like
something is broken with your scripts rather than the piping. The
reason I say so is because I can pipe /dev/random and /dev/dsp to a
file without any problem; you’d think that I’d hit the same problem at
some point given the (pseudo-)random nature of those pipes; but I never


Jordan, some more digging revealed this was related to a parsing bug in
REXML. It somehow didn't recognize a closing tag, so it wasn't related
to the pipe after all.