Reading String Data as a File

I use Net::HTTP to collect some data as a string. I now need to pass
that string data to a Ruby method that is expecting to receive the data
from a file (i.e., the method expects the data to be stored in a file
and to have a path to the file passed to it as a parameter). Is there
any way to resolve this dilemma short of writing the string data to a
file and then reading it in from the file?

Thanks for any input.

  ... doug

On Jun 28, 2010, at 17:43, Doug J. wrote:

I use Net::HTTP to collect some data as a string. I now need to pass
that string data to a Ruby method that is expecting to receive the data
from a file (i.e., the method expects the data to be stored in a file
and to have a path to the file passed to it as a parameter). Is there
any way to resolve this dilemma short of writing the string data to a
file and then reading it in from the file?

ri StringIO

Ryan D. wrote:

ri StringIO

That will work if the code in question will accept an open File/IO
object as an argument.

If it takes only a pathname argument, then you’re stuck with writing the
data to a file (ri Tempfile may help).

If you have control of the target code, then refactor it. e.g.

class Foo
  # original entry point
  def read_file(pathname)
    File.open(pathname, "rb") { |f| read_io(f) }
  end

  # entry point for already-open object, e.g. STDIN, a StringIO etc.
  def read_io(io)
    io.each_line { … }
  end
end
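
A short sketch of how the second entry point can be fed in-memory data. The class name LineCounter and its line-counting body are made up for illustration; only the read_file/read_io split comes from the example above.

```ruby
require "stringio"

# Hypothetical reader with a path-based and an IO-based entry point.
class LineCounter
  # Path-based entry point: opens the file and delegates to read_io.
  def read_file(pathname)
    File.open(pathname, "rb") { |f| read_io(f) }
  end

  # IO-based entry point: accepts any object that responds to each_line.
  def read_io(io)
    count = 0
    io.each_line { count += 1 }
    count
  end
end

body = "first\nsecond\nthird\n"   # stands in for data fetched over HTTP
line_count = LineCounter.new.read_io(StringIO.new(body))
puts line_count                   # prints 3
```

Callers that have a path keep using read_file; callers that already hold a string wrap it in a StringIO and never touch the disk.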

If it takes only a pathname argument, then you’re
stuck with writing the data to a file

Unfortunately that is precisely my case and that is precisely what I was
trying to avoid. (And, unfortunately, I don’t have any control over the
target code.)

Interestingly, a post that I found seemed to say that I could use the
StringIO approach in the case where a pathname argument was required.
The post said:

Any easy way to work with a string in a method that is expecting
a file is to create a new StringIO object and pass the result to
the method requiring a file type. For example:

some_method(StringIO.new("Your string here"))

He did say, “file”. It’s just that usually methods that follow that
form are expecting a path. Anyway, as one might expect, it didn’t work
for me. I get the following error:

./test1:5:in `read': can't convert StringIO into String (TypeError)

As Ryan says, I guess that I’m stuck with writing this out to a temp file.
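
For reference, a minimal sketch of the temp-file route, assuming the target only accepts a pathname. The method load_from_path is a hypothetical stand-in for the code that cannot be changed, and the "string-data" prefix is arbitrary.

```ruby
require "tempfile"

# Hypothetical stand-in for third-party code that only accepts a pathname.
def load_from_path(pathname)
  File.read(pathname)
end

data = "line one\nline two\n"   # e.g. a response body collected earlier

result = nil
tempfile = Tempfile.new("string-data")
begin
  tempfile.write(data)
  tempfile.close                # flushes the data; the file stays on disk
  result = load_from_path(tempfile.path)
ensure
  tempfile.close!               # closes if needed and deletes the file
end

puts result
```

Closing before handing over the path makes sure everything has been written out, and the ensure clause removes the file even if the target raises.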

Thanks to all who responded to my inquiry.

       ... doug

On Tue, Jun 29, 2010 at 11:50 AM, Doug J. wrote:

Unfortunately that is precisely my case and that is precisely what I was
trying to avoid. (And, unfortunately, I don’t have any control over the
target code.)

It’s Ruby. You can always patch or alias_method_chain the target code
if you’re willing to bear some slight brittleness.

It’s Ruby. You can always patch or alias_method_chain the target code
if you’re willing to bear some slight brittleness.

Good point. I’ve been considering whether I should re-think my position
that the underlying code is inaccessible. The truth is, the block of
data that I have in memory is actually a Rails layout. I was reluctant
to mention the Rails aspects in this forum. So, I don’t know if I could
ever figure out what would need to be done; but, your idea is definitely
a good one. Thanks for the input.

  ... doug

On Jun 29, 2010, at 14:20, Tony A. wrote:

On Tue, Jun 29, 2010 at 11:50 AM, Doug J. wrote:

Unfortunately that is precisely my case and that is precisely what I was
trying to avoid. (And, unfortunately, I don’t have any control over the
target code.)

It’s Ruby. You can always patch or alias_method_chain the target code if
you’re willing to bear some slight brittleness.

That is EXACTLY what I was coming back to say… Tony beat me to it.

Robert K. wrote:

It’s Ruby. You can always patch or alias_method_chain the target code if
you’re willing to bear some slight brittleness.

Is this always possible? Wouldn’t you need some knowledge of the
inner workings of the target code? In this case for example, does it
open the file with File.open or maybe with File.foreach?

You simply find that part of the code, and replace the offending
method(s) with something else. In the limit, you replace everything with
your own code :-)
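
A rough sketch of what such a patch could look like. LegacyLoader is a hypothetical stand-in for the third-party class; a real target needs its source read first to find which method actually opens the file.

```ruby
require "stringio"

# Hypothetical third-party class that only knows how to read from a path.
class LegacyLoader
  def load(pathname)
    parse(File.read(pathname))
  end

  def parse(text)
    text.lines.map(&:chomp)
  end
end

# Reopen the class: keep the original method under a new name and put a
# wrapper in front that also accepts anything responding to #read.
class LegacyLoader
  alias_method :load_from_path, :load

  def load(source)
    if source.respond_to?(:read)
      parse(source.read)        # StringIO, File, or any IO-like object
    else
      load_from_path(source)    # plain path string: original behaviour
    end
  end
end

rows = LegacyLoader.new.load(StringIO.new("a\nb\n"))
p rows                          # => ["a", "b"]
```

This only works because the stand-in keeps reading and parsing separate. If the target mixes the two, more of it has to be replaced, which is where the brittleness comes from.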

It would be convenient to be able to mock out File and Dir with a
virtual, in-RAM filesystem. I’m not aware of a library which does that,
but in principle I think it could be done.

This is an interesting point of interface design: usually it is more
convenient to just pass a file name somewhere and that method opens
the file (or URL) and reads the data. But from a modularity point of
view it is generally better to pass an open IO like instance.

Definitely. The original csv.rb in ruby 1.8 got this very badly wrong.

The new (faster_csv) interface is capable of this, but it suffers from
missing documentation. IIRR you have to do something like

FasterCSV.new($stdin).each do |row|
  p row
end

Since the documented “primary” interface is
FasterCSV.foreach("path/to/file.csv"), you have to dig through the code
to work out how to handle an open stream.

2010/6/29 Tony A.:

On Tue, Jun 29, 2010 at 11:50 AM, Doug J. wrote:

Unfortunately that is precisely my case and that is precisely what I was
trying to avoid. (And, unfortunately, I don’t have any control over the
target code.)

It’s Ruby. You can always patch or alias_method_chain the target code if
you’re willing to bear some slight brittleness.

Is this always possible? Wouldn’t you need some knowledge of the
inner workings of the target code? In this case for example, does it
open the file with File.open or maybe with File.foreach?

This is an interesting point of interface design: usually it is more
convenient to just pass a file name somewhere and that method opens
the file (or URL) and reads the data. But from a modularity point of
view it is generally better to pass an open IO like instance.

You can nicely layer this e.g.

class X
  # convenience method that will open the file for you
  def read_file(path)
    File.open path do |io|
      read io
    end
  end

  # yet another convenience method
  def read_url(url)

  end

  # read the data
  def read(io)
    io.each_line do |line|
      # whatever
    end
  end
end

The only drawback here is the additional method needed, but convenience
comes at a price. :-)

Kind regards

robert

Robert K. wrote:

At least one can use Tempfile for this, e.g.

Tempfile "prefix", "/tmp" do |io|
  io.write everything

  io.seek 0
  whatever_load_routine io
end

or rather:

Tempfile.open "prefix", "/tmp" do |io|
  io.write everything
  io.flush
  whatever_load_routine io.path
end

2010/6/30 Brian C.:

your own code :-)
That’s what I always wanted to do - seems I have to resurrect my
WorldDomination gem. :-)

It would be convenient to be able to mock out File and Dir with a
virtual, in-RAM filesystem. I’m not aware of a library which does that,
but in principle I think it could be done.

Well, /tmp is in memory on many systems and writing a small file is
also a mostly in memory operation. Of course, this is not as cheap as
doing it completely in userland but probably sufficient for many
applications (although it’s not really nice). At least one can use
Tempfile for this, e.g.

Tempfile "prefix", "/tmp" do |io|
  io.write everything

  io.seek 0
  whatever_load_routine io
end

FasterCSV.new($stdin).each do |row|
 p row
end

Since the documented “primary” interface is
FasterCSV.foreach("path/to/file.csv"), you have to dig through the code
to work out how to handle an open stream.

Or have the idea to look at “ri CSV.new”…

Thanks for the hint. This is good to know.

Cheers

robert

On Jun 30, 2010, at 7:53 AM, Brian C. wrote:

Robert K. wrote:

This is an interesting point of interface design: usually it is more
convenient to just pass a file name somewhere and that method opens
the file (or URL) and reads the data. But from a modularity point of
view it is generally better to pass an open IO like instance.

Definitely. The original csv.rb in ruby 1.8 got this very badly wrong.

The new (faster_csv) interface is capable of this, but it suffers from
missing documentation.

I agree that FasterCSV’s documentation isn’t perfect. I’m pretty sure
all of its functions are documented, but you would need to read the API
like a novel to find them. I’ve been trying more tutorial style
documentation lately, but there again it’s hard to reference what you
specifically want to know.

I’m open to suggestions and I do take patches.

IIRR you have to do something like

FasterCSV.new($stdin).each do |row|
  p row
end

That works, yes.

Since the documented “primary” interface is
FasterCSV.foreach("path/to/file.csv"), you have to dig through the code
to work out how to handle an open stream.

That’s mostly due to a pet peeve of mine. I often see code that slurps
when foreach() would have worked fine. That’s why I try to push that as
a first choice.

Do you think it would help if I added “Wrapping an IO” under the Shortcut
Interface on this page?

http://fastercsv.rubyforge.org/classes/FasterCSV.html

James Edward G. II

2010/6/30 Brian C.:

or rather:

Tempfile.open "prefix", "/tmp" do |io|
  io.write everything
  io.flush

I’d rather io.close instead of io.flush to release resources as soon
as possible.

whatever_load_routine io.path
end

Ooops! Yes, of course. I copied the wrong example. Sorry for my
confusion.

Cheers

robert

James Edward G. II wrote:

I’m open to suggestions and I do take patches.

Specifically, I’d like to see how to parse CSV from stdin. You provide
an example in the opposite direction:

FCSV($stderr) { |csv_err| csv_err << %w{my data here} }  # to $stderr

A bit more experimentation suggests that

FCSV($stdin).each { |a,b,c| p a,b,c }

works, so if that’s a reasonable way to drive the library, I’d like to
see that mentioned under shortcuts. (I thought I’d tried that before and
it failed, but I must have done something different)

Robert K. wrote:

2010/6/30 Brian C.:

or rather:

Tempfile.open "prefix", "/tmp" do |io|
  io.write everything
  io.flush

I’d rather io.close instead of io.flush to release resources as soon
as possible.

But tempfile will want to close itself using the block form anyway.

In most versions of ruby, Tempfile with a block returns nil. A change
was committed so that it returns the (closed) object, but that hasn’t
made it into either of the versions I have lying around here.

tf = Tempfile.open("aaa", "/tmp") { puts "hello"; 123 }
hello
=> nil

On 30.06.2010 17:31, Brian C. wrote:

But tempfile will want to close itself using the block form anyway.

Yes, but later. This can make a difference if you are low on file
descriptors. And you do not risk weird effects by the same process
opening the file twice.

In most versions of ruby, Tempfile with a block returns nil. A change
was committed so that it returns the (closed) object, but that hasn’t
made it into either of the versions I have lying around here.

tf = Tempfile.open("aaa", "/tmp") { puts "hello"; 123 }
hello
=> nil

The non block form obviously returns the Tempfile instance and if you
want it to be returned from the block what stops you from explicitly
returning it?

IMHO the method with block should return whatever the implementor of the
block chooses. That is far more reusable than always returning the
Tempfile. Most of the time the Tempfile instance is of no use anyway
since it is closed then.
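
A small sketch of the behaviour argued for here, assuming a Ruby version that includes the committed change so that the block form returns the block's value. The "demo" prefix and the variable names are illustrative only.

```ruby
require "tempfile"

# The block's last expression becomes the return value of Tempfile.open.
size = Tempfile.open("demo") do |io|
  io.write("hello")
  io.close              # close early; the path is still usable in the block
  File.size(io.path)    # value handed back to the caller
end

puts size               # prints 5
```

Returning a computed value this way means the caller never has to hold on to a Tempfile that has already been closed.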

Kind regards

robert

On 30.06.2010 17:05, James Edward G. II wrote:

That’s mostly due to a pet peeve of mine. I often see code that
slurps when foreach() would have worked fine. That’s why I try to
push that as a first choice.

I wholeheartedly agree.

Do you think it would help if I added “Wrapping an IO” under the
Shortcut Interface on this page?

http://fastercsv.rubyforge.org/classes/FasterCSV.html

+1

robert

Robert K. wrote:

The non block form obviously returns the Tempfile instance and if you
want it to be returned from the block what stops you from explicitly
returning it?

Only that it’s a bit verbose:

tf = nil
Tempfile.open(…) do |io|
  tf = io
  ...
end
puts tf.path

http://redmine.ruby-lang.org/issues/show/504

IMHO the method with block should return whatever the implementor of the
block chooses. That is far more reusable than always returning the
Tempfile.

Maybe that’s what the accepted patch does - I haven’t tested it. It
would be consistent with File.open { … } if it worked that way.

Anyway, I think we’re talking about minutiae. You say that one should
close the file at the earliest opportunity to “save resources”, but the
only resource we’re talking about is one slot in the kernel file
descriptor table, and most apps aren’t going to be constrained by that.

On 30.06.2010 20:26, Brian C. wrote:

 ...

end
puts tf.path

No, I was talking about the other version which returns whatever the
block returns. You would do

tf = Tempfile.open(…) do |io|

io
end
puts tf.path

http://redmine.ruby-lang.org/issues/show/504

Apparently people differ in their preferences.

IMHO the method with block should return whatever the implementor of the
block chooses. That is far more reusable than always returning the
Tempfile.

Maybe that’s what the accepted patch does - I haven’t tested it. It
would be consistent with File.open { … } if it worked that way.

Exactly that is what the patch does:

http://redmine.ruby-lang.org/repositories/diff/ruby-19?rev=19454

Anyway, I think we’re talking about minutiae. You say that one should
close the file at the earliest opportunity to “save resources”, but the
only resource we’re talking about is one slot in the kernel file
descriptor table, and most apps aren’t going to be constrained by that.

That’s true. But I have also seen issues caused by files being opened
more than once by the same process. Plus, if you close the file, you’ll
notice much sooner when you try to write to the tempfile after you
thought you were done. If the code is more complicated, these bugs can be
hard to track down. How much simpler is it if you see this:

irb(main):006:0> Tempfile.open "x" do |io|
irb(main):007:1* p io
irb(main):008:1> io.puts "hello"
irb(main):009:1> io.close
irb(main):010:1> io.puts "world"
irb(main):011:1> end
#<File:C:/Users/Robert/x20100630-4456-1ixsj0i-0>
IOError: closed stream
        from (irb):10:in `block in irb_binding'
        from /usr/local/lib/ruby19/1.9.1/tempfile.rb:199:in `open'
        from (irb):6
        from /usr/local/bin/irb19:12:in `<main>'

It’s probably not that big a deal but I believe such discussions bring
benefit to the community by presenting alternative solutions to a
problem along with arguments. I always like this food for thought.
Thanks for sharing your thoughts!

Kind regards

robert

On Jun 30, 2010, at 10:41 AM, Brian C. wrote:

James Edward G. II wrote:

I’m open to suggestions and I do take patches.

Specifically, I’d like to see how to parse CSV from stdin. You provide
an example in the opposite direction:

FCSV($stderr) { |csv_err| csv_err << %w{my data here} }  # to $stderr

On Jun 30, 2010, at 11:35 AM, Robert K. wrote:

On 30.06.2010 17:05, James Edward G. II wrote:

Do you think it would help if I added “Wrapping an IO” under the
Shortcut Interface on this page?

http://fastercsv.rubyforge.org/classes/FasterCSV.html

+1

Better?

http://fastercsv.rubyforge.org/classes/FasterCSV.html

James Edward G. II