Idiomatic file snarf

I want to open a file, suck the contents into a variable and close it
again. I suppose I could use IO#each and glue the lines together.

Currently I have

   f = File.new(fname)
   text = f.read
   f.close

Surely there’s an idiomatic Ruby one-liner? I must be looking in the
wrong place @ruby-doc.org -Tim

On Sep 27, 2006, at 5:02 PM, Tim B. wrote:

I want to open a file, suck the contents into a variable and close
it again. I suppose I could use IO#each and glue the lines together.

Currently I have

  f = File.new(fname)
  text = f.read
  f.close

Surely there’s an idiomatic Ruby one-liner?

Sure:

text = File.read(fname)

James Edward G. II

On Thu, 28 Sep 2006 07:02:13 +0900 Tim B. [email protected]
wrote:

I must be looking in the wrong place @ruby-doc.org -Tim

some_variable = File.new(fname).readlines
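
Worth noting that readlines gives back an Array of lines rather than a single String, and File.new without a matching close leaves the handle open until the garbage collector gets to it. A small sketch of the class-method forms, which open and close the file themselves (the temp file is only there so the snippet runs on its own):

```ruby
require "tempfile"

Tempfile.create("lines") do |tmp|
  tmp.write("one\ntwo\n")
  tmp.flush

  lines = File.readlines(tmp.path)  # Array of lines; file is closed on return
  text  = File.read(tmp.path)       # one String; file is closed on return

  p lines                # ["one\n", "two\n"]
  p lines.join == text   # true
end
```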

– Thomas A.

On 2006.09.28 07:02, Tim B. wrote:

I want to open a file, suck the contents into a variable and close it
again. I suppose I could use IO#each and glue the lines together.

The ‘correct’ way is:

text = File.read file_name

Currently I have

  f = File.new(fname)
  text = f.read
  f.close

A better way for the above is this:

text = File.open(file_name) {|f| f.read}

Eero S. wrote:

On 2006.09.28 07:02, Tim B. wrote:

Currently I have

  f = File.new(fname)
  text = f.read
  f.close

A better way for the above is this:

text = File.open(file_name) {|f| f.read}

To expand on that: the form that takes a block automatically calls
f.close when the block is done. What’s more, it does it in an ensure
clause, you scr1pt k1ddie.
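
For anyone curious what that buys you, here is a rough sketch of the hand-written equivalent. The helper name read_with_ensure is made up for illustration; it is not part of Ruby, and the real File.open does this internally rather than through any such method.

```ruby
require "tempfile"

# Hypothetical helper: approximately what
#   File.open(file_name) { |f| f.read }
# does on your behalf.
def read_with_ensure(file_name)
  f = File.open(file_name)
  begin
    f.read
  ensure
    f.close  # runs even if f.read raises
  end
end

# Throwaway file so the snippet runs on its own.
Tempfile.create("snarf") do |tmp|
  tmp.write("hello\nworld\n")
  tmp.flush

  by_hand  = read_with_ensure(tmp.path)
  by_block = File.open(tmp.path) { |f| f.read }
  p by_hand == by_block   # true
end
```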

Also, Kernel#open exists. So: text = open(file_name) {|f| f.read}

What’s more, if you’re dealing with files on a line-by-line basis,
File.include? Enumerable, so you can do fun things like:

matches = open(file_name) {|f| f.grep(/juicy stuff/)}

to return an Array of matching lines without having to bring the whole
file into memory. (Though, if you don’t care about memory,
File.readlines(file_name).grep(/juicy stuff/) is prettier.)
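
A quick side-by-side of the two, as a sketch. The method names grep_streaming and grep_slurping are invented for this example; both return the same Array, and the difference is only in how much of the file is held in memory while matching.

```ruby
require "tempfile"

# Hypothetical helper: reads one line at a time, keeps only the matches.
def grep_streaming(file_name, regexp)
  File.open(file_name) { |f| f.grep(regexp) }
end

# Hypothetical helper: loads every line into an Array first, then filters.
def grep_slurping(file_name, regexp)
  File.readlines(file_name).grep(regexp)
end

# Throwaway file so the snippet runs on its own.
Tempfile.create("grep") do |tmp|
  tmp.write("dull line\njuicy stuff here\nmore dull\n")
  tmp.flush

  p grep_streaming(tmp.path, /juicy stuff/)   # ["juicy stuff here\n"]
  p grep_streaming(tmp.path, /juicy stuff/) ==
    grep_slurping(tmp.path, /juicy stuff/)    # true
end
```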

Devin

On Thu, 28 Sep 2006, James Edward G. II wrote:

Surely there’s an idiomatic Ruby one-liner?

Sure:

text = File.read(fname)

Actually, the idiom I most use is

File.read( fname).scan( %r{ juicy stuff}x) do |match|
  # do something with juicy stuff
end

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : [email protected]
New Zealand

“We have more to fear from
The Bungling of the Incompetent
Than from the Machinations of the Wicked.” (source unknown)

On Thu, 28 Sep 2006, John C. wrote:

Actually, the idiom I most use is

File.read( fname).scan( %r{ juicy stuff}x) do |match|
  # do something with juicy stuff
end

Just remembered, I have an old RCR lying around on this.
Don’t forget to vote for…
RCR 332: mmap'd version of IO.scan( file_name, regexp)

Currently there exist two very useful functions in ruby.

IO.read( file_name) reads in the entire file into a string.

string.scan( regexp){|match| } scans the entire string for regexp
yielding matches.

The limit on doing…

IO.read(file_name).scan( regexp)

is the size of your machine’s unused physical memory.

Unix has the very handy facility called mmap that allows one to memory
map an entire file and the contents of that file appears mapped into
your virtual address space.

The operating system handles all the fuss and bother of reading (and
forgetting) pages of that file into memory.

Thus it would be very easy to create an mmap’d version, semantically the
same as the following function…

def IO.scan( file_name, regexp, &block)
  IO.read(file_name).scan( regexp, &block)
end

But being mmap’d, it could handle files (almost) up to 4GB in size.
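
Until something like that exists, a line-at-a-time scan keeps memory bounded for the common case. To be clear, this is not mmap and not the proposed IO.scan: scan_lines is a made-up name, and unlike the whole-file version it can never find a match that spans a line boundary.

```ruby
require "tempfile"

# Hypothetical helper (not the RCR's IO.scan, not mmap-based).
# Scans one line at a time, so only a single line is in memory at once.
# With a block, yields each match; without one, returns the matches.
# Limitation: a match cannot span a line boundary.
def scan_lines(file_name, regexp)
  matches = []
  File.foreach(file_name) do |line|
    line.scan(regexp) { |m| block_given? ? yield(m) : matches << m }
  end
  matches
end

# Throwaway file so the snippet runs on its own.
Tempfile.create("scan") do |tmp|
  tmp.write("a1 b22\nc333\n")
  tmp.flush

  p scan_lines(tmp.path, /\d+/)   # ["1", "22", "333"]
  p scan_lines(tmp.path, /\d+/) == File.read(tmp.path).scan(/\d+/)   # true
end
```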

Problem

IO.read(file_name).scan(regexp) is limited to the available physical
memory on your system.

Proposal
Reimplement…

def IO.scan( file_name, regexp, &block)
  IO.read(file_name).scan( regexp, &block)
end

to use unix mmap.

Analysis

No language-level change, merely an extension to the existing io.c

Implementation

Here is some example code.

http://www.cs.purdue.edu/homes/fahmy/cs503/mmap.txt

Where they do the second mmap and the memcpy, we would do the regexp
scan.

So that would have to be mashed together with io_read in io.c and
rb_str_scan in string.c

Hmm. Just thinking. Before STL existed I did my own template library in
C++. One of the most useful features was I could mmap a string to a file
and thereafter the entire file behaved as an ordinary string.

The alternate to this RCR would be something that hacked the internal
representation of a ruby string so that the data pointed to was mmap’d.

Now I can think of many uses for that.

However, that would be a far harsher change on the string class and GC
system. Thinking on that a bit more.

One of the Grand Unifying Principles of Unix is…

“Everything (graphics card, directories, sockets, network cards, …)
is a file, and a File is just a stream of Bytes.”

Repeat that until it’s firmly stuck in your head.

Now take one small step further.

A stream of bytes is just a (possibly mmap’d) String.

Doesn’t that make life really really simple?

Existing implementations!

Similar idea discussed here…
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/7673

Implementation for Unix here… http://moulon.inra.fr/ruby/mmap.html

Implementation for Win32 here…
http://rubyforge.org/projects/win32utils/
