Write to specified line number of file

Hi,

I want to read a file and write it to another file with the lines randomized.
I saw the method File.lineno= but it doesn’t seem to work. I just get
the same file written as the original, without the lines having been
randomized.

I suppose it should be simple, but I am new to Ruby. Any help would be
great.

Thanks,

Aditya

  1. How big is the file?

  2. Do you want sampling without replacement (shuffle the original file
    keeping the lines intact) or sampling with replacement (n lines randomly
    chosen from the file)?

I’m going to assume that 1 is “too big for a considerate programmer to
read all into memory on a shared machine” and 2 is “without replacement
(shuffling)”. I’m also going to assume that you’re on some form of UNIX
machine that has the “sort” verb. For Windows, that could mean Cygwin.

So what you want to do is make a copy of the file with random numbers
tacked on to the front of each line. Then sort the tagged copy
numerically using the external “sort” verb, and remove the tags from the
sorted copy. You’ll be doing everything in Ruby except the sort.
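
A minimal sketch of the tag / external sort / untag approach described above. The helper name, the tab separator between tag and line, and the use of a temp file are all assumptions for illustration, and it assumes a “sort” that accepts -n is on the PATH:

```ruby
require "tempfile"

# Hypothetical helper: shuffle the lines of src into dst, leaving the
# actual sorting to the external sort command.
def shuffle_with_external_sort(src, dst)
  Tempfile.create("tagged") do |tagged|
    # Pass 1: copy the file, prefixing each line with a random integer and a tab.
    File.foreach(src) do |line|
      tagged.write("#{rand(1 << 62)}\t#{line}")
    end
    tagged.flush

    # Pass 2: sort the tagged copy numerically, then strip the tag from each line.
    File.open(dst, "w") do |out|
      IO.popen(["sort", "-n", tagged.path]) do |sorted|
        sorted.each_line { |l| out.write(l.split("\t", 2)[1]) }
      end
    end
  end
end
```

Only one line at a time is held in Ruby's memory; the external sort decides for itself how to handle a file that does not fit in RAM.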

If the answer to 1 is “small enough to fit into memory”, just read the
file into memory, tag the lines with random numbers, and use a Ruby
“sort” to do the sorting, then untag the lines and write out the file.
You’ll be doing everything in Ruby.
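
For the in-memory case, the tag / sort / untag steps might look like this. It is only a sketch; the helper name and the file names in the usage comment are made up:

```ruby
# Hypothetical helper: shuffle lines in memory by tagging, sorting, untagging.
def shuffle_by_tagging(lines)
  tagged = lines.map { |line| [rand, line] }  # tag each line with a random key
  tagged.sort_by! { |key, _line| key }         # sort on the key only
  tagged.map { |_key, line| line }             # drop the keys again
end

# Usage, with made-up file names:
# File.write("shuffled.txt", shuffle_by_tagging(File.readlines("input.txt")).join)
```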

By the way, I do this sort of thing rather often. The files in question
are data files that drive performance test scripts. They’re small (under
65536 lines), so I just read them into Excel, tack on a random column,
sort on the random column, delete the random column, and write the file
back out.

If you want sampling with replacement, the easiest way to do it is
using R. I don’t know how to do it in Ruby or Excel, since I have R. :)
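
For what it's worth, sampling with replacement can be sketched in Ruby as well. This assumes the lines fit in memory and the array is not empty; the helper name is hypothetical:

```ruby
# Hypothetical helper: draw n lines at random, allowing the same line
# to be picked more than once.
def sample_with_replacement(lines, n)
  Array.new(n) { lines[rand(lines.size)] }
end

# Usage, with a made-up file name:
# puts sample_with_replacement(File.readlines("input.txt"), 10)
```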

I think the “too big/shuffled” case would make an interesting Ruby quiz,
if you rule out the external sort verb as “cheating”.

M. Edward (Ed) Borasky

Thanks for the reply. That’ll work. Of course, using Excel is the
straightforward way but I need to do it in Ruby.

I still think there must be a simple way to move to a specified line in
a file. I could write a function which does that by checking for “\n” at
the end of each line, and will do that unless someone can suggest an
easier method soon.

Aditya R. wrote:

Posted via http://www.ruby-forum.com/.
From one newbie to another.

Assuming that the file can be read into memory and you want to shuffle
the lines around.

read in the file and place the lines in an array

and use a shuffle method such as:

def shuffle(ar)
  # Fisher-Yates shuffle: walk line_b down from the last index and swap
  # that slot with a randomly chosen slot at or before it
  (ar.size - 1).downto(1) do |line_b|
    line_a = rand(line_b + 1)   # 0..line_b, so a line may stay in place
    ar[line_a], ar[line_b] = ar[line_b], ar[line_a]
  end
  # by the time we're here, the lines are as scrambled as they're going to get
  ar
end

then you write back the lines into a new file or simply overwrite the
old file.

I think you misunderstand what IO#lineno and IO#lineno= do. The first
one returns the number of lines read from the IO stream so far
(probably only works if doing line oriented io, i.e. with gets or
each), and IO#lineno= sets the base number to start from. Setting
lineno does NOT change the position in the stream; it just changes what
the current counter is:

echo "toto" | ruby -e 'p $stdin.lineno; $stdin.lineno = 100; p $stdin.lineno; p $stdin.gets; p $stdin.lineno'
0
100
"toto\n"
101
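
To actually position a stream at a given line, the earlier lines have to be read and thrown away. A rough sketch, using StringIO as a stand-in for a file; skip_to_line is a hypothetical helper, not part of Ruby, and it assumes the stream can be rewound (so not a pipe):

```ruby
require "stringio"

# Hypothetical helper: rewind, then read and discard lines so that the next
# gets returns line number `target` (1-based). Returns the io, or nil if
# the stream has fewer lines than that.
def skip_to_line(io, target)
  io.rewind
  (target - 1).times { return nil unless io.gets }
  io
end

io = StringIO.new("one\ntwo\nthree\n")
io.lineno = 2        # only changes the counter...
p io.gets            # => "one\n"  (...the read position did not move)

skip_to_line(io, 3)
p io.gets            # => "three\n"
```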

The easiest is to swallow the file with IO#readlines, sort them at
random with Array#sort_by and write them out. In a one liner:

ruby -e '$stdout.print($stdin.readlines.sort_by { rand }.join)' < file_to_scramble
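
The same idea as a small script rather than a one-liner. This is a sketch with a hypothetical helper name; it uses the built-in Array#shuffle found in current Ruby versions in place of sort_by { rand }:

```ruby
# Hypothetical helper: write a shuffled copy of src to dst.
def scramble_file(src, dst)
  lines = File.readlines(src)           # swallow the whole file
  File.write(dst, lines.shuffle.join)   # write the lines back in random order
end

# Usage, with made-up file names:
# scramble_file("file_to_scramble", "scrambled")
```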

If the file is really too big to swallow, it can become more
complicated. But there is no need to get there in most cases.

Hope this helps,
Guillaume.

Aditya R. wrote:
[…]

I still think there must be a simple way to move to a specified line in
a file. I could write a function which does that by checking for “\n” at
the end of each line, and will do that unless someone can suggest an
easier method soon.

Files don’t consist of lines, files are ordered sequences of octets.
Lines are just an abstraction imposed on files by either the IO
libraries and the language you’re using, or your editor.

Unless the lines in your file are all of the same length it’s going to
be non-trivial to move them around in place in the file. You’ll either
need to move the existing contents “up” (if the new line were shorter)
or move them “back” (if it’s longer).

Having said that, you could randomize the file with something along
these lines:

  • read the file line by line and store the offset of the beginning
    character of each line (using IO#tell and subtracting the length of the
    line)

  • open a new temporary output file

  • randomly pick a line number and use IO#seek to jump to the appropriate
    offset

  • read the line and print it to the temp file; remove that line’s info
    from your list of offsets

  • when there are no more offsets left, close the temp file and use
    File.rename to move it over the original
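
The steps above could be sketched roughly as follows. Two liberties are taken: the offset is recorded with IO#tell before each line is read (instead of subtracting the line length afterwards), and the list of offsets is shuffled once rather than picked from and pruned one at a time. The helper name and the ".shuffled" temp file name are assumptions:

```ruby
# Hypothetical helper: shuffle the lines of the file at `path`, keeping only
# the line offsets in memory, never the lines themselves.
def shuffle_by_offsets(path)
  offsets = []
  tmp_path = "#{path}.shuffled"   # assumed temp name, in the same directory

  File.open(path, "rb") do |src|
    # Pass 1: record the byte offset at which every line starts.
    until src.eof?
      offsets << src.tell
      src.gets
    end

    # Pass 2: visit the offsets in random order and copy each line out.
    File.open(tmp_path, "wb") do |out|
      offsets.shuffle.each do |pos|
        src.seek(pos)
        line = src.gets
        line += "\n" unless line.end_with?("\n")  # last line may lack a newline
        out.write(line)
      end
    end
  end

  File.rename(tmp_path, path)   # replace the original with the shuffled copy
end
```

Memory use grows with the number of lines rather than the size of the file, at the cost of one seek per line.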