IO#lineno= doesn't work the way I expected


#1

I’m working on something that operates on each line of a file
individually. So far, the relevant part looks something like this:

in_fh = File.open(infile, 'r')
out_fh = File.open(outfile, 'w')

in_fh.each_line do |l|
  out_fh << l
end

Obviously not the actual working code (since it doesn’t do anything other
than copy the file), but it’s the simplified version I’m using to test a
feature, anyway.

This works exactly as expected. I get a duplicate of the input file,
with the name assigned to the outfile variable.

So . . . I want to be able to skip a number of lines at the beginning of
the input file. It seemed like IO#lineno= would be the obvious
solution:

in_fh = File.open(infile, 'r')
out_fh = File.open(outfile, 'w')

in_fh.lineno = 1

in_fh.each_line do |l|
  out_fh << l
end

Unfortunately, that doesn’t give me what I expected at all. In fact,
what I end up with is an empty file. Am I misunderstanding what
IO#lineno= does? Is there something about the way IO#each_line works
that is interacting badly with an incremented line number? What do I
need to do differently to make this work?


#2

Chad P. wrote:

Obviously not the actual working code (since it doesn’t do anything other
out_fh = File.open(outfile, ‘w’)
that is interacting badly with an incremented line number? What do I
need to do differently to make this work?

The way I read the ri doc for lineno=, it appears that all it does is
determine the value that lineno returns the next time you call it. That
is, it doesn’t move the read position in the file.

I think what you’re looking for is IO#seek, but notice that seek
doesn’t operate on lines, only on byte offsets.
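
A minimal sketch of that reading of the docs, as it behaves on a current Ruby: lineno= changes only the counter, and the read position stays where it was. The three-line sample file and the variable names are assumptions made up for the demonstration.

```ruby
require "tempfile"

# Hypothetical three-line sample file, created only for this demonstration.
sample = Tempfile.new("lineno_demo")
sample.write("one\ntwo\nthree\n")
sample.close

position_after_set = nil
lines_read = []
counter_values = []

File.open(sample.path, "r") do |fh|
  fh.lineno = 1000               # changes the line counter only
  position_after_set = fh.pos    # the byte offset has not moved
  fh.each_line do |line|
    lines_read << line           # reading still starts at the first line
    counter_values << fh.lineno  # the counter continues from 1000
  end
end

sample.unlink
```

Every line of the file is still read; only the numbers reported by lineno are shifted.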


#3

On Sun, Nov 16, 2008 at 02:27:45AM +0900, Tim H. wrote:

The way I read the ri doc for lineno=, it appears that all it does is
determine the value that lineno returns the next time you call it. That
is, it doesn’t move the read position in the file.

I think you’re confusing lineno= with lineno (which are two separate
methods). IO#lineno does this:

ios.lineno
=> 0

IO#lineno= does this:

ios.lineno = 3
=> 3

The class is specced here, with the screen scrolled to where IO#lineno
and IO#lineno= are listed:

http://ruby-doc.org/core/classes/IO.html#M002289

I think what you’re looking for is IO#seek, but notice that seek
doesn’t operate on lines, only on byte offsets.

That would make IO#seek not what I want, unfortunately. Line lengths are
not known in advance.


#4

On Sat, Nov 15, 2008 at 2:54 PM, Michael G. removed_email_address@domain.invalid
wrote:

ios.lineno
http://ruby-doc.org/core/classes/IO.html#M002289
Manually sets the current line number to the given value. +$.+ is

I would think if it had the behavior you described, the second time
f.gets is called, we would see: “This is line one thousand and one\n”
not “This is line two\n”

Or maybe even “This is line one thousand\n”. I’m not sure…

Also, here are the specs for IO#lineno and IO#lineno= from RubySpec:
http://github.com/rubyspec/rubyspec/tree/master/1.8/core/io/lineno_spec.rb

HTH,
Michael G.


#5

On Sat, Nov 15, 2008 at 2:08 PM, Chad P. removed_email_address@domain.invalid
wrote:

=> 0

The description of the method is somewhat ambiguous if you ask me.

My view of the docs is in line with what Tim was describing.

------------------------------------------------------------- IO#lineno=
ios.lineno = integer => integer

 Manually sets the current line number to the given value. +$.+ is
 updated only on the next read.

    f = File.new("testfile")
    f.gets                     #=> "This is line one\n"
    $.                         #=> 1
    f.lineno = 1000
    f.lineno                   #=> 1000
    $. # lineno of last read   #=> 1
    f.gets                     #=> "This is line two\n"
    $. # lineno of last read   #=> 1001

I would think if it had the behavior you described, the second time
f.gets is called, we would see: “This is line one thousand and one\n”
not “This is line two\n”

Michael G.


#6

Chad P. wrote:

  1. Is there a way to do what I actually need – specifically, to
    iterate over an entire file except the first line, only reading one
    line at a time into RAM – without writing a C extension for Ruby?

Why not iterate over the entire file and just ignore the first line?
IO#readline or IO#gets will get you a line at a time.


#7

On Sun, Nov 16, 2008 at 04:54:01AM +0900, Michael G. wrote:

I would think if it had the behavior you described, the second time
f.gets is called, we would see: “This is line one thousand and one\n”
not “This is line two\n”

Or maybe even “This is line one thousand\n”. I’m not sure…

Okay, so . . .

  1. What the hell is the point of IO#lineno= if it does the same thing
    as IO#lineno except that it lets you specify a number that doesn’t do
    anything?

  2. Is there a way to do what I actually need – specifically, to
    iterate over an entire file except the first line, only reading one
    line at a time into RAM – without writing a C extension for Ruby?


#8

hi chad!

Chad P. [2008-11-16 01:32]:

with the second line. Are you saying I should just have an
orphan gets line in the code, then have a block in which I use
gets each line until I run out of file? I guess that would
probably work, but seems kinda . . . ugly.

not sure what tim was suggesting, but this will work:

in_fh = File.open(infile, 'r')
out_fh = File.open(outfile, 'w')

# ignore first line
in_fh.gets

in_fh.each_line do |l|
  out_fh << l
end

ok, it’s not pretty, but not that ugly either, is it? ;)

cheers
jens


#9

Jens W. wrote:

line? IO#readline or IO#gets will get you a line at a time.

jens

That’s almost precisely what I was suggesting. For real code I’d want to
handle the case of the file not having any lines, if that can occur. It
doesn’t look “ugly” to me at all.
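
One way to handle the short-file case, as a sketch only: the method name copy_skipping_lines and its skip parameter are hypothetical, not something from this thread. It relies on gets returning nil at end of file, so an input with fewer lines than skip just produces an empty output instead of raising.

```ruby
# Copy infile to outfile, discarding the first `skip` lines.
# `copy_skipping_lines` and `skip` are hypothetical names for illustration.
def copy_skipping_lines(infile, outfile, skip = 1)
  File.open(infile, "r") do |in_fh|
    File.open(outfile, "w") do |out_fh|
      # gets returns nil at end of file, so a short or empty input
      # stops the skipping early instead of raising an error.
      skip.times { break if in_fh.gets.nil? }

      # each_line carries on from wherever the read position now is.
      in_fh.each_line { |line| out_fh << line }
    end
  end
end
```

The block form of File.open also closes both files when the copy finishes, which the snippets above leave to the end of the program.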


#10

On Sun, Nov 16, 2008 at 05:42:18AM +0900, Tim H. wrote:

Chad P. wrote:

  1. Is there a way to do what I actually need – specifically, to
    iterate over an entire file except the first line, only reading one
    line at a time into RAM – without writing a C extension for Ruby?

Why not iterate over the entire file and just ignore the first line?
IO#readline or IO#gets will get you a line at a time.

How do you propose I “ignore” the first line? That’s what I was trying
to do – by starting the iteration over lines in the file with the second
line. Are you saying I should just have an orphan gets line in the code,
then have a block in which I use gets each line until I run out of file?
I guess that would probably work, but seems kinda . . . ugly. I just
need to figure out an efficient way to calculate the number of lines in
the file (minus one) now so I can use that number to control the number
of iterations.

I really hope you aren’t suggesting I have a conditional in every single
iteration, going through all the lines in the file, to test whether it’s
the first line so the first line can be ignored.

. . . and why is there a lineno and a lineno= if lineno= doesn’t actually
do anything other than prompt you for a useless number? I still don’t
understand that.


#11

On Nov 15, 2008, at 7:38 PM, Chad P. wrote:

I’ll go test this now to make sure it works.

It works fine and, to be clear, it has nothing to do with line
numbers. gets() reads forward to a newline character, advancing the
file pointer (represented by tell()/pos()) normally. each() works the
same way, beginning at the current file pointer location. I think
this is pretty standard I/O streaming logic in any language.

James Edward G. II
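
A small sketch of the file-pointer behaviour described above. StringIO is used here as a stand-in for a real file (an assumption made for brevity); File objects respond to pos, gets and each_line the same way.

```ruby
require "stringio"

# StringIO stands in for an open file in this demonstration.
io = StringIO.new("first\nsecond\nthird\n")

start_pos = io.pos             # byte offset before any read
skipped = io.gets              # reads up to and including the first newline
pos_after_gets = io.pos        # offset is now just past "first\n"
remaining = io.each_line.to_a  # iteration begins at the current offset
```

Here pos_after_gets is 6, the byte length of "first\n", and remaining holds only the second and third lines.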


#12

On Sun, Nov 16, 2008 at 09:51:16AM +0900, Jens W. wrote:

hi chad!

Hi.

out_fh << l

end

ok, it’s not pretty, but not that ugly either, is it? ;)

Actually, that looks pretty good, all things considered. I didn’t
realize the IO#gets would increment the line number of the file so the
each_line iterator would start on the second line. If that’s how it
works, I’m home free. Thank you for your help.

I’ll go test this now to make sure it works.


#13

Chad P. [2008-11-16 02:38]:

Actually, that looks pretty good, all things considered. I
didn’t realize the IO#gets would increment the line number of the
file so the each_line iterator would start on the second line.

maybe i should have said “throw away first line” in the comment? ;)
because that’s just what it does. but james already gave a fine
explanation.

and wrt tim’s comment: since we’re using IO#gets there’s no need to
check for empty files. with IO#readline it’s another matter. but why
bother? did you (tim) have something else in mind?

If that’s how it works, I’m home free.

great!

Thank you for your help.

you’re welcome.

cheers
jens
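
To make the gets/readline difference concrete, here is a short sketch using an empty StringIO as a stand-in for an empty file (an assumption for the demonstration; the variable names are made up).

```ruby
require "stringio"

# An empty StringIO stands in for an empty file.
empty = StringIO.new("")

# gets signals end of file by returning nil.
gets_result = empty.gets

# readline signals end of file by raising EOFError.
readline_error =
  begin
    empty.readline
    nil
  rescue EOFError => e
    e.class
  end
```

So a throwaway gets is safe on an empty file, while a throwaway readline needs a rescue.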


#14

On 15.11.2008 20:51, Michael G. wrote:

On Sat, Nov 15, 2008 at 2:08 PM, Chad P. removed_email_address@domain.invalid wrote:

On Sun, Nov 16, 2008 at 02:27:45AM +0900, Tim H. wrote:

The way I read the ri doc for lineno=, it appears that all it does is
determine the value that lineno returns the next time you call it. That
is, it doesn’t move the read position in the file.

Exactly.

The class is specced here, with the screen scrolled to where IO#lineno
and IO#lineno= are listed:

http://ruby-doc.org/core/classes/IO.html#M002289

The description of the method is somewhat ambiguous if you ask me.

I don’t think so.

My view of the docs is inline with what Tim was describing.

Same here.

------------------------------------------------------------- IO#lineno=
ios.lineno = integer => integer

 Manually sets the current line number to the given value. +$.+ is
 updated only on the next read.

There is no talk about read position in the file - just about “current
line number”. Also:

    f = File.new("testfile")
    f.gets                     #=> "This is line one\n"
    $.                         #=> 1
    f.lineno = 1000
    f.lineno                   #=> 1000
    $. # lineno of last read   #=> 1
    f.gets                     #=> "This is line two\n"
    $. # lineno of last read   #=> 1001

The sample makes it very clear that the read position is not affected by
lineno= because file reading obviously continues at the position where
it was before.

I would think if it had the behavior you described, the second time
f.gets is called, we would see: “This is line one thousand and one\n”
not “This is line two\n”

Right (if by “you” you do not mean Tim, somehow part of the thread is
missing in Usenet).

Kind regards

robert


#15

Tim H. [2008-11-16 15:52]:

Jens W. wrote:

and wrt tim’s comment: since we’re using IO#gets there’s no
need to check for empty files. with IO#readline it’s another
matter. but why bother? did you (tim) have something else in
mind?

All I had in mind was that, when I write a piece of code that
expects a file to have at least 2 lines, then I need to think for
a second about what it should do when the file has only 1 line,
or 0 lines. It depends on the program. That’s all.

ah, ok, i see. thanks!


#16

Jens W. wrote:

and wrt tim’s comment: since we’re using IO#gets there’s no need to
check for empty files. with IO#readline it’s another matter. but why
bother? did you (tim) have something else in mind?

All I had in mind was that, when I write a piece of code that expects a
file to have at least 2 lines, then I need to think for a second about
what it should do when the file has only 1 line, or 0 lines. It depends
on the program. That’s all.


#17

On Nov 16, 2008, at 1:51 PM, Chad P. wrote:

What doesn’t make any sense to me is the idea that, for some reason,
it’s important and common enough an operation to misnumber line numbers
that there has to be a lineno= method that counterfeits line numbers.
What the hell is the point of that? Please explain that to me.

Now that I agree with. I have no idea why this method exists. It’s
just as easy to do:

f.lineno + 1000

James Edward G. II


#18

On Sun, Nov 16, 2008 at 08:36:50PM +0900, Robert K. wrote:

I don’t think so.

The more I look at it, the more ambiguous it appears to be.

------------------------------------------------------------- IO#lineno=
ios.lineno = integer => integer

Manually sets the current line number to the given value. +$.+ is
updated only on the next read.

There is no talk about read position in the file - just about “current
line number”. Also:

. . . which, to someone who isn’t assuming “line number” is just a
magical number plucked out of the air, makes it sound like it moves the
read position to a line whose ordinal position is that of the specified
line number. In other words, that’s how it “sounded” to me.

The sample makes it very clear that the read position is not affected by
lineno= because file reading obviously continues at the position where
it was before.

It only makes that clear if you assume a lot of things about what’s in
the file in question. I can see now, in retrospect, how you came to that
conclusion – but the fact that the second use of f.gets returns “This
is line two\n” doesn’t necessarily mean that the return value is from
the second line of the file. I read it, initially, as meaning that
whatever line of the file it was, it just happened to say “This is line
two\n” because that made for some convenient text to have in the example.

Since the contents of the file were not made clear in advance, the
assumption that only the second line of the file can possibly say “This
is line two\n” does not clarify anything for the reader except by
accident. It could just mean “This is the second line of output from
this code.”

I would think if it had the behavior you described, the second time
f.gets is called, we would see: “This is line one thousand and one\n”
not “This is line two\n”

Right (if by “you” you do not mean Tim, somehow part of the thread is
missing in Usenet).

I don’t see why everyone has to assume that the second line of the file
necessarily contains the text “This is line two\n”. It’s really very
ambiguous. If you want a program that outputs “This is line one\nThis is
line two\n”, and for some reason lines 0 and 1000 of the file contain
“This is line one\n” and “This is line two\n” respectively, the alternate
interpretation of the way the method works makes perfect sense.

What doesn’t make any sense to me is the idea that, for some reason,
it’s important and common enough an operation to misnumber line numbers
that there has to be a lineno= method that counterfeits line numbers.
What the hell is the point of that? Please explain that to me.


#19

2008/11/16 Chad P. removed_email_address@domain.invalid:

I don’t think so.

The more I look at it, the more ambiguous it appears to be.

That’s the usual effect of staring at a sentence for too long. :)
Relax.

magical number plucked out of the air, makes it sound like it moves the
read position to a line whose ordinal position is that of the specified
line number. In other words, that’s how it “sounded” to me.

Yes, but the example makes it pretty clear that this is not the way it
is:

lineno= because file reading obviously continues at the position where
it was before.

It only makes that clear if you assume a lot of things about what’s in
the file in question. I can see now, in retrospect, how you came to that
conclusion – but the fact that the second use of f.gets returns “This
is line two\n” doesn’t necessarily mean that the return value is from
the second line of the file.

Of course not. But what sense would it make to create a file with a
different content that would return “This is line two\n” when
explaining how lineno= works? The most obvious explanation is that
someone created a file where “This is line two” is actually placed in
the second line to demonstrate the non effect on file position.

I read it, initially, as meaning that
whatever line of the file it was, it just happened to say “This is line
two\n” because that made for some convenient text to have in the example.

Actually I believe the other interpretation is much more
straightforward and reasonable.

Since the contents of the file were not made clear in advance, the
assumption that only the second line of the file can possibly say “This
is line two\n” does not clarify anything for the reader except by
accident. It could just mean “This is the second line of output from
this code.”

See above. IMHO a little common sense will show you
that your reasoning goes a bit astray here - although from a formal
point of view you are right.

I would think if it had the behavior you described, the second time
f.gets is called, we would see: “This is line one thousand and one\n”
not “This is line two\n”

Right (if by “you” you do not mean Tim, somehow part of the thread is
missing in Usenet).

I don’t see why everyone has to assume that the second line of the file
necessarily contains the text “This is line two\n”. It’s really very
ambiguous.

… for you.

If you want a program that outputs “This is line one\nThis is
line two\n”, and for some reason lines 0 and 1000 of the file contain
“This is line one\n” and “This is line two\n” respectively, the alternate
interpretation of the way the method works makes perfect sense.

Formally speaking yes, with a bit of common sense, no.

What doesn’t make any sense to me is the idea that, for some reason,
it’s important and common enough an operation to misnumber line numbers
that there has to be a lineno= method that counterfeits line numbers.
What the hell is the point of that? Please explain that to me.

I do not know this. IO#lineno= can help in implementing ARGF, although it
is not needed. Maybe ARGF is completely implemented in Ruby and
delegates to C code for the IO handling - including line counting. In
that case it’s handy to have this setter so you can offset the line
number of the next opened file.

Kind regards

robert
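
As an illustration of the "offset the line number of the next opened file" idea, here is a sketch that keeps one running line count across several files. It is not a claim about how ARGF is actually implemented; the two sample files and all variable names are assumptions invented for the example.

```ruby
require "tempfile"

# Two small sample files, created only for this demonstration.
files = ["a\nb\n", "c\nd\ne\n"].map do |content|
  tf = Tempfile.new("part")
  tf.write(content)
  tf.close
  tf
end

numbered = []
offset = 0

files.each do |tf|
  File.open(tf.path, "r") do |fh|
    fh.lineno = offset  # continue counting where the previous file stopped
    fh.each_line { |line| numbered << [fh.lineno, line.chomp] }
    offset = fh.lineno  # remember the running total for the next file
  end
end

files.each(&:unlink)
```

After the loop, numbered pairs the lines "a" through "e" with the numbers 1 through 5, even though the second file on its own would start again at 1.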


#20

2008/11/17 Chad P. removed_email_address@domain.invalid:

[…]

Chad, let’s agree to disagree and leave it at that.

robert