Why does IO.readlines() keep newlines?

At the very least, the win32 implementation of Ruby’s IO.readlines()
method keeps the newline character on each string in the array.
Considering that it is the newline that defines a “line,” it would not
be wholly unreasonable to omit it from the returned array. I would have
imagined that it was implemented using String.split(), which omits the
splitting character. On a purely practical note, I’m sure the former is
more popular than the latter in the following:

out = File.open('file.txt', 'r') { |file| file.readlines.collect { |line| line.chomp } }
out = File.open('file.txt', 'r') { |file| file.readlines }

...in that rarely do people actually want newlines in their strings.

Interestingly enough, I discovered this behaviour through a bug in a
program, which was hidden by another peculiar function, puts(). Imagine
my surprise that puts() not only appends a newline to a string printed
to stdout but, if a newline already exists, doesn’t bother appending
one! So printing strings with puts() can hide whether the strings have
a newline or not. Weird…
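
For anyone who wants to see both behaviours side by side, here is a minimal sketch. It uses StringIO as a stand-in for a real file and for stdout (my assumption, just to keep it self-contained); the variable names are illustrative only.

```ruby
require 'stringio'

# StringIO stands in for a real file so the sketch needs nothing on disk.
lines = StringIO.new("first\nsecond\n").readlines
p lines   # inspect makes the trailing "\n" on each element visible

# puts appends a newline only when the string does not already end in
# one, so the two calls below write exactly the same bytes.
out = StringIO.new
out.puts "first\n"
out.puts "first"
p out.string
```

Using p (or inspect) instead of puts while debugging is what makes the hidden newline show up.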
So, who thinks my suggested change is a good idea? How do I go about
popularizing my opinion?
Thank you…

On Nov 19, 2007 1:15 PM, Just Another Victim of the Ambient M.
[email protected] wrote:

So, who thinks my suggested change is a good idea? How do I go about
popularizing my opinion?
Thank you…

I’m going to speculate that readlines does this because of operating
system differences in line endings.
For compatibility between most systems, it would have to remove line
feeds (\x0A) or carriage-return/line-feed combinations (\x0D\x0A).

I personally rather prefer the current behavior of readline. I don’t
think the puts behavior matters, and it is certainly not worth changing.
I’m aware of how both behave and, if it matters, I code accordingly.

humbly,
Daniel Brumbaugh K.

On Nov 20, 2007, at 12:43 AM, Daniel Brumbaugh K. wrote:

I’m going to speculate that readlines does this because of operating
system differences in line endings.
For compatibility between most systems, it would have to remove line
feeds (\x0A) or carriage-return/line-feed combinations (\x0D\x0A).

Actually, that’s not the case.

On CRLF platforms the I/O layer handles newlines in text mode so that
the programmer always works with “\n”; no CRLF ever goes up on
Windows. Nor do you need to print CRLFs by hand at the Ruby level. At
the Ruby level a newline is always == “\n” and always has length 1.

The string “\n” is the logical newline in Ruby, meaning it is portable:
the I/O layer takes care of its actual representation on disk according
to the runtime platform. In Java, for example, this works differently:
“\n” is not portable, and to write a portable newline in Java you
invoke some println().
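
A quick way to check the "always length 1" claim on whatever platform you have at hand is the small sketch below. The variable names are mine and purely illustrative; it only inspects the string itself and does not touch the I/O layer.

```ruby
newline = "\n"

length = newline.length        # one character at the Ruby level
bytes  = newline.bytes         # the byte value(s) behind that character
is_lf  = (newline == "\012")   # octal 012 is LF

puts "length=#{length} bytes=#{bytes.inspect} lf=#{is_lf}"
```

Any CRLF translation happens when a text-mode stream is read or written, not in the string.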

This article explains how newlines work in C-based languages. It is
Perl-based but in general it applies to Ruby except that in Ruby
there’s no platform where “\n” == “\015”. In Ruby “\n” == “\012”
everywhere and that simplifies things a bit. The I/O layer in MRI is
C’s stdio instead of PerlIO, but the explained newline mangling in and
out is analogous:

http://www.onlamp.com/pub/a/onlamp/2006/08/17/understanding-newlines.html

I am the author but that doesn’t matter.

– fxn

On Nov 20, 2007 12:43 AM, Daniel Brumbaugh K.
[email protected] wrote:

I personally rather prefer the current behavior of readline.

But then you could do
readlines/(\n\r?)/;
as the default behavior, I find it most annoying too.
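
One possible reading of this suggestion, and it is only my guess at the intent, is the way String#split already behaves: a plain pattern drops the separators, while a pattern with a capture group hands them back. The sketch below uses split on a string rather than readlines, and the pattern /\r?\n/ is my own choice, not the one quoted above.

```ruby
text = "one\ntwo\n"

# Plain pattern: the line endings are dropped.
without_endings = text.split(/\r?\n/)

# Capture group: the matched line endings are returned as extra elements.
with_endings = text.split(/(\r?\n)/)

p without_endings
p with_endings
```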

Robert

On Nov 19, 12:14 pm, “Just Another Victim of the Ambient M.”
[email protected] wrote:

character. On a simply practical note, I’m sure the former is more popular
than the latter in the following:

out = File.open('file.txt', 'r') { |file| file.readlines.collect { |line| line.chomp } }
out = File.open('file.txt', 'r') { |file| file.readlines }

…in that rarely do people actually want newlines in their strings.

FWIW, I never use readlines for this exact reason. I find its
preservation of line endings entirely annoying. I always use
IO.read().split when I can.
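
A rough sketch of that approach, with one caveat worth spelling out: split with no argument splits on any whitespace, not just newlines, so passing "\n" explicitly is probably what is wanted for line-oriented data. StringIO stands in for IO.read on a real file here, which is my assumption to keep the example self-contained.

```ruby
require 'stringio'

# StringIO#read stands in for IO.read('file.txt').
text = StringIO.new("alpha beta\ngamma\n").read

by_newline    = text.split("\n")   # one element per line, no line endings
by_whitespace = text.split         # no argument: splits on spaces too

p by_newline
p by_whitespace
```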

As much as I’d personally like it changed, and know that such a change
would not affect any of my scripts, I’m concerned that such a change
must fall into the category of “not backwards compatible”, and thus
unlikely to be effected without very strong support.

How do I go about popularizing my opinion?

Discuss the issue here as you are doing. If you don’t get a large
vocal outcry against the proposal, or are not swayed by any arguments
that come against it, file an RCR[1] (preferably with a source code
patch attached) and hope that Matz accepts your change into the core.

[1] http://rcrchive.net/

On Nov 20, 2007 2:53 AM, Xavier N. [email protected] wrote:

In CRLF platforms the I/O layer handles newlines in text mode so that
the programmer always works with “\n”, no CRLF ever goes up on
Windows. Nor do you need to print CRLFs by hand at the Ruby level. At
the Ruby level a newline is always == “\n” and always has length 1.
– fxn

Unfortunately, files created on one platform inevitably make their way
to another. When an IO with \r\n is read on a UNIX, it preserves the
carriage return.
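
To illustrate, here is a small sketch assuming a file with Windows line endings that ends up being read with no CRLF translation. StringIO stands in for that file (my assumption), and the names are illustrative. With its default separator, String#chomp removes a trailing "\r\n" as well as a bare "\n".

```ruby
require 'stringio'

# Simulates CRLF data read without any text-mode translation.
crlf_io  = StringIO.new("one\r\ntwo\r\n")
raw      = crlf_io.readlines
stripped = raw.map(&:chomp)   # chomp drops "\r\n" as a unit

p raw        # the carriage returns are still there
p stripped
```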

Daniel Brumbaugh K.

On Nov 20, 2007, at 10:17 PM, Daniel Brumbaugh K. wrote:

Unfortunately, files created on one platform inevitably make their way
to another. When an IO with \r\n is read on a UNIX, it preserves the
carriage return.

Yes, that’s covered in the article I mentioned as well:

http://www.onlamp.com/pub/a/onlamp/2006/08/17/understanding-newlines.html?page=3

– fxn