1.8.7 String#lines keeps new-line chars (say it ain't so in 1.9)

Ruby 1.8.7 p72

"A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]

Please, tell me that’s a mishap, and not how 1.9 works. I’d expect:

"A\nB\nC".lines.to_a
=> ["A", "B", "C"]

Thanks.

Thomas S. wrote:

Ruby 1.8.7 p72

"A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]

Please, tell me that’s a mishap, and not how 1.9 works. I’d expect:

"A\nB\nC".lines.to_a
=> ["A", "B", "C"]

Why would you expect that? The documentation is very clear.

--------------------------------------------------------------- IO#lines
ios.lines(sep=$/) => anEnumerator
ios.lines(limit) => anEnumerator
ios.lines(sep, limit) => anEnumerator

 Returns an enumerator that gives each line in _ios_. The stream
 must be opened for reading or an +IOError+ will be raised.

    f = File.new("testfile")
    f.lines.to_a  #=> ["foo\n", "bar\n"]
    f.rewind
    f.lines.sort  #=> ["bar\n", "foo\n"]

If it changed in 1.9, that would be another source of incompatibilities.

Perhaps most importantly of all, if the newlines were stripped, there
would be loss of data and it would be impossible to reconstruct the
original file exactly from the members of the lines array (e.g.
lines.join), as your example shows nicely.
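A small sketch of the round-trip point, assuming a Ruby where String#lines exists (1.8.7 or later). The variable names are only for illustration:

```ruby
# Keeping the terminators means join rebuilds the input exactly.
original = "A\nB\nC"

kept = original.lines.to_a             # ["A\n", "B\n", "C"]
puts kept.join == original             # prints true

# Once terminators are stripped, two different inputs give the same array,
# so nothing can tell them apart afterwards.
with_newline    = "A\nB\nC\n".split("\n")
without_newline = "A\nB\nC".split("\n")
puts with_newline == without_newline   # prints true
```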

Thomas S. wrote:

Ruby 1.8.7 p72

"A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]

Please, tell me that’s a mishap, and not how 1.9 works. I’d expect:

"A\nB\nC".lines.to_a
=> ["A", "B", "C"]

Thanks.

$ ruby19 r1test.rb
["A\n", "B\n", "C\n"]

$ ri19 String#lines
----------------------------------------------------------- String#lines
str.lines(separator=$/) => anEnumerator
str.lines(separator=$/) {|substr| block } => str

 From Ruby 1.9.1

 Returns an enumerator that gives each line in the string. If a
 block is given, it iterates over each line in the string.

    "foo\nbar\n".lines.to_a   #=> ["foo\n", "bar\n"]
    "foo\nb ar".lines.sort    #=> ["b ar", "foo\n"]

…oh, yeah:

$ ruby19 -v
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin8.11.1]

On Sat, Aug 22, 2009 at 11:46 AM, Intransition wrote:

Ruby 1.8.7 p72

"A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]

Please, tell me that’s a mishap, and not how 1.9 works. I’d expect:

"A\nB\nC".lines.to_a
=> ["A", "B", "C"]

I know I’m going to be accused of bullying you again, but…
Install the latest 1.9.1 and try it yourself (or if you feel fancy,
the 1.9.2 preview).

These aren’t questions irb can’t answer for you.

Alternatively, try out ruby-versions:
http://ruby-versions.net/

At 9:58 PM +0900 8/23/09, David A. Black wrote:

  • codepoints

There’s no auto-chomping, but there never has been in any string
operation I can think of.

It isn’t the same but in many places where I might use String#lines
I’d use code like this in Ruby 1.8.6:

"first line\nsecond line\nthird line".split("\n")
=> ["first line", "second line", "third line"]

Hi –

On Sun, 23 Aug 2009, Intransition wrote:

Ruby 1.8.7 p72

"A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]

Please, tell me that’s a mishap, and not how 1.9 works. I’d expect:

"A\nB\nC".lines.to_a
=> ["A", "B", "C"]

String#lines is essentially the same as String#each, which is gone in
1.9. You get, instead of #each and friends (Enumerable), a whole
toolkit of ways to enumerate through strings:

  • lines
  • bytes
  • chars
  • codepoints

There’s no auto-chomping, but there never has been in any string
operation I can think of.
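A quick illustration of the four enumeration methods listed above. This is only a sketch; `to_a` is called on each result so the output looks the same whether the method returns an enumerator or an array:

```ruby
s = "ab\ncd"

p s.lines.to_a       # ["ab\n", "cd"]
p s.bytes.to_a       # [97, 98, 10, 99, 100]
p s.chars.to_a       # ["a", "b", "\n", "c", "d"]
p s.codepoints.to_a  # [97, 98, 10, 99, 100]
```

None of them drops anything: the "\n" shows up in every view of the string.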

David

On Aug 23, 3:32 am, Brian C. wrote:

    f = File.new("testfile")
    f.lines.to_a  #=> ["foo\n", "bar\n"]
    f.rewind
    f.lines.sort  #=> ["bar\n", "foo\n"]

If it changed in 1.9, that would be another source of incompatibilities.

Perhaps most importantly of all, if the newlines were stripped, there
would be loss of data and it would be impossible to reconstruct the
original file exactly from the members of the lines array (e.g.
lines.join), as your example shows nicely.

I’d expect it from a StringIO, but not a String.

T.

On Sat, Aug 22, 2009 at 5:46 PM, Intransition wrote:

Thanks.

$ ruby -v -e 'p "A\nB\nC".lines.to_a'
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
["A\n", "B\n", "C"]

$ ruby-trunk -v -e 'p "A\nB\nC".lines.to_a'
ruby 1.9.2dev (2009-08-23 trunk 24631) [x86_64-linux]
["A\n", "B\n", "C"]


Pozdrawiam

Radosław Bułat
http://radarek.jogger.pl - mój blog

On Aug 23, 3:32 am, Brian C. wrote:

Perhaps most importantly of all, if the newlines were stripped, there
would be loss of data and it would be impossible to reconstruct the
original file exactly from the members of the lines array (e.g.
lines.join), as your example shows nicely.

How is there loss of data, when you know what was removed? Just
join("\n").

On Aug 23, 10:05 am, Stephen B. wrote:

It isn’t the same but in many places where I might use String#lines
I’d use code like this in Ruby 1.8.6:

"first line\nsecond line\nthird line".split("\n")
=> ["first line", "second line", "third line"]

Exactly. And I guess I’ll just have to keep on doing that then.

On Aug 23, 2009, at 3:48 PM, Trans wrote:

On Aug 23, 3:32 am, Brian C. wrote:

Perhaps most importantly of all, if the newlines were stripped, there
would be loss of data and it would be impossible to reconstruct the
original file exactly from the members of the lines array (e.g.
lines.join), as your example shows nicely.

How is there loss of data, when you know what was removed? Just
join("\n").

What do we do for lines ending in \r\n? Do we take both or just the
\n? I say both would be the most consistent, but then you don’t know
if you need to put back a \r\n or just an \n.

Also, how do you know if the last line ended in a \n? join("\n")
wouldn’t put it back in either case.
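Both concerns can be seen in a short sketch. It assumes the default record separator ($/ left as "\n"), in which case chomp removes a trailing "\r\n" as well as a bare "\n". The sample text is made up for illustration:

```ruby
# Mixed terminators: one CRLF line, two LF lines.
text = "one\r\ntwo\nthree\n"

stripped = text.lines.map { |line| line.chomp }
p stripped                   # ["one", "two", "three"]

# Rejoining cannot know which lines had "\r\n", nor that the last line
# ended in a newline at all.
rebuilt = stripped.join("\n")
p rebuilt                    # "one\ntwo\nthree"
p rebuilt == text            # false
```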

James Edward G. II

On Aug 23, 8:58 am, "David A. Black" wrote:

There’s no auto-chomping, but there never has been in any string
operation I can think of.

String#lines wasn’t defined in 1.8.6 so I did not think there was any
precedent for it. My use case has always been (as Radoslaw said):

"first line\nsecond line\nthird line".split("\n")

Wanting my program to read better, I have often defined #lines to do
just that. In my experience that’s the frequent case. Wanting to keep
the separator I think is the lesser need --for which I would be
happier with a less concise method name. As it stands #lines does me
no good now.

"first line\nsecond line\nthird line".lines.map{ |s| s.chomp("\n") }

Is even worse than before! ;)
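For anyone who wants the chomped form under a readable name without reopening String, one option is a small helper. The name `chomped_lines` is hypothetical, not part of core Ruby, and this is only a sketch:

```ruby
# Hypothetical helper (not a core method): lines with terminators removed.
def chomped_lines(str)
  str.lines.map { |line| line.chomp }
end

p chomped_lines("first line\nsecond line\nthird line")
# ["first line", "second line", "third line"]
```

With the default record separator, chomp also removes "\r\n", which split("\n") would leave as a stray "\r" on each line.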

Thomas S. wrote:

On Aug 23, 3:32 am, Brian C. wrote:

Perhaps most importantly of all, if the newlines were stripped, there
would be loss of data and it would be impossible to reconstruct the
original file exactly from the members of the lines array (e.g.
lines.join), as your example shows nicely.

How is there loss of data, when you know what was removed? Just
join("\n").

“Loss of data” means “you don’t get back exactly what you started with”.
Taking your example, I believe that you want both “A\nB\nC” and
“A\nB\nC\n” to result in lines [“A”,“B”,“C”], so this operation is not
reversible.

Or did you want “A\nB\nC\n” to result in [“A”,“B”,“C”,“”] ? That would
surprise me more. Most inputs have terminating newlines on the final
line.

On Aug 23, 8:50 am, Gregory B. wrote:

I know I’m going to be accused of bullying you again, but…
Install the latest 1.9.1 and try it yourself (or if you feel fancy,
the 1.9.2 preview).

These aren’t questions irb can’t answer for you.

Looking it up isn’t the main issue mate. It was the “wherefore?” that
I pondered upon finding it to be the case.

Alternatively, try out ruby-versions: http://ruby-versions.net/

Cool, thanks.

On Aug 24, 4:50 am, Brian C. wrote:

“Loss of data” means “you don’t get back exactly what you started with”.
Taking your example, I believe that you want both “A\nB\nC” and
“A\nB\nC\n” to result in lines [“A”,“B”,“C”], so this operation is not
reversible.

Or did you want “A\nB\nC\n” to result in [“A”,“B”,“C”,“”] ? That would
surprise me more. Most inputs have terminating newlines on the final
line.

Granting that perfect reversibility is a requirement here, then yes
the latter makes sense. I do not think it surprising.
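For what it is worth, String#split already produces that reversible form when given a negative limit, which stops it from discarding trailing empty fields. A minimal sketch:

```ruby
with_newline    = "A\nB\nC\n".split("\n", -1)
without_newline = "A\nB\nC".split("\n", -1)

p with_newline      # ["A", "B", "C", ""]
p without_newline   # ["A", "B", "C"]

# Now the two inputs stay distinguishable, and join("\n") restores each one.
p with_newline.join("\n") == "A\nB\nC\n"    # true
p without_newline.join("\n") == "A\nB\nC"   # true
```

This only holds for a single, known separator; mixed "\r\n" and "\n" input would still not round-trip.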

To me returning the newline code with #lines would be like returning
the spaces for a method #words. Eg.

"show it\nto me".words => ["show ", "it\n", "to ", "me "]
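Since there is no built-in #words, the analogy can be approximated with existing methods. This is just a sketch of the two behaviours, with the regular expression chosen for illustration:

```ruby
text = "show it\nto me"

# Separators dropped, the way split("\n") treats lines.
p text.split            # ["show", "it", "to", "me"]

# Separators kept with each word, the way String#lines treats lines.
p text.scan(/\S+\s*/)   # ["show ", "it\n", "to ", "me"]
```

The last element carries no trailing space here because the input ends right after "me".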

I think the broader issue here is the question of whether or not
String is intended for use by code-point (ie. low-level character)
manipulators, or for higher-level human-oriented textual manipulation.
I always thought StringIO was for the former case. But now I am seeing
Ruby’s String is some sort of hodge-podge mixture of the two.

T.

Thomas S. wrote:

To me returning the newline code with #lines would be like returning
the spaces for a method #words. Eg.

"show it\nto me".words => ["show ", "it\n", "to ", "me "]

…except there is no built-in method ‘words’ so you can’t accuse it of
being inconsistent :)

Some people would want lines with trailing whitespace stripped as well
as terminators. Some people would want leading whitespace stripped too.
I don’t think you can please everyone, so IMO the most flexible and
consistent approach is to give the line complete with its terminator,
and let the user apply whatever post-processing they like.
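As a sketch of that post-processing, here are the three variants mentioned, applied to the same raw lines. The sample string is made up:

```ruby
raw = "  foo  \n\tbar\r\n"

terminators_only = raw.lines.map { |line| line.chomp }
trailing_space   = raw.lines.map { |line| line.rstrip }
both_ends        = raw.lines.map { |line| line.strip }

p terminators_only  # ["  foo  ", "\tbar"]
p trailing_space    # ["  foo", "\tbar"]
p both_ends         # ["foo", "bar"]
```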

I think the broader issue here is the question of whether or not
String is intended for use by code-point (ie. low-level character)
manipulators, or for higher-level human-oriented textual manipulation.
I always thought StringIO was for the former case. But now I am seeing
Ruby’s String is some sort of hodge-podge mixture of the two.

I think StringIO is for when you want to duck-type a File, but with
in-RAM backing.

ruby is certainly lacking consistency in this area. In ruby 1.9, for
example, IO still has #each (meaning #each_line), whereas String doesn’t
any more.
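The difference is easy to see with StringIO standing in for a real file. A small sketch, assuming Ruby 1.9 or later:

```ruby
require 'stringio'

io = StringIO.new("foo\nbar\n")
io.each { |line| p line }            # prints "foo\n" then "bar\n"

str = "foo\nbar\n"
p str.respond_to?(:each)             # false on 1.9 and later
str.each_line { |line| p line }      # prints "foo\n" then "bar\n"
```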

2009/8/24 Trans:

Sure, but at that point we are moving into a realm of narrower
usecases. I believe the short, more concise method name should go to
the most frequent use. I have no definitive statistics, but I’d wager
that split(“\n”) is by far the more common case. Based on that, I’d
rather see the current def be called something else, like #newlines or
#rawlines. But Ruby is ultimately Matz’ baby so maybe his more common
use is otherwise.

I don’t think it is a good idea to change the default behavior. If
you frequently need line endings stripped, you can always define your
own method for this, for example:

class LineEnum
  include Enumerable

  def initialize(obj, meth = case obj
                             when String, IO then :each_line
                             else :each
                             end)
    @obj = obj
    @meth = meth
  end

  def each
    @obj.send(@meth) do |elem|
      elem.chomp!
      yield elem
    end

    self
  end
end

$ irb19 -r lineenum.rb
Ruby version 1.9.1
irb(main):001:0> s = "foo\nbar\n"
=> "foo\nbar\n"
irb(main):002:0> se = LineEnum.new s
=> #<LineEnum:0x10169bc0 @obj="foo\nbar\n", @meth=:each_line>
irb(main):003:0> se.each {|l| p l}
"foo"
"bar"
=> #<LineEnum:0x10169bc0 @obj="foo\nbar\n", @meth=:each_line>
irb(main):004:0>

We could also extend Enumerator to honor blocks passed to them so we
could do

$stdin.to_enum(:each_line) {|l| l.strip!}.each do |line|
  p line # no \n present
end

But frankly, I’d rather just add a “line.chomp!” to my block body and
be done. :)
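That simpler approach might look like the sketch below. A StringIO is used as a stand-in for $stdin so the example runs on its own; the variable names are illustrative:

```ruby
require 'stringio'

input = StringIO.new("foo\nbar\n")   # stand-in for $stdin

input.each_line do |line|
  line.chomp!                        # strip the terminator in place
  p line                             # prints "foo" then "bar"
end

# Without a block, each_line returns an enumerator, so map works too.
input.rewind
p input.each_line.map { |line| line.chomp }   # ["foo", "bar"]
```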

Kind regards

robert

On Aug 24, 9:17 am, Brian C. wrote:

Thomas S. wrote:

To me returning the newline code with #lines would be like returning
the spaces for a method #words. Eg.

"show it\nto me".words => ["show ", "it\n", "to ", "me "]

…except there is no built-in method ‘words’ so you can’t accuse it of
being inconsistent :)

ok ;) …just making an analogy.

Some people would want lines with trailing whitespace stripped as well
as terminators. Some people would want leading whitespace stripped too.
I don’t think you can please everyone, so IMO the most flexible and
consistent approach is to give the line complete with its terminator,
and let the user apply whatever post-processing they like.

Sure, but at that point we are moving into a realm of narrower
usecases. I believe the short, more concise method name should go to
the most frequent use. I have no definitive statistics, but I’d wager
that split(“\n”) is by far the more common case. Based on that, I’d
rather see the current def be called something else, like #newlines or
#rawlines. But Ruby is ultimately Matz’ baby so maybe his more common
use is otherwise.

ruby is certainly lacking consistency in this area. In ruby 1.9, for
example, IO still has #each (meaning #each_line), whereas String doesn’t
any more.

Yea, I think that b/c StringIO is an IO first and foremost. So I don’t
think String should aspire to be like StringIO per se. And StringIO
can only be like String insofar as it doesn’t interfere with it being
an IO. But I may be presuming too much.

Appreciate the discussion.