Different results in command-line vs. TextMate

Hi –

I was working on an answer to James R.'s question, and discovered
the following somewhat puzzling (to me) thing.

James’s text file has some non-printing (Word-derived?) characters,
instead of regular spaces:

text = File.read("lines.txt")
=>
"Clark\302\240\302\240\302\240\302\240\302\240\302\240Kent\302\240[email protected]\302\240\r\nPop\302\240Eye\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240[email protected]\302\240\r\n"

puts text
Clark      Kent [email protected]Â
Pop Eye              [email protected]

What’s odd is that when I try to scan these lines, I get different
results depending on whether I’m on the command line or in TextMate.
From the command line:

$ cat parse.rb
lines = File.readlines("lines.txt")
p RUBY_DESCRIPTION
p lines.map {|line| line.scan(/\w+/) }

$ ruby parse.rb
"ruby 1.8.7 (2008-05-31 patchlevel 0) [i686-darwin9.8.0]"
[["Clark", "Kent", "super", "fakeplace", "com"], ["Pop", "Eye",
"popeye", "fakeplace", "com"]]

And from TextMate, using command-r:

"ruby 1.8.7 (2008-05-31 patchlevel 0) [i686-darwin9.8.0]"
[["Clark Kent ", "super", "fakeplace", "com", " "],
 ["Pop Eye ", "popeye", "fakeplace", "com", " "]]

As you can see, it’s not just the display that’s different. The scan
operation actually produced different results.

It feels like some kind of Heisenbug but I can’t puzzle it out.

David


David A. Black, Senior Developer, Cyrus Innovation Inc.

The Compleat Rubyist: Ruby training with Black/Brown/McAnally
Philadelphia, PA, October 1-2, 2010
http://www.compleatrubyist.com

On Sat, Jul 17, 2010 at 3:03 PM, David A. Black [email protected]
wrote:

> James’s text file has some non-printing (Word-derived?) characters,
> instead of regular spaces:

Those are non-breaking spaces (U+00A0, encoded in UTF-8 as the two
bytes 0xC2 0xA0), which should be treated as \W.
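
For anyone who wants to check the byte values, here is a minimal sketch. It assumes nothing beyond core Ruby, and the variable names are only for illustration. It packs the code point U+00A0 as UTF-8 and prints the resulting bytes in decimal, octal and hex, so the \302\240 pairs in the File.read output above can be matched up with 0xC2 0xA0.

```ruby
# U+00A0 (NO-BREAK SPACE) packed as UTF-8. pack("U") is used instead of a
# "\u00A0" literal so the snippet also runs on 1.8.
nbsp = [0x00A0].pack("U")

# The raw bytes of that one character.
bytes = nbsp.unpack("C*")

p bytes                            # decimal byte values
p bytes.map { |b| "\\%o" % b }     # the same bytes as octal escapes
p bytes.map { |b| "0x%02X" % b }   # and as hex
```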

> What’s odd is that when I try to scan these lines, I get different
> results depending on whether I’m on the command line or in TextMate.

I thought the CRLF line endings might have something to do with it, but
the result was the same. Another clue, with 1.9.1-p378, the result from
TextMate was correct, identical to that of the command line.

Ammar

Hi –

On Sat, 17 Jul 2010, Ammar A. wrote:

> I thought the CRLF line endings might have something to do with it, but the
> result was the same. Another clue, with 1.9.1-p378, the result from TextMate
> was correct, identical to that of the command line.

Thanks for checking. It turns out to be an encoding thing: TextMate
invokes Ruby with -KU. Without the -KU (which involved editing an
underlying script file, as well as the Bundle Editor entry, but then
again I’m not a big TextMate bundle expert), it ran the same as the
unadorned command line.
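
If editing the bundle is not an option, one possible workaround is to stop relying on \w altogether. The sketch below is only an illustration under some assumptions: the sample line is modelled on the data above, with a made-up placeholder address, and the variable names are hypothetical. As far as I know, an explicit ASCII character class is not affected by -KU, and replacing the two-byte sequence with a plain space sidesteps the question entirely.

```ruby
# Sample line modelled on the thread's data: U+00A0 written as its UTF-8
# bytes (\302\240). The address is a made-up placeholder.
line = "Clark\302\240\302\240Kent\302\240user@example.com\302\240\r\n"

# Option 1: spell out the ASCII word characters instead of relying on \w,
# whose meaning for multibyte characters can change with the -K setting.
ascii_words = line.scan(/[A-Za-z0-9_]+/)

# Option 2: turn each non-breaking space into a plain space first, then
# split on whitespace as usual.
normalized = line.gsub("\302\240", " ")
fields = normalized.split

p ascii_words
p fields
```

The first option reproduces the word-by-word result of the plain command-line run. The second keeps the address in one piece, which may be closer to what the original question needed.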

David

