Characters and strings oddness


#1

Hi there Rubyists -

I’m trying to learn the language, coming from a long background in Perl.

Here’s what I want to do: pick out a double-quoted string inside
another string, respecting embedded, backslashed quotes:

line = ' a string "properly \"quoted\" that ends" here '
quoted = '"properly \"quoted\" that ends"'

If Ruby had the same regexes as Perl, I’d say something like

line.gsub!(/^\s*(".*?(?<!\\)")\s*/, '')

and have my quoted string pop out in $1. In fact, TextMate groks that
regex, too. Ruby doesn’t like the negative look-behind, unfortunately.

Ok, let’s do a loop then. I finally arrived at:

string = "\""
if line.gsub!(/^.*?(")/, '')
  (0..line.length).each do |i|
    string << line[i]
    break if i>0 && (line[i, 1] == "\"") && (line[i-1, 1] != '\\')
  end
end

which works. However… I’m puzzled by Ruby’s way of handling strings. A
string is, essentially, a set of bytes, not unlike a char[] in C. Is
there really no way of defining character literals in Ruby? I was
surprised to find that I couldn’t say

Stefan-Krugers-Computer:~ stefan$ irb
irb(main):001:0> string = 'this is a string'
=> "this is a string"
irb(main):002:0> string[0] == 't'
=> false

whereas I can say

irb(main):003:0> string[0, 1] == 't'
=> true

Now, in my little loop experiment above I tried the following:

delim = "\""
string = delim
if line.gsub!(/^.*?(")/, '')
  (0..line.length).each do |i|
    string << line[i]
    break if i>0 && (line[i, 1] == delim) && (line[i-1, 1] != '\\')
  end
end

TypeError: can’t convert nil into String

method << in test.rb at line 9
at top level in test.rb at line 9
at top level in test.rb at line 8

I’m at a loss to understand why that gives an error.


#2

On Jun 14, 2007, at 11:01 AM, Stefan Kruger wrote:

Hi there Rubyists -
Hi.

irb(main):002:0> string[0] == 't'
string[0].chr == 't'
or
string[0] == ?t

method << in test.rb at line 9
at top level in test.rb at line 9
at top level in test.rb at line 8

I’m at a loss to understand why that gives an error.
Try 0...line.length (three dots, not two). You’re indexing past the
end of the string.
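
A minimal sketch of what the two range forms do, run on a short stand-in string rather than the original line. It also illustrates a second thing that looks relevant to the failing snippet: string = delim makes both names refer to the same String object, so appending to string changes delim as well, the closing-quote comparison can then never succeed, and the loop runs on to the index where line[i] is nil. The variable names, and the use of String.new to get mutable strings regardless of frozen-literal settings, are assumptions for illustration.

```ruby
line = 'abc'

# Two dots include the end point; three dots exclude it.
inclusive = (0..line.length).to_a    # indices 0 through line.length
exclusive = (0...line.length).to_a   # indices 0 through line.length - 1

# Indexing one past the last character yields nil rather than raising.
past_end = line[line.length]

# Appending that nil to a String is what raises the TypeError.
error = begin
  String.new << past_end
  nil
rescue TypeError => e
  e
end

# Assigning a String to a second variable does not copy it.
delim  = String.new('"')
string = delim        # both names now refer to one object
string << 'p'         # delim has grown too and is no longer a lone quote

# dup gives an independent copy, so the delimiter stays untouched.
delim2  = String.new('"')
string2 = delim2.dup
string2 << 'p'
```

With three dots the TypeError goes away, but if the aliasing reading is right the loop would still copy the rest of the line; something like string = delim.dup would avoid that.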


#3

On Jun 14, 12:27 pm, Zachary H. wrote:

On Jun 14, 2007, at 11:01 AM, Stefan Kruger wrote:

irb(main):002:0> string[0] == 't'

string[0].chr == 't'
or
string[0] == ?t

or string[0..0] == 't'

It’s a common gotcha that indexing a string by a single integer
returns the (integer) CODE of the character/byte at that location, not
the one-character string containing that byte.
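
A small sketch of the forms mentioned above, assuming Ruby 1.9 or later. There, string[0] returns a one-character String, whereas on the 1.8 interpreter this thread is about it returned the Integer code. The two-argument and range forms should give a one-character String on either.

```ruby
string = 'this is a string'

# Forms that return a one-character String.
by_length = string[0, 1]    # start index and length
by_range  = string[0..0]    # inclusive range covering index 0 only
empty     = string[0...0]   # exclusive range: nothing selected

# Getting at the integer code explicitly on 1.9+.
code = string[0].ord        # Integer code of the first character
char = code.chr             # back to a one-character String

# The ?t literal is a one-character String on 1.9+ (an Integer on 1.8).
literal = ?t
```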


#4

Stefan Kruger wrote:

line = ' a string "properly \"quoted\" that ends" here '
quoted = '"properly \"quoted\" that ends"'

If Ruby had the same regexes as Perl, I’d say something like

line.gsub!(/^\s*(".*?(?<!\\)")\s*/, '')

and have my quoted string pop out in $1. In fact, TextMate groks that
regex, too.

How is that supposed to work? With the initial anchor there’s no way
that regexp would ever match even if Ruby supported look-behind. I
suggest this: /"((?:\\.|[^\\])+)"/

Daniel
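
A hedged sketch of how the suggested pattern behaves on the sample line, next to two variants that are not from the thread: one that also keeps unescaped quotes out of the character class, and the look-behind form without the leading anchor. The look-behind variant assumes Ruby 1.9 or later, whose regex engine accepts (?<!...). The names stricter, lookbehind and two are made up for illustration.

```ruby
line = ' a string "properly \"quoted\" that ends" here '

# Pattern as suggested: an escaped character, or anything but a backslash.
suggested = /"((?:\\.|[^\\])+)"/

# Variant (assumption): also exclude bare quotes, so the match stops at
# the first unescaped closing quote instead of the last one on the line.
stricter = /"((?:\\.|[^"\\])*)"/

# Variant (assumption, Ruby 1.9+): lazy match ended by a quote that is
# not preceded by a backslash.
lookbehind = /"(.*?)(?<!\\)"/

# String#[] with a regex and a group number returns that capture.
inner_suggested  = line[suggested, 1]
inner_stricter   = line[stricter, 1]
inner_lookbehind = line[lookbehind, 1]

# With two quoted strings on one line the greedy pattern spans both.
two        = 'say "one" and "two"'
greedy_two = two[suggested, 1]
strict_two = two[stricter, 1]
```

On the sample line all three capture the same text; they only differ once a line holds more than one quoted string.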