(… that subject probably makes no sense …)
Anyway, I have some unexpected (to me) behavior in the following regexp.
This example is contrived, but based on a real need. Can anyone explain
why the result is multi-line, even though the re is not?
require 'test/unit'

class TestRE < Test::Unit::TestCase
  def test_newlines
    src = "happy\n\nbirthday"
    assert_equal("hday", src.scan(/h[^x]*?day/).to_s)
  end
end
produces
Finished in 0.031 seconds.
- Failure:
test_newlines_consumed_in_not_section(TestRE) …
<"hday"> expected but was
<"happy\n\nbirthday">.
Adding \n inside the brackets fixes it; I just wouldn’t expect to have
to do this, since I didn’t add the multiline mode option.
require 'test/unit'

class TestRE < Test::Unit::TestCase
  def test_newlines
    src = "happy\n\nbirthday"
    assert_equal("hday", src.scan(/h[^x\n]*?day/).to_s)
  end
end
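A side note for anyone trying this on a newer Ruby: the assertions above lean on Ruby 1.8, where Array#to_s joined the elements into one string. From Ruby 1.9 on, Array#to_s returns the inspect form, so comparing against "hday" would fail for a different reason. A minimal sketch, assuming Ruby 1.9 or later and using variable names of my own (loose, strict), that looks at the scan results directly:

```ruby
src = "happy\n\nbirthday"

# String#scan returns an array of every match, not a single string.
loose  = src.scan(/h[^x]*?day/)    # [^x] is free to consume the newlines
strict = src.scan(/h[^x\n]*?day/)  # "\n" excluded, so a match stays on one line

p loose         # ["happy\n\nbirthday"]
p strict        # ["hday"]
p strict.first  # "hday"
```

Asserting on strict.first (or on the whole array) avoids depending on how Array#to_s behaves in a given Ruby version.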
Chris M. wrote:
Can anyone explain why
the result is multi-line, even though the re is not?
It’s not a question of the re being multi-line or not; it’s a question
of the re being greedy v. non-greedy. But because there is only one
match for your regex, the issue of greedy v. non-greedy is irrelevant.
If you think about it, there is really no concept of ‘lines’ with
regard to text. There is really only one line: one long, continuous
line of characters. Some of those characters might be ‘\n’ characters,
and we may choose to interpret a ‘\n’ as a new line, but that doesn’t
change the fact that there is still just one continuous string of
characters. A regex has nothing inherently programmed into it that will
cause it to stop looking for matches when a ‘\n’ is encountered in the
sequence of characters. The regex character ‘.’ will stop at a newline
(unless the /m option is given), but that is not true of regexes
generally; a negated class like [^x] matches ‘\n’ just like any other
character that isn’t ‘x’. In any case, you do not use the ‘.’ character
in your regex, so that behavior is irrelevant.
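To make that concrete, here is a small sketch (the variable names are mine, not from the original post) that tests a lone newline against a negated class and against the dot:

```ruby
newline = "\n"

# A negated class matches any character that is not listed, so "\n" qualifies.
class_hit = newline =~ /[^x]/   # index of the match, or nil

# Without the /m option, "." refuses to match "\n".
dot_hit = newline =~ /./

# With /m, "." matches "\n" as well.
dot_m_hit = newline =~ /./m

p class_hit  # 0
p dot_hit    # nil
p dot_m_hit  # 0
```

So the /m option only changes what the dot does; a class such as [^x] never treated "\n" specially in the first place.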
On 10/26/07, Chris M. [email protected] wrote:
end
There’s also something I don’t understand, similar to the above.
I always thought that in a non-multiline regexp, the dot didn’t match
newlines (\n), so I don’t understand this:
irb(main):036:0> re = /(h)(.*)(day)/
=> /(h)(.*)(day)/
irb(main):037:0> "happy\n\nbirthday".match(re).captures
=> ["h", "", "day"]
irb(main):038:0> re = /(h)(.*)(day)/m
=> /(h)(.*)(day)/m
irb(main):039:0> "happy\n\nbirthday".match(re).captures
=> ["h", "appy\n\nbirth", "day"]
I thought the first case wouldn’t match.
Can anyone shed some light?
Jesus.
On 10/26/07, 7stud – [email protected] wrote:
The regex character ‘.’ will stop searching
at a newline, but that is not true of regex’s generally. In any case,
you do not use the ‘.’ character in your regex, so that behavior is
irrelevant.
Can you check my example above? I’m using a greedy match of .* which I
thought would match up to a \n in a non-multiline regexp, and would
include everything in a multiline one. I must be confused at some point.
Jesus.
On 10/27/07, Phrogz [email protected] wrote:
=> ["h", "", "day"]
h.+day/, which does not match.
I need more sleep, for sure. I was of course thinking of the first “h”
and the last “day”. That explains it.
irb(main):043:0> "happy\n\nday".match(re).captures
NoMethodError: undefined method `captures' for nil:NilClass
Thanks,
Jesus.
On Oct 26, 3:43 pm, “Jesús Gabriel y Galán” [email protected]
wrote:
irb(main):039:0> "happy\n\nbirthday".match(re).captures
=> ["h", "appy\n\nbirth", "day"]
I thought the first case wouldn’t match.
Can anyone shed some light?
The last four characters of the word “birthday” match the regexp
/h.*day/, without crossing any newlines. Perhaps you were thinking of
/h.+day/, which does not match.
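A quick sketch of the difference, using the same string as in the thread (the names star, plus and multi are just for illustration):

```ruby
src = "happy\n\nbirthday"

# ".*" may match zero characters, so "h" + "" + "day" at the end of
# "birthday" satisfies the pattern without crossing a newline.
star = src.match(/(h)(.*)(day)/)

# ".+" needs at least one non-newline character between "h" and "day",
# and no such stretch exists on a single line of this string.
plus = src.match(/(h)(.+)(day)/)

# With /m the dot also matches "\n", so the match starts at the first "h".
multi = src.match(/(h)(.*)(day)/m)

p star.captures   # ["h", "", "day"]
p star.begin(0)   # 11, the "h" inside "birthday"
p plus            # nil
p multi.captures  # ["h", "appy\n\nbirth", "day"]
```

Checking where the match begins (MatchData#begin) is a handy way to see which “h” the regexp actually latched onto.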
On Oct 26, 2007, at 7:23 PM, 7stud – wrote:
irb(main):036:0> re = /(h)(.*)(day)/
=> /(h)(.*)(day)/
irb(main):037:0> "happy\n\nbirthday".match(re).captures
=> ["h", "", "day"]
The fact that the (.*) matched nothing was an indication that something
was amiss.
Nothing amiss there at all. The * means “match zero or more times”, so
it is perfectly fine to match zero occurrences of any character
(except newline) between the ‘h’ and the ‘day’.
-Rob
Rob B. http://agileconsultingllc.com
Jesús Gabriel y Galán wrote:
On 10/27/07, Phrogz [email protected] wrote:
=> ["h", "", "day"]
h.+day/, which does not match.
I need more sleep, for sure. I was of course thinking on the first “h”
and the last “day”. That explains it
A clue was in the capture results:
irb(main):036:0> re = /(h)(.*)(day)/
=> /(h)(.*)(day)/
irb(main):037:0> "happy\n\nbirthday".match(re).captures
=> ["h", "", "day"]
The fact that the (.*) matched nothing was an indication that something
was amiss.
On Oct 26, 2007, at 3:30 PM, Chris M. wrote:
end
end
from memory, ‘multiline’ affects only the behavior of ‘.’ in res.
the re
  [^x] => ‘not x’
simply matches any char that is not ‘x’ - including newline.
it’s the same in perl and python iirc
cheers.
a @ http://codeforpeople.com/
Jesús Gabriel y Galán wrote:
I was of course thinking on the first “h”
and the last “day”.
Rob B. wrote:
Nothing amiss there at all.
Ok.
On 10/26/07, ara.t.howard [email protected] wrote:
from memory, ‘multiline’ affects only the behavior of ‘.’ in res
the re
[^x] => ‘not x’
simply matches any char that is not ‘x’ - including newline
it’s the same in perl and python iirc
Yeah, it behaves that way. I guess I need to adjust my expectations