Hi,
I’m having a little trouble getting Ruby to match quotes correctly.
Suppose I wanted to extract any quoted string followed by an exclamation
point. I’d like my regular expression to match either single or double
quotes. I thought the following might work:
re = /(['"])([^\1]*?)\1!/
md = re.match %{ "this looks like "fun"!" }
md[0] #=> "\"this looks like \"fun\"!"
But as you can see, instead of matching <"fun"!> it matched <"this looks
like "fun"!>, despite the fact that I’ve told it to match anything but
the character that was used to quote it:
[^\1]*?
It works beautifully if I tell it to match either single quotes or
double quotes, but I can’t write it to match either in a single regular
expression:
re = /"([^"]*?)"!/
md = re.match %{ "this looks like "fun"!" }
md[0] #=> "\"fun\"!"
Or:
re = /'([^']*?)'!/
md = re.match %{ 'this looks like 'fun'!' }
md[0] #=> "'fun'!"
Why does the first regex not do what I want?
On 5/24/06, John W. Long [email protected] wrote:
re = /"([^"]*?)"!/
–
John L.
http://wiseheartdesign.com
http://radiantcms.org
John-
It looks to me like backreferences are not available inside character
classes, because there the backslash sequence is interpreted as a
character escape. So in a character class, \1 is the same as \001
(octal), commonly known as control-A.
irb(main):018:0> re = /(a)([\1]) test/
=> /(a)([\1]) test/
irb(main):019:0> md = re.match "aa test"
=> nil
irb(main):020:0> md = re.match "a test"
=> nil
irb(main):021:0> md = re.match "a1 test"
=> nil
irb(main):022:0> md = re.match "a\1 test"
=> #<MatchData:0x28460e8>
irb(main):023:0> md[0]
=> "a\001 test"
irb(main):024:0> md[1]
=> "a"
irb(main):025:0> md[2]
=> "\001"
-A
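For anyone who wants to re-check this on a current Ruby, here is a minimal sketch. It assumes the regexp engine still reads \1 inside brackets as the octal escape for control-A, which is what the irb session above shows. The constant names are made up for illustration.

```ruby
# Inside a character class, \1 is read as the octal escape \001 (control-A),
# not as a backreference to group 1.
CLASS_WITH_ESCAPE = /(a)([\1]) test/

# The original attempt: [^\1] therefore means "anything except control-A",
# so quote characters are allowed inside the lazy run as well.
ORIGINAL_ATTEMPT = /(['"])([^\1]*?)\1!/

# Does the class behave like "another a"?
puts CLASS_WITH_ESCAPE.match?("aa test")

# Does it match a literal control-A character?
puts CLASS_WITH_ESCAPE.match?("a\001 test")

# What the original attempt actually captures.
puts ORIGINAL_ATTEMPT.match(%{ "this looks like "fun"!" })[0]
```

If the assumption holds, the first check fails to match, the second matches, and the last line prints the long match starting at the first quote rather than just the quoted word.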
That makes perfect sense. So how do I get it to do what I want?
Umm… backwards?
irb(main):001:0> re = /!(['"]).*?\1/
=> /!(['"]).*?\1/
irb(main):002:0> fun = %{ "this looks like "fun"!" }
=> " \"this looks like \"fun\"!\" "
irb(main):003:0> md = re.match fun.reverse
=> #<MatchData:0x2820660>
irb(main):004:0> md[0].reverse
=> "\"fun\"!"
This way, instead of trying to negate a character class, you’re just
doing a non-greedy match from the anchored exclamation point - quote
mark combo to the first matching quote mark.
-A
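The reverse trick can be wrapped up as a small helper. This is only a sketch: the method and constant names are hypothetical, and it assumes the pattern is the non-greedy one described above, with `.*?` between the quotes.

```ruby
# Reverse the text, match from "!" plus a quote forward to the nearest
# identical quote, then reverse the match to restore the original order.
REVERSED_PATTERN = /!(['"]).*?\1/

def quoted_exclamation_via_reverse(text)
  md = REVERSED_PATTERN.match(text.reverse)
  md && md[0].reverse
end

p quoted_exclamation_via_reverse(%{ "this looks like "fun"!" })
p quoted_exclamation_via_reverse(%{ 'this looks like 'fun'!' })
p quoted_exclamation_via_reverse("no exclamation here")
```

One thing to keep in mind: because the search runs over the reversed string, the first hit is the last candidate in the original text. Also, without the m flag the dot does not cross newlines.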
On May 24, 2006, at 11:06 AM, John W. Long wrote:
That makes perfect sense. So how do I get it to do what I want?
%{ "this looks like "fun"!" }[/(?:'[^']*?'|"[^"]*?")!/]
=> "\"fun\"!"
%{ 'this looks like 'fun'!' }[/(?:'[^']*?'|"[^"]*?")!/]
=> "'fun'!"
Hope that helps.
James Edward G. II
John W. Long wrote:
That makes perfect sense. So how do I get it to do what I want?
I have something that works now:
re = /(?:"[^"]*?"|'[^']*?')!/
md = re.match %{ "this looks like "fun"!" }
md[0] #=> "\"fun\"!"
md = re.match %{ 'this looks like 'fun'!' }
md[0] #=> "'fun'!"
Still, it makes me wonder if it’s possible to do it with back
references.
Still, it makes me wonder if it’s possible to do it with back references.
I haven’t played with Oniguruma yet, but it has named groups - maybe a
named backreference can be used in an Oniguruma character class, as
\k<name> is unambiguous…
Anyone able to test?
-A
2006/5/24, John W. Long [email protected]:
Still, it makes me wonder if it’s possible to do it with back references.
I don’t think you can have backreference in character class. In this
case it’s fairly easy. This is what I’d do
re = %r{
  (?:
    '[^']+' |
    "[^"]+"
  )!
}xi
Basically the same as what you did. But you do not need the
reluctant quantifiers because the negated character class prevents
longer matches anyway. I’m not sure whether there is a performance
difference.
Kind regards
robert
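To illustrate the point about the reluctant quantifiers, here is a rough sketch that runs a lazy and a greedy version of the alternation over a few sample strings and compares the results. The constant names and the sample strings are made up for illustration, and both patterns use * rather than the + above so that the comparison is like for like.

```ruby
# Same alternation twice: once with reluctant (lazy) quantifiers, once greedy.
# The negated classes cannot run past a closing quote either way.
LAZY_PATTERN   = /(?:'[^']*?'|"[^"]*?")!/
GREEDY_PATTERN = /(?:'[^']*'|"[^"]*")!/

SAMPLES = [
  %{ "this looks like "fun"!" },
  %{ 'this looks like 'fun'!' },
  %{say 'hi'! and then "bye"!},
  %{nothing quoted here!}
]

SAMPLES.each do |sample|
  lazy   = sample.scan(LAZY_PATTERN)
  greedy = sample.scan(GREEDY_PATTERN)
  puts "#{sample.inspect}: #{lazy.inspect} (same as greedy: #{lazy == greedy})"
end
```

This says nothing about speed; it only checks that the two forms agree on these inputs.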
On 5/24/06, A LeDonne [email protected] wrote:
OK, one more thought. Do you necessarily need it in md[0]? If not…
re = /(['"])(?:.*\1)*(.*\1!)/
md = re.match %{ "this looks like "fun"!" }
p md[1]<<md[2] #=> "\"fun\"!"
md = re.match %{ 'this looks like 'fun'!' }
p md[1]<<md[2] #=> "'fun'!"
-A
Better: re = /(['"])(?:.*\1)(.*\1!)/
-A
A LeDonne wrote:
Better: re = /(['"])(?:.*\1)(.*\1!)/
But the point is to match any quoted expression followed by an
exclamation point. The string:
%{ "this looks like "fun"!" }
is only meant to demonstrate an expression that I was having trouble
grepping. Your expression would require a quote and then a quoted
expression followed by an exclamation point, which is not exactly what
I was looking for.
On 5/24/06, A LeDonne [email protected] wrote:
doing a non-greedy match from the anchored exclamation point - quote
mark combo to the first matching quote mark.
I do not know if this is a performance killer (probably not), but
frankly I do not care; this is one of the most original ideas I have
ever seen on this list.
Just wanted to say this!
Really nice!
Robert
P.S.
It was yours, was it not?
R
--
Two things are infinite: the universe and human stupidity; and as far as
the universe is concerned, I have not yet acquired absolute certainty.
Ignore my “better”. What I had the first time,
re = /(['"])(?:.*\1)*(.*\1!)/
was actually correct. That way, it requires zero or more intervening
matching quoty things. That’s what I get for not writing unit tests
first…
-A
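In the spirit of the unit tests mentioned above, here is a small self-checking sketch. It assumes the intended pattern is `(['"])(?:.*\1)*(.*\1!)`, with the asterisks in place, and that the result is assembled from groups 1 and 2 as in the earlier post. The method and constant names are hypothetical.

```ruby
# Group 1 is the opening quote; the optional middle part skips over any
# earlier quoted runs; group 2 is the final run ending in quote plus "!".
TWO_GROUP_PATTERN = /(['"])(?:.*\1)*(.*\1!)/

def quoted_exclamation_from_groups(text)
  md = TWO_GROUP_PATTERN.match(text)
  md && md[1] + md[2]
end

# Input string => expected result (nil means "no match").
CHECKS = {
  %{ "this looks like "fun"!" } => %{"fun"!},
  %{ 'this looks like 'fun'!' } => %{'fun'!},
  %{"fun"!}                     => %{"fun"!},
  %{no quotes at all!}          => nil
}

CHECKS.each do |input, expected|
  actual = quoted_exclamation_from_groups(input)
  raise "#{input.inspect}: got #{actual.inspect}" unless actual == expected
  puts "ok: #{input.inspect}"
end
```

The nested `.*` inside a repeated group can backtrack heavily on long inputs, so this is probably best kept to short strings.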
Thank you… I’m flattered!
Yes, it’s mine.
-A
John W. Long wrote:
Still, it makes me wonder if it’s possible to do it with back references.
John, you can use negative lookahead: /(['"])((?!\1).)*\1!/
Regards,
Pit
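A short sketch of the negative-lookahead version in use. The pattern is the one given above; the method and constant names are invented for the example.

```ruby
# (?!\1). consumes one character at a time, but only while that character
# is not the quote captured by group 1. This is the backreference-based
# stand-in for the [^\1] that a character class cannot express.
LOOKAHEAD_PATTERN = /(['"])((?!\1).)*\1!/

def quoted_exclamation(text)
  text[LOOKAHEAD_PATTERN] # String#[] with a regexp returns the whole match or nil
end

p quoted_exclamation(%{ "this looks like "fun"!" })
p quoted_exclamation(%{ 'this looks like 'fun'!' })
p quoted_exclamation(%{she said "it's fun"! today})
p quoted_exclamation(%{"fun" without the bang})
```

Because the body can never contain the opening quote character, a failed attempt at one quote simply moves on to the next quote in the string.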
John,
I wonder if this also does the job:
re = /['"][\w\s]*\W*['"]!/
md = re.match %{ "this looks like ' a fun test !? '!" }
puts md[0]
M.D.
On 5/25/06, John W. Long [email protected] wrote:
…Suppose I wanted to extract any quoted string followed by an exclamation
point…
Is this for a sarcasm detector?
;D
Daniel B. wrote:
Is this for a sarcasm detector?
LOL
Actually my real problem was much more complex (matching quotes on
HTML-like tags), but this demonstrated the same problem.