Forum: Ruby RegExp Problem

John W. Long (Guest)
on 2006-05-24 17:17
(Received via mailing list)
Hi,

I'm having a little trouble getting Ruby to match quotes correctly.
Suppose I wanted to extract any quoted string followed by an exclamation
point. I'd like my regular expression to match either single or double
quotes. I thought the following might work:

   re = /(['"])([^\1]*?)\1!/
   md = re.match %{ "this looks like "fun"!" }
   md[0] #=> "\"this looks like \"fun\"!"

But as you can see instead of matching <"fun"!> it matched <"this looks
like "fun"!>, despite the fact that I've told it to match anything but
the character that was used to quote it:

   [^\1]*?

It works beautifully if I tell it to match either single quotes or
double quotes, but I can't write it to match either in a single regular
expression:

   re = /"([^"]*?)"!/
   md = re.match %{ "this looks like "fun"!" }
   md[0] #=> "\"fun\"!"

Or:

   re = /'([^']*?)'!/
   md = re.match %{ 'this looks like 'fun'!' }
   md[0] #=> "'fun'!"

Why does the first regex not do what I want?
A LeDonne (Guest)
on 2006-05-24 17:51
(Received via mailing list)
On 5/24/06, John W. Long <ng@johnwlong.com> wrote:
>
>    re = /"([^"]*?)"!/
>
> --
> John Long
> http://wiseheartdesign.com
> http://radiantcms.org

John-

It looks to me like backreferences are not available inside character
classes, because the backslash sequence is interpreted as a character.
So in a character class, \1 is the same as \001, commonly known as
control-A.

irb(main):018:0> re = /(a)([\1]) test/
=> /(a)([\1]) test/
irb(main):019:0> md = re.match "aa test"
=> nil
irb(main):020:0> md = re.match "a test"
=> nil
irb(main):021:0> md = re.match "a1 test"
=> nil
irb(main):022:0> md = re.match "a\1 test"
=> #<MatchData:0x28460e8>
irb(main):023:0> md[0]
=> "a\001 test"
irb(main):024:0> md[1]
=> "a"
irb(main):025:0> md[2]
=> "\001"


-A
John W. Long (Guest)
on 2006-05-24 18:08
(Received via mailing list)
A LeDonne wrote:
> => nil
> irb(main):021:0> md = re.match "a1 test"
> => nil
> irb(main):022:0> md = re.match "a\1 test"
> => #<MatchData:0x28460e8>
> irb(main):023:0> md[0]
> => "a\001 test"
> irb(main):024:0> md[1]
> => "a"
> irb(main):025:0> md[2]
> => "\001"

That makes perfect sense. So how do I get it to do what I want?
James Gray (bbazzarrakk)
on 2006-05-24 18:47
(Received via mailing list)
On May 24, 2006, at 11:06 AM, John W. Long wrote:

> That makes perfect sense. So how do I get it to do what I want?

 >> %{ "this looks like "fun"!" }[/(?:'[^']*?'|"[^"]*?")!/]
=> "\"fun\"!"
 >> %{ 'this looks like 'fun'!' }[/(?:'[^']*?'|"[^"]*?")!/]
=> "'fun'!"

Hope that helps.

James Edward Gray II
A LeDonne (Guest)
on 2006-05-24 18:47
(Received via mailing list)
On 5/24/06, John W. Long <ng@johnwlong.com> wrote:
> That makes perfect sense. So how do I get it to do what I want?
>
> --
> John Long
> http://wiseheartdesign.com
> http://radiantcms.org

Umm.... backwards?

irb(main):001:0> re = /!(['"]).*?\1/
=> /!(['"]).*?\1/
irb(main):002:0> fun = %{ "this looks like "fun"!" }
=> " \"this looks like \"fun\"!\" "
irb(main):003:0> md = re.match fun.reverse
=> #<MatchData:0x2820660>
irb(main):004:0> md[0].reverse
=> "\"fun\"!"

This way, instead of trying to negate a character class, you're just
doing a non-greedy match from the anchored exclamation point - quote
mark combo to the first matching quote mark.
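
For anyone who wants to reuse the reversal trick, here is a minimal sketch that wraps it in a method. The name quoted_exclamations is made up for illustration and is not from the thread; it assumes you want every hit, in the original left-to-right order.

```ruby
# Hypothetical helper (not from the thread): collect every quoted string
# followed by "!" by scanning the reversed text.
def quoted_exclamations(text)
  reversed_pattern = /!(['"]).*?\1/ # "!" comes first because the text is reversed
  found = []
  text.reverse.scan(reversed_pattern) do
    # Regexp.last_match holds the MatchData for the current hit
    found << Regexp.last_match[0].reverse
  end
  found.reverse # hits were collected right-to-left, so flip them back
end

p quoted_exclamations(%{ "this looks like "fun"!" })             #=> ["\"fun\"!"]
p quoted_exclamations(%{ "this looks like "fun"!" and 'more'! }) #=> ["\"fun\"!", "'more'!"]
```

Note that the block form of scan is used because the pattern has a capture group, so a plain scan would return the captured quote characters rather than the full matches.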

-A
John W. Long (Guest)
on 2006-05-24 18:53
(Received via mailing list)
John W. Long wrote:
> That makes perfect sense. So how do I get it to do what I want?

I have something that works now:

   re = /(?:"[^"]*?"|'[^']*?')!/
   md = re.match %{ "this looks like "fun"!" }
   md[0] #=> "\"fun\"!"
   md = re.match %{ 'this looks like 'fun'!' }
   md[0] #=> "'fun'!"

Still, it makes me wonder if it's possible to do it with back
references.
Robert Klemme (Guest)
on 2006-05-24 19:28
(Received via mailing list)
2006/5/24, John W. Long <ng@johnwlong.com>:
>
> Still, it makes me wonder if it's possible to do it with back references.

I don't think you can have a backreference in a character class.  In this
case it's fairly easy.  This is what I'd do:

re = %r{
  (?:
    '[^']+' |
    "[^"]+"
  )!
}xi

Basically the same as what you did.  But you do not need the
reluctant quantifiers because the negated char class prevents longer
matches anyway.  I'm not sure whether there is a performance
difference.
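
A quick sketch to check that claim, for the curious. The constant names are only for illustration. It also shows one behavioural difference that is easy to miss: the version above uses + where the earlier one used *, so an empty pair of quotes no longer matches.

```ruby
# Illustrative names, not from the thread.
LAZY   = /(?:'[^']*?'|"[^"]*?")!/ # reluctant quantifiers, as in the earlier post
GREEDY = /(?:'[^']*'|"[^"]*")!/   # same pattern with plain greedy quantifiers
PLUS   = /(?:'[^']+'|"[^"]+")!/   # + instead of *, as in the %r{} version above

sample = %{ "this looks like "fun"!" }
p sample[LAZY]   #=> "\"fun\"!"
p sample[GREEDY] #=> "\"fun\"!"
p sample[PLUS]   #=> "\"fun\"!"

# The negated class can never cross a quote, so greedy and lazy agree.
# They differ from the + version only when the quotes are empty:
empty = %{ ""! }
p empty[GREEDY]  #=> "\"\"!"
p empty[PLUS]    #=> nil
```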

Kind regards

robert
A LeDonne (Guest)
on 2006-05-24 20:22
(Received via mailing list)
> Still, it makes me wonder if it's possible to do it with back references.
>

I haven't played with Oniguruma yet, but it has named groups - maybe a
named backreference can be used in an Oniguruma character class, as \k
is unambiguous...

Anyone able to test?

-A
A LeDonne (Guest)
on 2006-05-24 20:55
(Received via mailing list)
On 5/24/06, John W. Long <ng@johnwlong.com> wrote:
>
> Still, it makes me wonder if it's possible to do it with back references.

OK, one more thought. Do you necessarily need it in md[0]? If not...

re = /(['"])(?:.*\1)*(.*\1!)/
md = re.match %{ "this looks like "fun"!" }
p md[1]<<md[2] #=> "\"fun\"!"
md = re.match %{ 'this looks like 'fun'!' }
p md[1]<<md[2] #=> "'fun'!"

-A
A LeDonne (Guest)
on 2006-05-24 21:04
(Received via mailing list)
On 5/24/06, A LeDonne <aledonne.listmail@gmail.com> wrote:
>
> OK, one more thought. Do you necessarily need it in md[0]? If not...
>
> re = /(['"])(?:.*\1)*(.*\1!)/
> md = re.match %{ "this looks like "fun"!" }
> p md[1]<<md[2] #=> "\"fun\"!"
> md = re.match %{ 'this looks like 'fun'!' }
> p md[1]<<md[2] #=> "'fun'!"
>
> -A

Better:  re = /(['"])(?:.*\1)(.*\1!)/

-A
John W. Long (Guest)
on 2006-05-24 21:20
(Received via mailing list)
A LeDonne wrote:
>> -A
>
> Better:  re = /(['"])(?:.*\1)(.*\1!)/

But the point is to match any quoted expression followed by an
exclamation point. The string:

   %{ "this looks like "fun"!" }

is only meant to demonstrate an expression that I was having trouble grepping.
Your expression would require a quote and then a quoted expression
followed by an exclamation point--not exactly what I was looking for.
Pit Capitain (Guest)
on 2006-05-24 21:42
(Received via mailing list)
John W. Long wrote:
> Still, it makes me wonder if it's possible to do it with back references.

John, you can use negative lookahead: /(['"])((?!\1).)*\1!/
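
A minimal sketch of how this behaves on the strings from this thread. The constant names and the second, capturing variant are assumptions added for illustration. With the pattern exactly as written, group 2 repeats one character at a time, so md[2] ends up holding only the last character ("n"); wrapping the repetition in its own group captures the whole quoted body.

```ruby
# The pattern from this post, unchanged: (?!\1) refuses to step over
# whichever quote character opened the string.
LOOKAHEAD = /(['"])((?!\1).)*\1!/

# Assumed variant, not from the thread: capture the whole body in group 2.
LOOKAHEAD_CAPTURE = /(['"])((?:(?!\1).)*)\1!/

p LOOKAHEAD.match(%{ "this looks like "fun"!" })[0]         #=> "\"fun\"!"
p LOOKAHEAD.match(%{ 'this looks like 'fun'!' })[0]         #=> "'fun'!"
p LOOKAHEAD.match(%{ 'say "hi"!' })[0]                      #=> "\"hi\"!"
p LOOKAHEAD_CAPTURE.match(%{ "this looks like "fun"!" })[2] #=> "fun"
```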

Regards,
Pit
Robert Dober (Guest)
on 2006-05-24 21:45
(Received via mailing list)
On 5/24/06, A LeDonne <aledonne.listmail@gmail.com> wrote:
>
> doing a non-greedy match from the anchored exclamation point - quote
> mark combo to the first matching quote mark.


I do not know if this is a performance killer (probably not), but frankly I
do not care; this is one of the most original ideas I have ever seen on this
list.
Just wanted to say this!
Really nice!

Robert

P.S.
It was yours was it not? ;)
R



--
Two things are infinite: the universe and human stupidity; as for the
universe, I have not yet acquired absolute certainty.

- Albert Einstein
A LeDonne (Guest)
on 2006-05-24 23:16
John W. Long wrote:
> A LeDonne wrote:
>>> -A
>>
>> Better:  re = /(['"])(?:.*\1)(.*\1!)/
>
> But the point is to match any quoted expression followed by an
> exclamation point. The string:
>
>    %{ "this looks like "fun"!" }
>
> is only meant to demonstrate an expression that I was having trouble grepping.
> Your expression would require a quote and then a quoted expression
> followed by an exclamation point--not exactly what I was looking for.

Ignore my "better". What I had the first time,

re = /(['"])(?:.*\1)*(.*\1!)/

was actually correct. That way, it requires zero or more intervening
matching quoty things. That's what I get for not writing unit tests
first...
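
Since unit tests came up, here is a small sketch comparing the two versions. The constant names and the " "fun"! " sample are made up for illustration. It shows why the star matters (without it, a plain quoted string with nothing nested is not found at all) and that md[0] still spans from the first quote, which is why the pieces have to be glued together from md[1] and md[2].

```ruby
# Illustrative names, not from the thread.
WITH_STAR    = /(['"])(?:.*\1)*(.*\1!)/ # the original version
WITHOUT_STAR = /(['"])(?:.*\1)(.*\1!)/  # the retracted "better" version

nested = %{ "this looks like "fun"!" }
md = WITH_STAR.match(nested)
p md[0]         #=> "\"this looks like \"fun\"!"
p md[1] + md[2] #=> "\"fun\"!"

simple = %{ "fun"! }
p WITH_STAR.match(simple).values_at(1, 2).join #=> "\"fun\"!"
p WITHOUT_STAR.match(simple)                   #=> nil
```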

-A
A LeDonne (Guest)
on 2006-05-24 23:19
Robert Dober wrote:
> On 5/24/06, A LeDonne <aledonne.listmail@gmail.com> wrote:
>>
>> doing a non-greedy match from the anchored exclamation point - quote
>> mark combo to the first matching quote mark.
>
>
> I do not know if this is a performance killer (probably not), but frankly I
> do not care; this is one of the most original ideas I have ever seen on this
> list.
> Just wanted to say this!
> Really nice!
>
> Robert
>
> P.S.
> It was yours was it not? ;)
> R

Thank you... I'm flattered!

Yes, it's mine. ;)

-A
M.D. (Guest)
on 2006-05-25 00:26
John W. Long wrote:
> A LeDonne wrote:
>>> -A
>>
>> Better:  re = /(['"])(?:.*\1)(.*\1!)/
>
> But the point is to match any quoted expression followed by an
> exclamation point. The string:
>
>    %{ "this looks like "fun"!" }
>
> is only meant to demonstrate an expression that I was having trouble grepping.
> Your expression would require a quote and then a quoted expression
> followed by an exclamation point--not exactly what I was looking for.



John,

I wonder if this also does the job:

   re = /['"][\w\s]*\W*['"]!/
   md = re.match %{ "this looks like ' a fun test !? '!" }
   puts md[0]

M.D.
Daniel Baird (Guest)
on 2006-05-25 02:17
(Received via mailing list)
On 5/25/06, John W. Long <ng@johnwlong.com> wrote:
>
>
> ..Suppose I wanted to extract any quoted string followed by an exclamation
> point..


Is this for a sarcasm detector?

;D
John W. Long (Guest)
on 2006-05-25 04:01
(Received via mailing list)
Daniel Baird wrote:
> On 5/25/06, John W. Long <ng@johnwlong.com> wrote:
>>
>>
>> ..Suppose I wanted to extract any quoted string followed by an
>> exclamation
>> point..
>
>
> Is this for a sarcasm detector?

LOL :-)

Actually my real problem was much more complex (matching quotes on
HTML-like tags), but this demonstrated the same problem.