Forum: Ruby regexp questions

Mike Steiner (Guest)
on 2007-06-24 00:23
(Received via mailing list)
I'm converting a big Python program that uses lots of regexps to Ruby,
and I'm getting some odd errors. One problem seems to be that \1 in the
replacement string doesn't always work. Are there any known "gotchas"
between Python's regexps and Ruby's?

And what's the most robust way to convert "underlined" words to HTML
italics using a regexp? Something like:
"This _word_ is in italics." -> "This <i>word</i> is in italics."

Thanks!

Mike Steiner
Daniel Lucraft (lucraft)
on 2007-06-24 10:45
Mike Steiner wrote:
> I'm getting some odd errors. One problem seems to be that \1 in the
> replacement string doesn't always work. Are there any known "gotchas"
> between Python's regexps and Ruby's?
>
> And what's the most robust way to convert "underlined" words to HTML
> italics using a regexp? Something like:
> "This _word_ is in italics." -> "This <i>word</i> is in italics."

Could you give an example of where it isn't working in the first case?

As for the second, I don't know about 'most robust' but I might try
something like this (with the assumption that words aren't broken over
lines):

irb> str
=> "asdf asdf _asdfasd_ asdf _ash_ h"
irb> str.gsub(/_([^\s]+)_/, "<i>\\1</i>")
=> "asdf asdf <i>asdfasd</i> asdf <i>ash</i> h"

Or if I wanted to be able to have italicised sentences (_word word_) I
might try this:

irb> str
=> "asdf asdf _asdf asd_ asdf _ash \nash_ h"
irb> str.gsub(/_(.+?)_/m, "<i>\\1</i>")
=> "asdf asdf <i>asdf asd</i> asdf <i>ash \nash</i> h"

But then I would worry about performance because of the lazy operator
and would want to test it on some real data.
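
One way to check that concern is to time the lazy pattern against a
negated character class on the same input. This is only a rough sketch:
the helper names and the synthetic sample text are assumptions made for
illustration, and real data may behave quite differently.

```ruby
require "benchmark"

# Hypothetical helpers; the names are illustrative, not from the thread.
def italicize_lazy(text)
  # Lazy quantifier; /m lets "." match newlines as well.
  text.gsub(/_(.+?)_/m, "<i>\\1</i>")
end

def italicize_negated(text)
  # Negated character class: everything up to the next underscore.
  text.gsub(/_([^_]+)_/, "<i>\\1</i>")
end

# Synthetic stand-in for "real data"; swap in your own text here.
SAMPLE = "asdf asdf _asdf asd_ asdf _ash \nash_ h\n" * 10_000

lazy_time    = Benchmark.realtime { italicize_lazy(SAMPLE) }
negated_time = Benchmark.realtime { italicize_negated(SAMPLE) }

puts format("lazy:    %.4fs", lazy_time)
puts format("negated: %.4fs", negated_time)
```

The timings depend on the Ruby version and on the input, so treat the
numbers as a hint rather than a verdict.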

best,
Dan
Wyatt Draggoo (Guest)
on 2007-06-24 18:36
(Received via mailing list)
On Sun, Jun 24, 2007 at 05:45:30PM +0900, Daniel Lucraft wrote:
> => "asdf asdf _asdf asd_ asdf _ash \nash_ h"
> irb> str.gsub(/_(.+?)_/m, "<i>\\1</i>")
> => "asdf asdf <i>asdf asd</i> asdf <i>ash \nash</i> h"

I like to be very strict with things like quotes (and underscores in
this case), so I would probably use:

irb> str
=> "asdf asdf _asdf asd_ asdf _ash \nash_ h"
irb> str.gsub(/_([^_]+)_/, "<i>\\1</i>")
=> "asdf asdf <i>asdf asd</i> asdf <i>ash \nash</i> h"

That seems to work like I would expect it to---I'm just coming over from
Perl...

Wyatt
Michael Glaesemann (Guest)
on 2007-06-24 20:21
(Received via mailing list)
On Jun 24, 2007, at 11:35 , Wyatt Draggoo wrote:

> On Sun, Jun 24, 2007 at 05:45:30PM +0900, Daniel Lucraft wrote:
>
>> irb> str.gsub(/_(.+?)_/m, "<i>\\1</i>")
>
> I like to be very strict with things like quotes (and underscores
> in this case), so I would probably use:
>
> irb> str.gsub(/_([^_]+)_/, "<i>\\1</i>")

From a strictness point of view, what's the difference between
/(.+?)_/ and /([^_]+)_/ in the above? AIUI, they're equivalent. I
personally like the former because if you need to change the _ to
some other character, you only have to make a single character change.
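
For typical input the two forms do give the same result, but there seem
to be at least two edge cases where they differ: newlines (when the /m
flag is left off the lazy version) and doubled underscores. The sketch
below is one way to see this; the constant names, the helper, and the
test strings are made up for illustration.

```ruby
LAZY    = /_(.+?)_/     # lazy dot without /m: "." stops at a newline
LAZY_M  = /_(.+?)_/m    # lazy dot with /m: "." also matches newlines
NEGATED = /_([^_]+)_/   # anything except an underscore, newlines included

# Hypothetical helper, not part of any library.
def italicize(text, pattern)
  text.gsub(pattern, "<i>\\1</i>")
end

multiline = "_ash \nash_"
p italicize(multiline, LAZY)     # "_ash \nash_" (no match, unchanged)
p italicize(multiline, LAZY_M)   # "<i>ash \nash</i>"
p italicize(multiline, NEGATED)  # "<i>ash \nash</i>"

doubled = "__a_"
p italicize(doubled, LAZY_M)     # "<i>_a</i>"  (captures the second underscore)
p italicize(doubled, NEGATED)    # "_<i>a</i>"  (skips the first underscore)
```

So the negated class is the stricter of the two in the sense that the
captured text can never contain the delimiter itself.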

Michael Glaesemann
grzm seespotcode net
Mike Steiner (Guest)
on 2007-06-25 04:03
(Received via mailing list)
Thanks for the ideas about the _italics_!

I found out what my "weird" problem was - I wasn't double-escaping the
\1 in the replacement string (I was using "<i>\1</i>" instead of
"<i>\\1</i>"). It's funny how Python didn't require this. Hmmm.
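
For anyone hitting the same thing, here is a small sketch of what
happens. In a double-quoted Ruby string, "\1" is an octal escape for the
character U+0001, so gsub never sees a backreference. Single quotes or
the block form avoid the double backslash. The constant and helper names
below are invented for the example. (A plain Python string treats "\1"
the same way, so the original Python code presumably used raw strings
like r"\1"; that is a guess, since the Python source isn't shown here.)

```ruby
UNDERLINED = /_([^_]+)_/

# Hypothetical helper wrapping gsub with a replacement string.
def replace_with(text, replacement)
  text.gsub(UNDERLINED, replacement)
end

str = "This _word_ is in italics."

# Double quotes, single backslash: "\1" becomes the control character U+0001.
p replace_with(str, "<i>\1</i>")

# Double quotes, escaped backslash: gsub receives \1 as a backreference.
p replace_with(str, "<i>\\1</i>")   # "This <i>word</i> is in italics."

# Single quotes keep the backslash as written.
p replace_with(str, '<i>\1</i>')    # "This <i>word</i> is in italics."

# Block form: no escaping at all, the capture is available as $1.
p str.gsub(UNDERLINED) { "<i>#{$1}</i>" }  # "This <i>word</i> is in italics."
```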

Mike Steiner