Forum: Ruby regexp questions

Mike S. (Guest)
on 2007-06-24 02:23
(Received via mailing list)
I'm converting a big Python program that uses lots of regexps to Ruby, and
I'm getting some odd errors. One problem seems to be that \1 in the
replacement string doesn't always work. Are there any known "gotchas"
between Python's regexps and Ruby's?

And what's the most robust way to convert "underlined" words to HTML
italics using a regexp? Something like:
"This _word_ is in italics." -> "This <i>word</i> is in italics."

Thanks!

Mike S.
Daniel L. (Guest)
on 2007-06-24 12:45
Mike S. wrote:
> I'm getting some odd errors. One problem seems to be that \1 in the
> replacement string doesn't always work. Are there any known "gotchas"
> between Python's regexps and Ruby's?
>
> And what's the most robust way to convert "underlined" words to HTML
> italics using a regexp? Something like:
> "This _word_ is in italics." -> "This <i>word</i> is in italics."

Could you give an example of where it isn't working in the first case?

As for the second, I don't know about 'most robust' but I might try
something like this (with the assumption that words aren't broken over
lines):

irb> str
=> "asdf asdf _asdfasd_ asdf _ash_ h"
irb> str.gsub(/_([^\s]+)_/, "<i>\\1</i>")
=> "asdf asdf <i>asdfasd</i> asdf <i>ash</i> h"

Or if I wanted to be able to have italicised sentences (_word word_) I
might try this:

irb> str
=> "asdf asdf _asdf asd_ asdf _ash \nash_ h"
irb> str.gsub(/_(.+?)_/m, "<i>\\1</i>")
=> "asdf asdf <i>asdf asd</i> asdf <i>ash \nash</i> h"

But then I would worry about performance because of the lazy quantifier,
and I would want to test it on some real data.
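
A minimal sketch of how such a test might look, using Ruby's standard Benchmark module. The sample text is just the string above repeated (not real data), and the negated character class /_([^_]+)_/ is an assumed alternative to compare against; timings will vary by machine and Ruby version, so no numbers are claimed here.

```ruby
require 'benchmark'

# Synthetic input: the example string repeated many times (an assumption,
# real data may behave differently).
sample = "asdf asdf _asdf asd_ asdf _ash \nash_ h " * 10_000

lazy    = /_(.+?)_/m     # lazy quantifier, . also matches newlines
negated = /_([^_]+)_/    # assumed alternative: anything except an underscore

lazy_result = nil
negated_result = nil

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
lazy_time    = Benchmark.realtime { lazy_result = sample.gsub(lazy, '<i>\1</i>') }
negated_time = Benchmark.realtime { negated_result = sample.gsub(negated, '<i>\1</i>') }

puts format("lazy:    %.4fs", lazy_time)
puts format("negated: %.4fs", negated_time)
puts "same output: #{lazy_result == negated_result}"
```

On this particular input both patterns should produce identical output, so any difference is purely in running time.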

best,
Dan
Wyatt Draggoo (Guest)
on 2007-06-24 20:36
(Received via mailing list)
On Sun, Jun 24, 2007 at 05:45:30PM +0900, Daniel L. wrote:
> => "asdf asdf _asdf asd_ asdf _ash \nash_ h"
> irb> str.gsub(/_(.+?)_/m, "<i>\\1</i>")
> => "asdf asdf <i>asdf asd</i> asdf <i>ash \nash</i> h"

I like to be very strict with things like quotes (and underscores in
this case), so I would probably use:

irb> str
=> "asdf asdf _asdf asd_ asdf _ash \nash_ h"
irb> str.gsub(/_([^_]+)_/, "<i>\\1</i>")
=> "asdf asdf <i>asdf asd</i> asdf <i>ash \nash</i> h"

That seems to work like I would expect it to---I'm just coming over from
Perl...

Wyatt
Michael G. (Guest)
on 2007-06-24 22:21
(Received via mailing list)
On Jun 24, 2007, at 11:35 , Wyatt Draggoo wrote:

> On Sun, Jun 24, 2007 at 05:45:30PM +0900, Daniel L. wrote:
>
>> irb> str.gsub(/_(.+?)_/m, "<i>\\1</i>")
>
> I like to be very strict with things like quotes (and underscores
> in this case), so I would probably use:
>
> irb> str.gsub(/_([^_]+)_/, "<i>\\1</i>")

From a strictness point of view, what's the difference between /(.+?)_/
and /([^_]+)_/ in the above? AIUI, they're equivalent. I
personally like the former because if you need to change the _ to
some other character, you only have to make a single character change.
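
For anyone wanting to check the equivalence, here is a small sketch comparing the two full patterns from the thread. The input with a doubled underscore is a made-up edge case, not something from the original question; on the thread's own example the two patterns agree, but they seem to diverge once underscores sit next to each other.

```ruby
lazy   = /_(.+?)_/m    # lazy: shortest run of any characters
strict = /_([^_]+)_/   # strict: run of non-underscore characters only

# The example string from the thread: both patterns give the same result.
plain = "asdf asdf _asdf asd_ asdf _ash \nash_ h"
same_on_plain = plain.gsub(lazy, '<i>\1</i>') == plain.gsub(strict, '<i>\1</i>')

# Hypothetical edge case with a doubled underscore.
doubled = "a __b_ c"
lazy_doubled   = doubled.gsub(lazy, '<i>\1</i>')    # captures "_b"
strict_doubled = doubled.gsub(strict, '<i>\1</i>')  # captures "b"

puts same_on_plain
puts lazy_doubled
puts strict_doubled
```

The lazy pattern must consume at least one character, and that character may itself be an underscore, whereas the negated class can never capture one.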

Michael G.
grzm seespotcode net
Mike S. (Guest)
on 2007-06-25 06:03
(Received via mailing list)
Thanks for the ideas about the _italics_!

I found out what my "weird" problem was - I wasn't double-escaping the \1
in the replacement string (I was using "<i>\1</i>" instead of
"<i>\\1</i>"). It's funny how Python didn't require this. Hmmm.
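
A short sketch of what is going on, for anyone hitting the same thing. In a double-quoted Ruby string, "\1" is an octal escape (the single character 0x01), so gsub never sees a backreference. Escaping the backslash, switching to single quotes, or using the block form all avoid this. The variable names are only for illustration. (Python code often uses raw strings such as r"\1", which is possibly why the issue did not show up there.)

```ruby
str = "This _word_ is in italics."
pattern = /_([^_]+)_/

# "\1" in double quotes is the character 0x01, not a backreference.
broken = str.gsub(pattern, "<i>\1</i>")

# Three ways to get a real backreference to the first group:
fixed_double = str.gsub(pattern, "<i>\\1</i>")      # escaped backslash
fixed_single = str.gsub(pattern, '<i>\1</i>')       # single quotes keep the backslash
fixed_block  = str.gsub(pattern) { "<i>#{$1}</i>" } # block form, no escaping at all

p broken
p fixed_double
p fixed_single
p fixed_block
```

The block form is also handy when the replacement needs more than simple substitution, since $1 is an ordinary string inside the block.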

Mike S.