Regexp questions

mschwab · June 24, 2007, 12:23am

I’m converting a big Python program that uses lots of regexps to Ruby, and
I’m getting some odd errors. One problem seems to be that \1 in the
replacement string doesn’t always work. Are there any known “gotchas”
between Python’s regexps and Ruby’s?

And what’s the most robust way to convert “underlined” words to HTML
italics using a regexp? Something like: “This _word_ is in italics.” ->
“This <i>word</i> is in italics.”

Thanks!

Mike S.

mschwab · June 24, 2007, 10:45am

Mike S. wrote:

I’m getting some odd errors. One problem seems to be that \1 in the
replacement string doesn’t always work. Are there any known “gotchas”
between Python’s regexps and Ruby’s?

And what’s the most robust way to convert “underlined” words to HTML
italics using a regexp? Something like: “This _word_ is in italics.” ->
“This <i>word</i> is in italics.”

Could you give an example of where it isn’t working in the first case?

As for the second, I don’t know about ‘most robust’ but I might try
something like this (with the assumption that words aren’t broken over
lines):

irb> str
=> "asdf asdf asdfasd asdf ash h"
irb> str.gsub(/_([^\s]+)_/, "<i>\\1</i>")
=> "asdf asdf asdfasd asdf ash h"

Or if I wanted to be able to have italicised sentences (_word word_) I
might try this:

irb> str
=> "asdf asdf asdf asd asdf ash \nash h"
irb> str.gsub(/_(.+?)_/m, "<i>\\1</i>")
=> "asdf asdf asdf asd asdf ash \nash h"

But then I would worry about performance because of the lazy operator
and would want to test it on some real data.
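
To make the difference between the two patterns concrete, here is a minimal sketch. The sample string is made up for illustration (it is not the input from the transcripts above), and the `<i>` tag is assumed as the target markup.

```ruby
# Hypothetical input: one single-word span and one span that crosses a line break.
str = "one _two_ three _four\nfive_ six"

# Word-only pattern: the span may not contain whitespace,
# so only "_two_" is converted.
word_only = str.gsub(/_([^\s]+)_/, '<i>\1</i>')

# Lazy pattern with /m: "." also matches newlines, so the
# multi-word, multi-line span is converted as well.
multi_word = str.gsub(/_(.+?)_/m, '<i>\1</i>')

puts word_only
puts multi_word
```

Single quotes are used for the replacement so that `\1` reaches `gsub` as a backreference without extra escaping.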

best,
Dan

mschwab · June 24, 2007, 6:36pm

On Sun, Jun 24, 2007 at 05:45:30PM +0900, Daniel L. wrote:

=> "asdf asdf asdf asd asdf ash \nash h"
irb> str.gsub(/_(.+?)_/m, "<i>\\1</i>")
=> "asdf asdf asdf asd asdf ash \nash h"

I like to be very strict with things like quotes (and underscores in
this case), so I would probably use:

irb> str
=> "asdf asdf asdf asd asdf ash \nash h"
irb> str.gsub(/_([^_]+)_/, "<i>\\1</i>")
=> "asdf asdf asdf asd asdf ash \nash h"

That seems to work like I would expect it to—I’m just coming over from
Perl…

Wyatt

mschwab · June 25, 2007, 4:03am

Thanks for the ideas about the italics!

I found out what my “weird” problem was: I wasn’t double-escaping the \1 in
the replacement string (I was using "\1" instead of "\\1").
It’s funny how Python didn’t require this. Hmmm.
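
For anyone hitting the same problem, a small sketch of the escaping behaviour follows. The sample string and the `<i>` tag are assumptions for illustration. In a double-quoted Ruby string, "\1" is an octal escape for the character with code 1, so the backreference never reaches gsub. Python's ordinary string literals treat "\1" the same way, but Python regexp code is commonly written with raw strings such as r"\1", which may be why the original program needed no extra escaping (that is a guess about the original code).

```ruby
# Hypothetical sample input.
str = "This _word_ is in italics."
pattern = /_([^_]+)_/

# Double quotes, single backslash: "\1" is the character with code 1,
# so the captured text is lost.
wrong = str.gsub(pattern, "<i>\1</i>")

# Double quotes with an escaped backslash: gsub receives a literal \1.
double_escaped = str.gsub(pattern, "<i>\\1</i>")

# Single quotes: no escaping needed for the backreference.
single_quoted = str.gsub(pattern, '<i>\1</i>')

# Block form: avoids replacement-string escaping altogether.
block_form = str.gsub(pattern) { "<i>#{$1}</i>" }

p wrong
p double_escaped
```

The single-quoted and block forms are the ones least likely to be affected by this kind of escaping slip.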

Mike S.

mschwab · June 24, 2007, 8:21pm

On Jun 24, 2007, at 11:35 , Wyatt Draggoo wrote:

On Sun, Jun 24, 2007 at 05:45:30PM +0900, Daniel L. wrote:

irb> str.gsub(/_(.+?)_/m, "<i>\\1</i>")

I like to be very strict with things like quotes (and underscores
in this case), so I would probably use:

irb> str.gsub(/_([^_]+)_/, "<i>\\1</i>")

From a strictness point of view, what’s the difference between /_(.+?)_/m
and /_([^_]+)_/ in the above? AIUI, they’re equivalent. I
personally like the former because if you need to change the _ to
some other character, you only have to make a single character change.
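
One way to check the equivalence is to run both patterns over the same inputs. The sketch below uses made-up strings and assumes `<i>` as the replacement tag. On ordinary input the two forms agree, but they can differ when an underscore is doubled, because `.+?` is allowed to consume an underscore while `[^_]+` is not. The lazy form also relies on the /m flag to cross line breaks, whereas the negated class matches newlines on its own.

```ruby
lazy    = /_(.+?)_/m
negated = /_([^_]+)_/

# Ordinary input, including a span that crosses a newline: same result.
plain = "a _b c_ d _e\nf_ g"
same_on_plain = plain.gsub(lazy, '<i>\1</i>') == plain.gsub(negated, '<i>\1</i>')

# Doubled underscore: the lazy form starts at the first underscore and
# captures "_y"; the negated class starts at the second and captures "y".
doubled     = "x __y_ z"
lazy_out    = doubled.gsub(lazy, '<i>\1</i>')
negated_out = doubled.gsub(negated, '<i>\1</i>')

p same_on_plain
p lazy_out
p negated_out
```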

Michael G.
grzm seespotcode net