Help with regular expressions and gsub


#1

I need to use gsub on a large text. For example, I would like to search
for ‘car’ in this text.

“Lorem ipum car. Locar ipsum,

car lorem ipusum

.Lorem car ipsum”

I would like to put ‘bike’ in place of ‘car’. I need exactly this word,
and not ‘ bike ’ (with surrounding spaces), for example. Also, I do not
want ‘Locar’ to become ‘Lobike’ (it should remain ‘Locar’).

I’ll appreciate any help.


#2

On Sat, Apr 18, 2009 at 9:35 AM, John S.
removed_email_address@domain.invalid wrote:

I’ll appreciate any help.

Use word-boundary matches (\b) in your regular expression as follows:

.gsub(/\bcar\b/, 'bike')
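
As a quick check of the pattern above, here is a minimal sketch run against the sample text from the first post. The `replace_word` helper and the `sample` variable are illustrative names, not something from the thread. `\b` matches between a word character and a non-word character, so the `car` inside `Locar` should be left alone.

```ruby
# Hypothetical helper: replace `from` with `to` only where `from` is a whole word.
def replace_word(text, from, to)
  # Regexp.escape guards against regex metacharacters in `from`.
  text.gsub(/\b#{Regexp.escape(from)}\b/, to)
end

# Sample text from the original question.
sample = "Lorem ipum car. Locar ipsum,\n\ncar lorem ipusum\n\n.Lorem car ipsum"

puts replace_word(sample, 'car', 'bike')
```

No spaces are added around the replacement, because `\b` is a zero-width match and only the three letters of `car` are substituted.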

Andrew T.
http://ramblingsonrails.com
http://www.linkedin.com/in/andrewtimberlake

“I have never let my schooling interfere with my education” - Mark Twain


#3

Andrew T. wrote:

On Sat, Apr 18, 2009 at 9:35 AM, John S.
removed_email_address@domain.invalid wrote:

I’ll appreciate any help.

Use word-boundary matches (\b) in your regular expression as follows:

.gsub(/\bcar\b/, 'bike')


Thanks. What if I have…?

“Lorem ipum car. Locar ipsum,

car lorem ipusum

.Lorem car ipsum.
Lorem car ipsum.”

And now I want to obtain the same but I do not want to obtain bike,


#4

Thanks. When I use <>. what I need is to obtain

bike

and car.

#5

On Mon, Apr 20, 2009 at 9:15 AM, John S.
removed_email_address@domain.invalid wrote:

Thanks. When I use <>. what I need is to obtain

bike

and car.

I’m starting to think that regular expressions are not the best way to
solve this.
You should probably use an HTML parser and then do regular expression
substitutions based on where you are in the DOM.
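
A real HTML parser is the robust way to do this. As a dependency-free sketch of the same idea only, the code below splits the input into tags and text runs and substitutes only in text that is not inside an `<a>` element. The rule "skip anchors", the helper name `replace_outside_anchors`, and the sample markup are all assumptions made for illustration, and the tag splitting is far cruder than a proper parser (it ignores comments, scripts, and `>` inside attribute values).

```ruby
# Hypothetical helper: replace whole-word `from` with `to`, but only in text
# that is outside <a>...</a> elements (an assumed rule, not from the thread).
def replace_outside_anchors(html, from, to)
  depth = 0 # number of <a> elements we are currently inside
  word = /\b#{Regexp.escape(from)}\b/

  # The capture group makes split keep the tags as separate tokens.
  html.split(/(<[^>]*>)/).map do |token|
    if token.start_with?("<")
      depth += 1 if token =~ /\A<a[\s>]/i                 # opening anchor tag
      depth -= 1 if token =~ /\A<\/a\s*>/i && depth > 0   # closing anchor tag
      token
    elsif depth.zero?
      token.gsub(word, to) # plain text outside any anchor
    else
      token                # text inside an anchor is left alone
    end
  end.join
end

# Assumed sample markup.
puts replace_outside_anchors('<p>car lorem <a href="#">car</a> ipsum car.</p>', 'car', 'bike')
```

The point of the design is that the decision to substitute depends on where the text sits in the document, not on which characters happen to be next to the word.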

Andrew T.


#6

On Sun, Apr 19, 2009 at 10:56 PM, John S.
removed_email_address@domain.invalid wrote:

And now I want to obtain the same but I do not want to obtain bike,

This will start to get tricky, as it becomes important to know what the
rules really are.
Do you want to avoid car surrounded by an anchor tag, or car
surrounded by any tag?

car lorem ipusum

will pose a problem as it's half surrounded by a tag.

So, to satisfy your test case, the following works by making sure that
car is not followed by a ‘<’, which works for both cases mentioned
above.

.gsub(/\bcar\b(?=[^<])/, 'bike')
This checks that car sits between word boundaries and is followed by a
character other than ‘<’.
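
To see the look-ahead at work, here is a small sketch. The markup in `html` is invented for illustration (the sample in the thread did not survive with its tags intact), and the constant name is arbitrary.

```ruby
# Pattern from the post: whole-word "car" followed by a character other than "<".
NOT_BEFORE_TAG = /\bcar\b(?=[^<])/

# Assumed sample markup, not taken from the thread.
html = "Lorem car. <a href=\"#\">car</a> lorem car ipsum"

# The "car" directly before "</a>" is left untouched; the other two are replaced.
puts html.gsub(NOT_BEFORE_TAG, 'bike')

# Because [^<] needs a character to match, "car" at the very end of a
# string is skipped as well.
puts "my car".gsub(NOT_BEFORE_TAG, 'bike')
```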

If, in Ruby 1.9, I do the following:
.gsub(/(?<!>)\bcar\b(?=[^<])/, 'bike')
I am now checking that car sits between word boundaries, is not
preceded by a ‘>’, and is not followed by a ‘<’; however, this pattern will
not replace “

car lorem” with “

bike lorem”
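
The trade-off can be sketched as follows. The markup is assumed for illustration (a paragraph tag directly before `car`, plus an anchor), and the constant names are arbitrary; look-behind needs Ruby 1.9 or later.

```ruby
# Look-ahead only: "car" followed by a character other than "<".
LOOKAHEAD_ONLY = /\bcar\b(?=[^<])/
# Look-behind added: "car" must also not be directly preceded by ">".
BOTH_SIDES = /(?<!>)\bcar\b(?=[^<])/

# Assumed sample markup, not taken from the thread.
html = "<p>car lorem</p> <a>car</a> and car again"

# Replaces the "car" after <p> and the bare "car", but not the one in <a>.
puts html.gsub(LOOKAHEAD_ONLY, 'bike')

# Also skips the "car" after <p>, since it directly follows ">".
puts html.gsub(BOTH_SIDES, 'bike')
```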

Andrew T.


#7

On Mon, Apr 20, 2009 at 10:38 AM, John S.
removed_email_address@domain.invalid wrote:

Thanks again for your help.
Anyway, can you explain the meaning of (?=[^<])? I know that [^<] means
do not match ‘<’, but why should I use the ‘()’ and ‘?=’ ?

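(?=) is a look-ahead match that doesn’t consume the characters
matched.
In the string “cartoon”,
/car(?=[^<])/ will match only “car”, whereas /car[^<]/ will match “cart”.
/car(?=[^<])/ can nearly be rewritten as /car(?!<)/, where ?! is a
negative look-ahead. The one difference is that (?!<) also matches when
“car” is the last thing in the string, while (?=[^<]) needs a following
character.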
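
The difference between consuming and non-consuming matches can be checked directly. This is a small sketch; the constant names are only for illustration.

```ruby
LOOKAHEAD          = /car(?=[^<])/ # next character must exist and not be "<"
CONSUMING          = /car[^<]/     # same test, but the character joins the match
NEGATIVE_LOOKAHEAD = /car(?!<)/    # next character, if any, must not be "<"

p "cartoon"[LOOKAHEAD]          # "car"
p "cartoon"[CONSUMING]          # "cart"

# The two look-ahead forms differ when nothing follows "car":
p "car"[LOOKAHEAD]              # nil, since (?=[^<]) needs a character to be present
p "car"[NEGATIVE_LOOKAHEAD]     # "car"
```

With gsub, this matters because a consumed character is replaced along with `car`, whereas a look-ahead leaves it in place.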

(?<=) and (?<!) are positive and negative look-behind matches that don’t
consume the characters matched (Ruby 1.9 and later only)
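
A brief sketch of both look-behind forms, using an assumed sample string and arbitrary constant names:

```ruby
AFTER_TAG     = /(?<=>)car/ # positive look-behind: "car" directly after ">"
NOT_AFTER_TAG = /(?<!>)car/ # negative look-behind: "car" not directly after ">"

# Assumed sample, not taken from the thread.
sample = "<b>car</b> car"

p sample.gsub(AFTER_TAG, 'bike')     # only the first "car" changes
p sample.gsub(NOT_AFTER_TAG, 'bike') # only the second "car" changes
```

In both cases the ‘>’ itself stays in the output, because the look-behind only inspects it and does not make it part of the match.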

Andrew T.


#8

Thanks again for your help.
Anyway, can you explain the meaning of (?=[^<])? I know that [^<] means
do not match ‘<’, but why should I use the ‘()’ and ‘?=’ ?