Help with regular expressions and gsub

captaffy · April 18, 2009, 9:35am

I need to use gsub on a large text. For example, I would like to search
for ‘car’ in this text.

“Lorem ipum car. Locar ipsum,

car lorem ipusum

.Lorem car ipsum”

I would like to put ‘bike’ in place of ‘car’. I need exactly this word,
and not ‘ bike ’, for example. Also, I do not want ‘Locar’ to become
‘Lobike’ (it should remain ‘Locar’).

I’ll appreciate any help.

captaffy · April 18, 2009, 9:39am

On Sat, Apr 18, 2009 at 9:35 AM, John S. wrote:

I’ll appreciate any help.

Use word break matches in your regular expression as follows:

.gsub(/\bcar\b/, 'bike')
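
As a quick check, here is a minimal sketch that applies this pattern to the sample text from the question. The method name `swap_car_for_bike` is made up for illustration. `\b` matches at the boundary between a word character and a non-word character, so the ‘car’ inside ‘Locar’ is left alone.

```ruby
# Hypothetical helper: replace the whole word "car" with "bike".
def swap_car_for_bike(text)
  # \b on both sides means "car" must stand alone as a word
  text.gsub(/\bcar\b/, 'bike')
end

text = "Lorem ipum car. Locar ipsum,\ncar lorem ipusum\n.Lorem car ipsum"
puts swap_car_for_bike(text)
```

Note that gsub returns a new string; gsub! would modify the receiver in place.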

Andrew T.
http://ramblingsonrails.com
http://www.linkedin.com/in/andrewtimberlake

“I have never let my schooling interfere with my education” - Mark Twain

captaffy · April 19, 2009, 10:56pm

Andrew T. wrote:

On Sat, Apr 18, 2009 at 9:35 AM, John S. wrote:

I’ll appreciate any help.

Use word break matches in your regular expression as follows:

.gsub(/\bcar\b/, 'bike')


Thanks. What if I have…?

“Lorem ipum car. Locar ipsum,

car lorem ipusum

.Lorem car ipsum.
Lorem car ipsum.”

And now I want to obtain the same but I do not want to obtain bike,

captaffy · April 20, 2009, 9:15am

Thanks. When I use <>. what I need is to obtain

bike

and car.

captaffy · April 20, 2009, 9:36am

On Mon, Apr 20, 2009 at 9:15 AM, John S. wrote:

Thanks. When I use <>. what I need is to obtain

bike
and car.

I’m starting to think that regular expressions are not the best way to
solve this.
You should probably use an HTML parser and then do regular expression
substitutions based on where in the DOM you are.
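
To give a feel for the idea of substituting based on position in the markup, here is a rough sketch using only core Ruby. It is not a real HTML parser: it just splits the input into tags and text with a regex and tracks whether it is inside an `<a>` element, so it ignores comments, scripts, and malformed markup. The method name `replace_outside_anchors` and the sample HTML are assumptions for illustration; a proper parser library would be the more robust choice.

```ruby
# Hypothetical helper: replace a whole word only in text that is not
# inside an <a>...</a> element. A crude tokenizer, not an HTML parser.
def replace_outside_anchors(html, word, replacement)
  depth = 0 # number of <a> elements currently open
  pattern = /\b#{Regexp.escape(word)}\b/

  # The capture group makes split keep the tags in the result array.
  html.split(/(<[^>]*>)/).map do |token|
    if token.start_with?('<')
      if token =~ /\A<a\b/i
        depth += 1
      elsif token =~ /\A<\/a\s*>/i
        depth -= 1 if depth > 0
      end
      token # tags pass through unchanged
    elsif depth.zero?
      token.gsub(pattern, replacement) # plain text outside any link
    else
      token # text inside a link is left alone
    end
  end.join
end

html = '<p>car lorem <a href="#">car</a> Locar car.</p>'
puts replace_outside_anchors(html, 'car', 'bike')
```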

Andrew T.

captaffy · April 20, 2009, 8:09am

On Sun, Apr 19, 2009 at 10:56 PM, John S. wrote:

And now I want to obtain the same but I do not want to obtain bike,

This will start to get tricky, as it becomes important to know what the
rules really are.
Do you want to avoid car surrounded by an anchor tag, or car
surrounded by any tag?

car lorem ipusum

will pose a problem as it's half surrounded by a tag.

So, to satisfy your test case, the following works by making sure that
car is not followed by a ‘<’, which covers both cases mentioned
above.

.gsub(/\bcar\b(?=[^<])/, 'bike')
This checks that car is surrounded by a word-break but is not followed
by a ‘<’.
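
A minimal sketch of that pattern in action. The sample string and the method name `replace_unless_before_tag` are assumptions for illustration, since the markup in the original test case did not survive:

```ruby
# Hypothetical helper: replace whole-word "car" unless the very next
# character is '<' (for example, the start of a closing tag).
def replace_unless_before_tag(html)
  # (?=[^<]) looks at the next character without consuming it
  html.gsub(/\bcar\b(?=[^<])/, 'bike')
end

# Assumed sample: the second "car" is the text of a link.
html = 'car lorem <a href="#">car</a> ipsum car.'
puts replace_unless_before_tag(html)
```

Only the ‘car’ immediately before `</a>` is skipped; the other two are followed by a space and a period, so they are replaced.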

If, in Ruby 1.9, I do the following:
.gsub(/(?<!>)\bcar\b(?=[^<])/, 'bike')
I am now checking that car is surrounded by a word-break, is not
preceded by a ‘>’, and is not followed by a ‘<’. However, this pattern
will not replace “

car lorem” with “

bike lorem”
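
To illustrate that limitation, here is a minimal sketch. It assumes the tag before car was something like `<p>` (an assumption; the original markup did not survive), and the method name `replace_away_from_tags` is made up for the example:

```ruby
# Hypothetical helper: replace whole-word "car" only when it is neither
# directly after '>' nor directly before '<'.
def replace_away_from_tags(html)
  # (?<!>) is a negative look-behind: the previous character must not be '>'
  html.gsub(/(?<!>)\bcar\b(?=[^<])/, 'bike')
end

# Assumed sample: the first "car" directly follows a tag.
puts replace_away_from_tags('<p>car lorem</p> and car ipsum')
```

The first ‘car’ sits right after the ‘>’ of `<p>`, so the look-behind rejects it and it stays unchanged, while the second ‘car’ is replaced.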

Andrew T.

captaffy · April 20, 2009, 10:59am

On Mon, Apr 20, 2009 at 10:38 AM, John S. wrote:

Thanks again for your help.
Anyway, can you explain the meaning of (?=[^<])? I know that [^<] means
do not match ‘<’, but why should I use the ‘()’ and ‘?=’?

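(?=) is a look-ahead match that doesn’t consume the characters
matched.
In the string “cartoon”,
/car(?=[^<])/ will only match “car”, while /car[^<]/ will match “cart”.
/car(?=[^<])/ can almost be rewritten as /car(?!<)/, where ?! is a
negative look-ahead. The one difference is at the end of the string:
(?=[^<]) needs a following character, so it fails there, while (?!<)
still matches.

(?<=) and its negative form (?<!) are look-behind matches that don’t
consume the characters matched (only Ruby 1.9).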
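
The difference between consuming and non-consuming matches can be checked directly. A small sketch; the constant names are only for illustration:

```ruby
POSITIVE  = /car(?=[^<])/ # look-ahead: next char must exist and not be '<'
CONSUMING = /car[^<]/     # same check, but the extra char is part of the match
NEGATIVE  = /car(?!<)/    # negative look-ahead: next char must not be '<'

# String#[] with a regex returns the matched text, or nil if no match.
p 'cartoon'[POSITIVE]  # the look-ahead does not include the "t"
p 'cartoon'[CONSUMING] # the character class does include it

# At the end of the string the two look-ahead forms differ.
p 'car'[POSITIVE]      # no following character, so no match
p 'car'[NEGATIVE]      # nothing follows, so "not followed by <" holds
```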

Andrew T.

captaffy · April 20, 2009, 10:38am

Thanks again for your help.
Anyway, can you explain the meaning of (?=[^<])? I know that [^<] means
do not match ‘<’, but why should I use the ‘()’ and ‘?=’?