Forum: Ruby on Rails Help with regular expressions and gsub

John Smith (terry_wolf)
on 2009-04-18 09:35
I need to use gsub on a large text. For example, I would like to search
for 'car' in this text.

"Lorem ipum car. Locar ipsum,<p>car lorem ipusum</p>.Lorem car ipsum"


I would like to put 'bike' in place of 'car'. I need exactly this word,
without added spaces (not ' bike ', for example). Also, I do not want
'Locar' to become 'Lobike' (it should remain 'Locar').

I'll appreciate any help.
Andrew Timberlake (andrewtimberlake)
on 2009-04-18 09:39
(Received via mailing list)
On Sat, Apr 18, 2009 at 9:35 AM, John Smith
<rails-mailing-list@andreas-s.net> wrote:
>
> I'll appreciate any help.

Use word-boundary matches (\b) in your regular expression, as follows:

<string>.gsub(/\bcar\b/, 'bike')
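As a quick check, here is a minimal sketch that runs that pattern against
the sample text from the question. The variable names (text, result) are
only for illustration.

```ruby
# Sample text taken from the original question.
text = "Lorem ipum car. Locar ipsum,<p>car lorem ipusum</p>.Lorem car ipsum"

# \b matches at a word boundary, so "car" inside "Locar" is left alone,
# and no extra spaces are added around the replacement.
result = text.gsub(/\bcar\b/, 'bike')

puts result
```

'Locar' stays as it is because there is no word boundary between 'o' and
'c'.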

Andrew Timberlake
http://ramblingsonrails.com
http://www.linkedin.com/in/andrewtimberlake

"I have never let my schooling interfere with my education" - Mark Twain
John Smith (terry_wolf)
on 2009-04-19 22:56
Andrew Timberlake wrote:
> On Sat, Apr 18, 2009 at 9:35 AM, John Smith
> <rails-mailing-list@andreas-s.net> wrote:
>>
>> I'll appreciate any help.
>
> Use word break matches in your regular expression as follows:
>
> <string>.gsub(/\bcar\b/, 'bike')

Thanks. What if I have...?

"Lorem ipum car. Locar ipsum,<p>car lorem ipusum</p>.Lorem car ipsum.
Lorem <a href='#'>car</a> ipsum."

Now I want the same result as before, but the 'car' inside the link
should stay as it is. I do not want to end up with <a href='#'>bike</a>.
Andrew Timberlake (andrewtimberlake)
on 2009-04-20 08:09
(Received via mailing list)
On Sun, Apr 19, 2009 at 10:56 PM, John Smith
<rails-mailing-list@andreas-s.net> wrote:
>>
>
> And now I want to obtain the same but I do not want to obtain <a
> href='#'>bike</a>,
>

This will start to get tricky, as it becomes important to know what the
rules really are.
Do you want to skip car when it is wrapped in an anchor tag, or when it
is wrapped in any tag?
<p>car lorem ipusum</p> will pose a problem, as car is only half
surrounded by a tag there.

To satisfy your test case, the following works by making sure that
car is not followed by a '<', which handles both cases mentioned
above.

<string>.gsub(/\bcar\b(?=[^<])/, 'bike')
This checks that car sits on word boundaries and is followed by a
character other than '<'.

If, in Ruby 1.9, I do the following:
<string>.gsub(/(?<!>)\bcar\b(?=[^<])/, 'bike')
I am now checking that car sits on word boundaries, is not preceded by
a '>' and is not followed by a '<'. However, this pattern will
not replace "<p>car lorem" with "<p>bike lorem".
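To make the difference between the two patterns concrete, here is a small
sketch that applies both to the extended sample text (joined onto one line
with a space). The variable names are only for illustration, and the
look-behind form needs Ruby 1.9 or later.

```ruby
text = "Lorem ipum car. Locar ipsum,<p>car lorem ipusum</p>.Lorem car ipsum. " \
       "Lorem <a href='#'>car</a> ipsum."

# Look-ahead only: skip "car" when the next character is '<'.
lookahead_only = text.gsub(/\bcar\b(?=[^<])/, 'bike')

# Look-behind plus look-ahead: also skip "car" when the previous
# character is '>'.
both = text.gsub(/(?<!>)\bcar\b(?=[^<])/, 'bike')

puts lookahead_only
puts both
```

With the look-ahead alone, "<p>car lorem" becomes "<p>bike lorem" and the
link text is left alone. With the look-behind added, "<p>car lorem" is
skipped as well, because car directly follows a '>'.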

John Smith (terry_wolf)
on 2009-04-20 09:15
Thanks. When tags are involved, what I need is to obtain <p>bike</p> but
keep <a href='#'>car</a> unchanged.
Andrew Timberlake (andrewtimberlake)
on 2009-04-20 09:36
(Received via mailing list)
On Mon, Apr 20, 2009 at 9:15 AM, John Smith
<rails-mailing-list@andreas-s.net> wrote:
>
> Thanks. When I use <>. what I need is to obtain <p>bike</p> and <a
> href='#'>car</a>.
>

I'm starting to think that regular expressions alone are not the best
way to solve this.
You should probably use an HTML parser and then do regular expression
substitutions based on where in the DOM you are.
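As a rough illustration of the idea, here is a sketch that uses only plain
Ruby. It is not a real HTML parser: it just splits the text into tags and
text runs and tracks whether it is inside an <a> element. The helper name
replace_outside_links is hypothetical, and the sketch assumes well-formed
markup with no '<' or '>' inside attribute values. A real parser (the
Nokogiri gem, for example) would be more robust.

```ruby
# Hypothetical helper: replaces the whole word `from` with `to` in text
# runs, except for text that sits inside <a>...</a>.
def replace_outside_links(html, from, to)
  word = /\b#{Regexp.escape(from)}\b/
  link_depth = 0

  # The capture group makes split keep the tags in the result array.
  chunks = html.split(/(<[^>]*>)/).map do |chunk|
    if chunk.start_with?('<')
      # A tag: update the link depth and pass the tag through untouched.
      if chunk =~ /\A<a[\s>]/i
        link_depth += 1
      elsif chunk =~ /\A<\/a\s*>/i
        link_depth -= 1 if link_depth > 0
      end
      chunk
    elsif link_depth > 0
      chunk                 # text inside a link: leave as is
    else
      chunk.gsub(word, to)  # text outside any link: replace
    end
  end

  chunks.join
end

text = "Lorem ipum car. Locar ipsum,<p>car lorem ipusum</p>.Lorem car ipsum. " \
       "Lorem <a href='#'>car</a> ipsum."
puts replace_outside_links(text, 'car', 'bike')
```

The same call turns "<p>car</p>" into "<p>bike</p>" and leaves
"<a href='#'>car</a>" alone, which is the behaviour asked for above.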

John Smith (terry_wolf)
on 2009-04-20 10:38
Thanks again for your help.
Anyway, can you explain the meaning of (?=[^<])? I know that [^<] means
"do not match '<'", but why should I use the '()' and '?='?
Andrew Timberlake (andrewtimberlake)
on 2009-04-20 10:59
(Received via mailing list)
On Mon, Apr 20, 2009 at 10:38 AM, John Smith
<rails-mailing-list@andreas-s.net> wrote:
>
> Thanks again for your help.
> Anyway, can you explain the meaning of (?=[^<])? I know that [^<] means
> "do not match '<'", but why should I use the '()' and '?='?

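(?=<pattern>) is a look-ahead match that doesn't consume the characters
matched.
In the string "cartoon",
/car(?=[^<])/ will only match "car" and /car[^<]/ will match "cart".
/car(?=[^<])/ can nearly always be rewritten as /car(?!<)/, where ?! is
a negative look-ahead. The one difference is at the very end of the
string: (?=[^<]) needs some character after car, so it fails there,
while (?!<) succeeds.

(?<=<pattern>) is a look-behind match that doesn't consume the
characters matched, and (?<!<pattern>) is its negative form, as used
earlier (look-behind needs Ruby 1.9 or later).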
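A short sketch of these points, using String#[] with a regular expression
to show exactly what each pattern matches. The variable names are only for
illustration, and the look-behind line needs Ruby 1.9 or later.

```ruby
# Look-ahead checks the next character without consuming it.
with_lookahead = "cartoon"[/car(?=[^<])/]   # matches "car"
consuming      = "cartoon"[/car[^<]/]       # matches "cart"

# At the end of the string the two look-ahead forms differ:
# (?=[^<]) needs a following character, (?!<) does not.
positive_at_end = "car"[/car(?=[^<])/]      # no match, so nil
negative_at_end = "car"[/car(?!<)/]         # matches "car"

# Positive look-behind: "car" only when directly preceded by '>'.
after_tag = "<p>car"[/(?<=>)car/]           # matches "car"

p with_lookahead, consuming, positive_at_end, negative_at_end, after_tag
```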
