Forum: Ruby on Rails Help with regular expressions and gsub

John S. (Guest)
on 2009-04-18 11:35
I need to use gsub on a large text. For example, I would like to search
for 'car' in this text.

"Lorem ipum car. Locar ipsum,<p>car lorem ipusum</p>.Lorem car ipsum"


I would like to replace 'car' with 'bike'. I need exactly this word,
without extra surrounding spaces (not ' bike ', for example). Also, I do
not want 'Locar' to become 'Lobike' (it should remain 'Locar').

I'll appreciate any help.
Andrew T. (Guest)
on 2009-04-18 11:39
(Received via mailing list)
On Sat, Apr 18, 2009 at 9:35 AM, John S.
<removed_email_address@domain.invalid> wrote:
>
> I'll appreciate any help.

Use word break matches in your regular expression as follows:

<string>.gsub(/\bcar\b/, 'bike')
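For illustration, here is a minimal sketch of that call applied to the sample text from the question. The variable names `text` and `result` are only placeholders.

```ruby
# \b matches at a word boundary, so "car" inside "Locar" is left alone.
text = "Lorem ipum car. Locar ipsum,<p>car lorem ipusum</p>.Lorem car ipsum"

result = text.gsub(/\bcar\b/, 'bike')
puts result
# Lorem ipum bike. Locar ipsum,<p>bike lorem ipusum</p>.Lorem bike ipsum
```

Note that '>' and '.' are non-word characters, so "<p>car" and "car." both count as whole-word matches.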

Andrew T.
http://ramblingsonrails.com
http://www.linkedin.com/in/andrewtimberlake

"I have never let my schooling interfere with my education" - Mark Twain
John S. (Guest)
on 2009-04-20 00:56
Andrew T. wrote:
> On Sat, Apr 18, 2009 at 9:35 AM, John S.
> <removed_email_address@domain.invalid> wrote:
>>
>> I'll appreciate any help.
>
> Use word break matches in your regular expression as follows:
>
> <string>.gsub(/\bcar\b/, 'bike')

Thanks. What if I have...?

"Lorem ipum car. Locar ipsum,<p>car lorem ipusum</p>.Lorem car ipsum.
Lorem <a href='#'>car</a> ipsum."

Now I want the same result, except that the 'car' inside the link must
stay as it is: I do not want to obtain <a href='#'>bike</a>.
Andrew T. (Guest)
on 2009-04-20 10:09
(Received via mailing list)
On Sun, Apr 19, 2009 at 10:56 PM, John S.
<removed_email_address@domain.invalid> wrote:
>>
>
> And now I want to obtain the same but I do not want to obtain <a
> href='#'>bike</a>,
>

This will start to get tricky, as it becomes important to know what the
rules really are.
Do you want to avoid 'car' surrounded by an anchor tag, or 'car'
surrounded by any tag?
<p>car lorem ipusum</p> will pose a problem, as it is only half
surrounded by a tag.

To satisfy your test case, the following works by making sure that
'car' is not followed by a '<', which handles both cases mentioned
above.

<string>.gsub(/\bcar\b(?=[^<])/, 'bike')
This checks that 'car' is surrounded by word breaks and is followed by a
character other than '<'.
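As a quick check, this sketch runs the look-ahead pattern over the second sample text. The names `text`, `result`, `wrapped` and `at_end` are illustrative, and the two lines of the sample are joined with a space here for brevity.

```ruby
# (?=[^<]) requires the next character to exist and to be something other than '<'.
text = "Lorem ipum car. Locar ipsum,<p>car lorem ipusum</p>.Lorem car ipsum. " \
       "Lorem <a href='#'>car</a> ipsum."

result = text.gsub(/\bcar\b(?=[^<])/, 'bike')
puts result
# Lorem ipum bike. Locar ipsum,<p>bike lorem ipusum</p>.Lorem bike ipsum. Lorem <a href='#'>car</a> ipsum.

# Two side effects worth knowing about:
wrapped = "<p>car</p>".gsub(/\bcar\b(?=[^<])/, 'bike') # followed by '<', so unchanged
at_end  = "my car".gsub(/\bcar\b(?=[^<])/, 'bike')     # nothing follows, so unchanged
puts wrapped # <p>car</p>
puts at_end  # my car
```

So the pattern skips any 'car' that sits directly before a tag, not only the one inside the link.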

In Ruby 1.9, I can also do the following:
<string>.gsub(/(?<!>)\bcar\b(?=[^<])/, 'bike')
Now I am checking that 'car' is surrounded by word breaks, is not
preceded by a '>' and is not followed by a '<'. However, this pattern
will not replace "<p>car lorem" with "<p>bike lorem".
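A small sketch of that behaviour, assuming Ruby 1.9 or later (the constant and variable names are made up for the example):

```ruby
# (?<!>) is a negative look-behind: the match must not be preceded by '>'.
PATTERN = /(?<!>)\bcar\b(?=[^<])/

plain   = "Lorem car ipsum".gsub(PATTERN, 'bike')
in_link = "<a href='#'>car</a>".gsub(PATTERN, 'bike')
after_p = "<p>car lorem".gsub(PATTERN, 'bike')

puts plain   # Lorem bike ipsum
puts in_link # <a href='#'>car</a>
puts after_p # <p>car lorem   (not replaced, because '>' comes right before it)
```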

Andrew T.
John S. (Guest)
on 2009-04-20 11:15
Thanks. For text inside tags, what I need is to obtain <p>bike</p> and
<a href='#'>car</a>.
Andrew T. (Guest)
on 2009-04-20 11:36
(Received via mailing list)
On Mon, Apr 20, 2009 at 9:15 AM, John S.
<removed_email_address@domain.invalid> wrote:
>
> Thanks. When I use <>. what I need is to obtain <p>bike</p> and <a
> href='#'>car</a>.
>

I'm starting to think that regular expressions are not the best way to
solve this.
You should probably use an HTML parser and then do regular expression
substitutions based on where in the DOM you are.
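To give a rough idea of "substituting based on where you are", here is a sketch in plain Ruby. It is not a real HTML parser: it only splits the text into tags and text runs and tracks whether it is inside an <a> element. The method name `replace_outside_links` is hypothetical, and it assumes well-formed tags with no '>' inside attribute values. A real parser such as Nokogiri would be more robust.

```ruby
# Simplified stand-in for an HTML parser (assumption: simple, well-formed markup).
# Replaces whole-word occurrences of `word` except inside <a>...</a>.
def replace_outside_links(html, word, replacement)
  pattern = /\b#{Regexp.escape(word)}\b/
  depth = 0 # number of <a> elements we are currently inside

  # The capture group makes split keep the tags as separate tokens.
  html.split(/(<[^>]*>)/).map do |token|
    if token.start_with?('<')
      if token =~ /\A<a[\s>]/i
        depth += 1
      elsif token =~ /\A<\/a\s*>/i
        depth -= 1 if depth > 0
      end
      token # tags are passed through untouched
    elsif depth.zero?
      token.gsub(pattern, replacement)
    else
      token # text inside a link is left as it is
    end
  end.join
end

text = "Lorem car ipsum. <p>car</p> Locar <a href='#'>car</a> ipsum."
puts replace_outside_links(text, 'car', 'bike')
# Lorem bike ipsum. <p>bike</p> Locar <a href='#'>car</a> ipsum.
```

This gives <p>bike</p> while keeping <a href='#'>car</a>, which the look-around patterns alone could not do.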

Andrew T.
John S. (Guest)
on 2009-04-20 12:38
Thanks again for your help.
Anyway, can you explain the meaning of (?=[^<])? I know that [^<] means
"do not match '<'", but why should I use the '()' and '?='?
Andrew T. (Guest)
on 2009-04-20 12:59
(Received via mailing list)
On Mon, Apr 20, 2009 at 10:38 AM, John S.
<removed_email_address@domain.invalid> wrote:
>
> Thanks again for your help.
> Anyway, can you explain the meaning of (?=[^<])? I know that [^<] means
> "do not match '<'", but why should I use the '()' and '?='?

(?=<pattern>) is a look-ahead match that doesn't consume the characters
matched.
In the string "cartoon",
/car(?=[^<])/ will only match "car" and /car[^<]/ will match "cart".
/car(?=[^<])/ can nearly be rewritten as /car(?!<)/, where ?! is a
negative look-ahead. The two differ only at the end of the string, where
(?=[^<]) fails because there is no character left to match.

(?<=<pattern>) is a look-behind match that doesn't consume the
characters matched, and (?<!<pattern>) is its negative form (Ruby 1.9
and later only).
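The difference between consuming and not consuming can be tried out directly; this is just a small sketch, with variable names chosen for the example. String#[] with a regexp returns the matched text, or nil when there is no match.

```ruby
lookahead = "cartoon"[/car(?=[^<])/] # "t" is checked but not part of the match
consuming = "cartoon"[/car[^<]/]     # "t" is part of the match

puts lookahead # car
puts consuming # cart

# Positive vs. negative look-ahead at the end of the string:
positive_at_end = "car"[/car(?=[^<])/] # nil: there is no next character to match
negative_at_end = "car"[/car(?!<)/]    # "car": nothing follows, so no '<' follows
before_tag      = "car<"[/car(?!<)/]   # nil: '<' follows

p positive_at_end # nil
p negative_at_end # "car"
p before_tag      # nil
```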

Andrew T.