Regular expression: anything except

Posted by Gunther Gunt (gunbol)
on 13.08.2008 16:12
Hello,

I am trying to create a regular expression with the following rule:

my text is:
<TEXT id=''>blabla</TEXT>
I want to replace <TEXT*> with something

"<TEXT id=''>".gsub(/the magic regular expression/,"replacement")
==> replacementblabla</TEXT>

But I also want that
"<TEXT>blabla</TEXT>".gsub(/the magic regular expression/,"replacement")
returns the same result
==> replacementblabla</TEXT>

and
"<TEXT *anything*>blabla</TEXT>".gsub(/the magic regular expression/,"replacement")
returns the same result
==> replacementblabla</TEXT>

Can you help me?

Thank you.

Gunther
Posted by Hassan Schroeder (Guest)
on 13.08.2008 16:26
(Received via mailing list)
On Wed, Aug 13, 2008 at 7:10 AM, Gunther Gunt
<gunther.thevenin@airbus.com> wrote:

> I am trying to create a regular expression with the following rule:
>
> my text is:
> <TEXT id=''>blabla</TEXT>
> I want to replace <TEXT*> with something
>
> "<TEXT id=''>".gsub(/the magic regular expression/,"replacement")
> ==> replacementblabla</TEXT>

This example isn't consistent with the following ones -- do you
want the ending </TEXT> inserted even if it's not in the original
string?

> But I also want that
> "<TEXT>blabla</TEXT>".gsub(/the magic regular expression/,"replacement")
> returns the same result
> ==> replacementblabla</TEXT>
>
> and
> "<TEXT *anything*>blabla</TEXT>".gsub(/the magic regular expression/,"replacement")
> returns the same result
> ==> replacementblabla</TEXT>

Assuming the second two are actually what you want, try

"<TEXT id='foo'>blabla</TEXT>".gsub(/<TEXT[^>]*>/, 'replacement')

HTH,
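
As a quick check, here is a minimal sketch that runs the pattern suggested above against the three sample strings from the question. The variable names (`pattern`, `inputs`, `results`) are only illustrative and do not come from the thread.

```ruby
# Sketch only: the sample strings are taken from the original question.
pattern = /<TEXT[^>]*>/

inputs = [
  "<TEXT id=''>blabla</TEXT>",
  "<TEXT>blabla</TEXT>",
  "<TEXT *anything*>blabla</TEXT>"
]

# [^>]* consumes everything up to the first '>', so any attributes are
# swallowed along with the opening tag. The closing </TEXT> is left alone
# because it starts with "</", not "<TEXT".
results = inputs.map { |s| s.gsub(pattern, 'replacement') }

results.each { |r| puts r }
# prints "replacementblabla</TEXT>" three times
```

Note that this pattern would also match a longer tag name such as `<TEXTAREA>`. If that matters, `/<TEXT\b[^>]*>/` is one possible way to tighten it.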
Posted by Gunther Gunt (gunbol)
on 13.08.2008 16:37
Hassan Schroeder wrote:
> Assuming the second two are actually what you want, try
> 
> "<TEXT id='foo'>blabla</TEXT>".gsub(/<TEXT[^>]*>/, 'replacement')
> 
> HTH,

I had just found the solution, and I was posting it.

Thank you anyway ;o)

Gunther
Posted by Mark Thomas (markthomas)
on 13.08.2008 16:41
(Received via mailing list)
On Aug 13, 10:10 am, Gunther Gunt <gunther.theve...@airbus.com> wrote:
> Can you help me?

Sure. Since it looks like XML, I recommend using an XML parser. It is
not a good idea to parse XML with regular expressions; doing it robustly
is difficult, as there are many details that can trip you up.

What you want is the text content of the TEXT element. Try this:

require 'hpricot'
xml = Hpricot::XML('<TEXT id="foo" anything="whatever">blabla</TEXT>')
puts xml.search("//TEXT").text

# => "blabla"
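
For readers without Hpricot installed, roughly the same idea can be sketched with REXML, which is distributed with Ruby (as a bundled gem in recent versions). This is a swapped-in library, not the one used above, and the variable names `doc` and `content` are illustrative.

```ruby
require 'rexml/document'

# Parse the sample element; the parser deals with the attributes,
# so no pattern matching on the opening tag is needed.
doc = REXML::Document.new('<TEXT id="foo" anything="whatever">blabla</TEXT>')

# root is the <TEXT> element; text returns its first text node.
content = doc.root.text

puts content
# prints "blabla"
```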

- Mark.
Posted by Lars Haugseth (Guest)
on 13.08.2008 16:50
(Received via mailing list)
* Gunther Gunt <gunther.thevenin@airbus.com> wrote:
> ==> replacementblabla</TEXT>
> ==> replacementblabla</TEXT>
/<TEXT[^>]*>/

--
Lars Haugseth

"If anyone disagrees with anything I say, I am quite prepared not only to
 retract it, but also to deny under oath that I ever said it." -Tom Lehrer
Posted by Mark Thomas (markthomas)
on 13.08.2008 19:35
(Received via mailing list)
I said:
> Since it looks like XML, I recommend using an XML parser. It is not
> a good idea to parse XML with regular expressions; doing it robustly
> is difficult, as there are many details that can trip you up.

< TEXT id="foo">blabla</TEXT>

Oops! Now the regex doesn't work.
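
To make the failure concrete, here is a small sketch showing the pattern from earlier in the thread leaving that input untouched. The `tolerant` pattern is my own assumption about one way to allow leading whitespace. It is not something proposed in the thread, and it still does not cover other cases such as a `>` inside an attribute value.

```ruby
input = '< TEXT id="foo">blabla</TEXT>'

# Pattern from earlier in the thread: requires "<TEXT" with no space.
strict = /<TEXT[^>]*>/
unchanged = input.gsub(strict, 'replacement')
puts unchanged
# prints the input as-is, because nothing matched

# Hypothetical variant: optional whitespace after '<', and a word
# boundary after the tag name.
tolerant = /<\s*TEXT\b[^>]*>/
fixed = input.gsub(tolerant, 'replacement')
puts fixed
# prints "replacementblabla</TEXT>"
```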