Regular expression: anything except

gunbol · August 13, 2008, 4:12pm

Hello,

I am trying to create a regular expression with the following rule:

my text is:
blabla
I want to replace <TEXT*> with something

"".gsub(/the magic regular expression/, "replacement")
==> replacementblabla

But I also want
"blabla".gsub(/the magic regular expression/, "replacement")
returns the same result
==> replacementblabla

and
"<TEXT anything>blabla".gsub(/the magic regular expression/, "replacement")
returns the same result
==> replacementblabla

Can you help me?

Thank you.

Gunther

gunbol · August 13, 2008, 4:26pm

On Wed, Aug 13, 2008 at 7:10 AM, Gunther G.
[email protected] wrote:

I am trying to create a regular expression with the following rule:

my text is:
blabla
I want to replace <TEXT*> with something

“”.gsub(/the magic regular expression/,“replacement”)
==> replacementblabla

This example isn’t consistent with the following ones – do you
want the ending inserted even if it’s not in the original
string?

but i want also that
“blabla”.gsub(/the magic regular expression/,“replacement”)
returns the same result
==> replacementblabla

and
"<TEXT anything>blabla".gsub(/the magic regular expression/, "replacement")
returns the same result
==> replacementblabla

Assuming the second two are actually what you want, try

"blabla".gsub(/<TEXT[^>]*>/, 'replacement')

HTH,
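
The suggested pattern relies on a negated character class: `[^>]*` consumes everything up to, but not including, the first `>`. A minimal sketch of how it behaves follows. The second sample string comes from the question; the other two are assumed for illustration, since the original strings in this thread lost their tags.

```ruby
# Pattern from the reply above: "<TEXT", then any run of characters
# other than ">", then the closing ">".
OPENING_TEXT_TAG = /<TEXT[^>]*>/

# Sample inputs; the first and third are hypothetical.
samples = [
  '<TEXT>blabla',
  '<TEXT anything>blabla',
  '<TEXT id="foo">blabla'
]

# gsub replaces every match of the pattern in each string.
results = samples.map { |s| s.gsub(OPENING_TEXT_TAG, 'replacement') }
results.each { |r| puts r }
```

Each sample should come out with the opening tag swapped for the replacement text and the rest of the string untouched.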

gunbol · August 13, 2008, 4:37pm

Hassan S. wrote:

Assuming the second two are actually what you want, try

"blabla".gsub(/<TEXT[^>]*>/, 'replacement')

HTH,

I had just found the solution, and I was posting it.

Thank you anyway ;o)

Gunther

gunbol · August 13, 2008, 4:50pm

Gunther G. [email protected] wrote:

==> replacementblabla
==> replacementblabla
/<TEXT[^>]*>/

–
Lars H.

“If anyone disagrees with anything I say, I am quite prepared not only to
retract it, but also to deny under oath that I ever said it.” -Tom Lehrer

gunbol · August 13, 2008, 7:35pm

I said:

Since it looks like XML, I recommend using an XML parser. It's not
a good idea to parse XML with regular expressions; doing it robustly
is difficult, as there are many details that can trip you up.

< TEXT id=“foo”>blabla

Oops! Now the regex doesn’t work.
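
To make the point concrete, here is a rough sketch of two ways the `/<TEXT[^>]*>/` pattern can go wrong. Both inputs are hypothetical and chosen only for illustration; they are not the strings from the post above.

```ruby
OPENING_TEXT_TAG = /<TEXT[^>]*>/

# Hypothetical input: a ">" inside a quoted attribute value.
# The match stops at that first ">", so part of the tag is left behind.
tricky = '<TEXT id="a>b">blabla'
mangled = tricky.gsub(OPENING_TEXT_TAG, 'replacement')
puts mangled

# Hypothetical input: a different element whose name merely starts
# with "TEXT". The pattern matches it as well.
other = '<TEXTAREA>blabla'
over_matched = other.gsub(OPENING_TEXT_TAG, 'replacement')
puts over_matched
```

The first case leaves a fragment of the attribute in the output, and the second replaces a tag that was never meant to match.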

gunbol · August 13, 2008, 4:41pm

On Aug 13, 10:10 am, Gunther G. [email protected] wrote:

Can you help me ?

Sure. Since it looks like XML, I recommend using an XML parser. It's not
a good idea to parse XML with regular expressions; doing it robustly
is difficult, as there are many details that can trip you up.

What you want is the text content of the TEXT element. Try this:

require 'hpricot'
xml = Hpricot::XML('<TEXT>blabla</TEXT>')
puts xml.search("//TEXT").text

=> "blabla"

Mark.
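
The same parser-based approach can be sketched with REXML, which is distributed with Ruby (as a bundled gem in recent versions), so no extra install should be needed in a typical setup. The input string and its `id` attribute are assumptions for illustration.

```ruby
require 'rexml/document'

# Hypothetical input: a TEXT element with an assumed attribute.
source = '<TEXT id="foo">blabla</TEXT>'

# Parse the string, then locate the first TEXT element via XPath.
doc = REXML::Document.new(source)
text_element = REXML::XPath.first(doc, '//TEXT')

# Element#text returns the element's first text node as a String.
content = text_element.text
puts content
```

Because the parser handles attributes and quoting itself, the tag's exact spelling no longer matters to the extraction.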