Confused by back refs in gsub

Can someone tell me why, in my code below, part of the original search
text ends up in my substitution result when I’m not asking for it, or
at least don’t think I’m asking for it?

Thanks,
Peter

Original line:
<registrantName>Normandy Group LLC</registrantName>

My Code:
xmlfile.gsub!(/<registrantName>(.*)<\/registrantName>/,
'<SUB.HEAD4>\&</SUB.HEAD4>')
I’ve tried “\1” instead of “\&”, too. Same result. I’ve also tried
putting in “?” marks to make it non-greedy. Same result.

Yields:
<SUB.HEAD4><registrantName>Normandy Group LLC</registrantName></SUB.HEAD4>

What I want:
<SUB.HEAD4>Normandy Group LLC(/SUB.HEAD4>

On 8/13/07, Peter B. wrote:

What I want:
<SUB.HEAD4>Normandy Group LLC(/SUB.HEAD4>

This works for me (I’ve used \1):

require 'test/unit'

class TestGsub < Test::Unit::TestCase
  def test_replace
    line = '<registrantName>Normandy Group LLC</registrantName>'

    # \1 inserts only the text captured between the two tags
    line.gsub!(/<registrantName>(.*)<\/registrantName>/, '<SUB.HEAD4>\1</SUB.HEAD4>')
    assert_equal('<SUB.HEAD4>Normandy Group LLC</SUB.HEAD4>', line)
  end
end

Note that you wrote (/SUB.HEAD4> instead of </SUB.HEAD4> (an opening
parenthesis where the angle bracket should be).
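
To make the difference between the two back references concrete, here is a minimal sketch. It assumes the input line was wrapped in <registrantName> tags, which the archive seems to have stripped from the posts above. In a gsub replacement string, \& (like \0) stands for the entire match, while \1 stands for the first capture group only.

```ruby
line = '<registrantName>Normandy Group LLC</registrantName>'
pattern = /<registrantName>(.*)<\/registrantName>/

# \& inserts everything the pattern matched, tags included.
whole_match = line.gsub(pattern, '<SUB.HEAD4>\&</SUB.HEAD4>')

# \1 inserts only what the first pair of parentheses captured.
first_group = line.gsub(pattern, '<SUB.HEAD4>\1</SUB.HEAD4>')

puts whole_match
puts first_group
```

The first result still carries the registrantName tags inside the new element, which would explain the "part of the original search" in the output; the second contains only the company name.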

I’ve tried “\1” instead of “\&”, too. Same result. I’ve also

If you’re hardcoding replacements like that and are certain that your
source is well-formed XML, you could also just skip the back references:

irb(main):001:0> "<registrantName>Normandy Group LLC</registrantName>".gsub!(/registrantName>/, 'SUB.HEAD4>')
=> "<SUB.HEAD4>Normandy Group LLC</SUB.HEAD4>"
irb(main):002:0>

Thank you, Jano. Yes, this worked for me now.

Cheers.

I don’t quite understand your suggestion, Felix. Yes, I believe my
source data is well-formed XML. Are you suggesting that, because it is
well-formed XML, I can somehow ignore the closing tags? I tried what I
thought you meant:

xmlfile.gsub!(/<registrantName>/, '<SUB.HEAD4>')

and I got the subhead callout at the beginning of the data, but the
closing element is still there: </registrantName>

-Peter

On Monday, 13 August 2007, Peter B. wrote:

irb(main):002:0>

-Peter

What Felix is suggesting is that, if the source is valid XML, then each
element will have the form

<elementName>text</elementName>

so, if you call gsub! with a regexp matching elementName>, it should
replace both the opening and closing tags. When you tried it, it didn’t
work because you left the opening < in the regexp, which then didn’t
match the closing tag (it starts with </r, not <r). The correct call to
gsub! should be:

xmlfile.gsub!(/registrantName>/, 'SUB.HEAD4>')

(By the way, notice that the regexp doesn’t match the starting ‘<’, so
it has to be left out of the replacement string too.)

I hope this helps

Stefano
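
A small sketch of the point above, again assuming the data looks like <registrantName>…</registrantName>: with the leading < in the pattern only the opening tag is rewritten, and without it both tags are.

```ruby
xml = '<registrantName>Normandy Group LLC</registrantName>'

# With the leading "<", only the opening tag matches; the closing tag
# begins with "</" and is left untouched.
opening_only = xml.gsub(/<registrantName>/, '<SUB.HEAD4>')

# Without it, the pattern matches the tail of both tags. The "<" or "</"
# in front of each match is not part of the match, so it stays in place.
both_tags = xml.gsub(/registrantName>/, 'SUB.HEAD4>')

puts opening_only
puts both_tags
```

The first call reproduces what Peter saw (new opening tag, old closing tag); the second gives the matched pair.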

line.gsub!(/<registrantName>(.*)<\/registrantName>/, '<SUB.HEAD4>\1</SUB.HEAD4>')

Thank you, Jano. Yes, this worked for me now.

Please note that regular expressions aren’t a very good way to parse
XML. The capture group in the above expression is greedy: it will match
everything between the first “<registrantName>” and the last
“</registrantName>” on a line, which is probably not what you want.

You can use the non-greedy *? as a workaround in this case.

mfg, simon … l
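
To illustrate the greedy versus non-greedy behaviour, here is a sketch with two elements on one line. The second company name, "Acme Corp", is made up for the example, and the registrantName tags are assumed as before.

```ruby
xml = '<registrantName>Normandy Group LLC</registrantName>' \
      '<registrantName>Acme Corp</registrantName>'

# Greedy: (.*) runs to the LAST closing tag, so the whole line is one match
# and the inner tags end up inside the captured text.
greedy = xml.gsub(/<registrantName>(.*)<\/registrantName>/, '<SUB.HEAD4>\1</SUB.HEAD4>')

# Non-greedy: (.*?) stops at the FIRST closing tag, so each element is
# matched and replaced on its own.
lazy = xml.gsub(/<registrantName>(.*?)<\/registrantName>/, '<SUB.HEAD4>\1</SUB.HEAD4>')

puts greedy
puts lazy
```

With only one element per line, as in Peter's sample, both patterns give the same result, which may be why adding the "?" appeared to change nothing.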

I don’t quite understand your suggestion, Felix. Yes, I

Posted via http://www.ruby-forum.com/.

I’m leaving off the opening bracket ‘<’:

line.gsub!(/registrantName>/, 'SUB.HEAD4>')

So that it will match the opening as well as closing statement.

I’m leaving off the opening bracket ‘<’:

line.gsub!(/registrantName>/, 'SUB.HEAD4>')

So that it will match the opening as well as closing statement.

As well as any other occurrence of the substring “registrantName>”. And
well-formed XML doesn’t guarantee that only “<registrantName>” and
“</registrantName>” will contain it.

gsub!(/(<\/?)registrantName>/, '\1SUB.HEAD4>') should do.

But again, CDATA sections and comments may well contain these strings.
I’d use XSLT, or some SAX library if it has to be Ruby.

mfg, simon … l
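
A sketch of both caveats, using a made-up fragment (the <note> element and the CDATA section are invented for illustration): the bare pattern also rewrites plain text that happens to contain "registrantName>", the anchored pattern leaves that text alone, and neither can tell a real tag from the same characters inside CDATA.

```ruby
xml = '<registrantName>Normandy Group LLC</registrantName>' \
      '<note>registrantName> appears in text</note>' \
      '<![CDATA[<registrantName>raw</registrantName>]]>'

# Bare pattern: rewrites every occurrence, including the one in the note text.
bare = xml.gsub(/registrantName>/, 'SUB.HEAD4>')

# Anchored pattern: requires "<" or "</" in front and puts it back via \1,
# so the note text is left alone.
anchored = xml.gsub(/(<\/?)registrantName>/, '\1SUB.HEAD4>')

puts bare
puts anchored
```

In the anchored output the note text is unchanged, but the CDATA content is still rewritten, which is the case where a real XML parser or XSLT would be the safer tool.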

Thank you, everyone. Yes, my XML is well-formed, but it’s also pretty
simple and, from what our vendor tells me, pretty consistent. I just
need to convert it to SGML for our company publishing system. XSLT is
probably better for this, I’m sure, but it’s enough for me just to
learn Ruby. (-: Plus, I love Ruby.
Thanks again.