Can someone tell me why, in my code below, I’m getting part of the
original search text in my substitution result when I’m not asking
for it, or at least I don’t think I’m asking for it?
Thanks,
Peter
Original line:
<registrantName>Normandy Group LLC</registrantName>
My Code:
xmlfile.gsub!(/<registrantName>(.*)<\/registrantName>/, '<SUB.HEAD4>\&</SUB.HEAD4>')
I’ve tried “\1” instead of “\&”, too. Same result. I’ve also tried
adding “?” to make the match non-greedy. Same result.
Yields:
<SUB.HEAD4><registrantName>Normandy Group LLC</registrantName></SUB.HEAD4>
What I want:
<SUB.HEAD4>Normandy Group LLC</SUB.HEAD4>
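For reference, here is a minimal sketch of how the back references behave in a gsub replacement string. It assumes the original line is wrapped in <registrantName> tags, as the pattern implies; the variable names are only illustrative.

```ruby
# Sample input, assumed from the pattern used in the question.
line = '<registrantName>Normandy Group LLC</registrantName>'
pattern = /<registrantName>(.*)<\/registrantName>/

# \& inserts the entire match, tags included.
whole_match = line.gsub(pattern, '<SUB.HEAD4>\&</SUB.HEAD4>')

# \1 inserts only the first capture group, i.e. the text between the tags.
group_only = line.gsub(pattern, '<SUB.HEAD4>\1</SUB.HEAD4>')

# In a double-quoted string, "\1" is an octal escape (the character U+0001),
# so no back reference ever reaches gsub. Use '\1' or "\\1" instead.
double_quoted = line.gsub(pattern, "<SUB.HEAD4>\1</SUB.HEAD4>")

puts whole_match
puts group_only
puts double_quoted.inspect
```

If “\1” appears to give the same result as “\&”, it may be worth checking which kind of quotes surrounds the replacement string.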
If you’re hardcoding replacements like that and are certain that your
source is well-formed XML, you could also just skip the back references:
irb(main):001:0> "<registrantName>Normandy Group LLC</registrantName>".gsub!(/registrantName>/, 'SUB.HEAD4>')
=> "<SUB.HEAD4>Normandy Group LLC</SUB.HEAD4>"
irb(main):002:0>
I don’t quite understand your suggestion, Felix. Yes, I believe my
source data is well-formed XML. Are you suggesting that, somehow,
because it is well-formed XML, I can ignore the element closings? I
tried what I thought you meant:
xmlfile.gsub!(/<registrantName>/, '<SUB.HEAD4>')
and I got the subhead callout at the beginning of the data, but the
closing element is still there: </registrantName>
What Felix is suggesting is that, if the source is valid XML, then it
will have the form
<elementName>text</elementName>
so, if you call gsub! with a regexp matching elementName>, it should
replace both the opening and closing tags. When you tried it, it didn’t
work because you left the opening < in the regexp, so it didn’t match
the closing tag (which starts with </r, not <r). The correct call to
gsub! should be:
xmlfile.gsub!(/registrantName>/, 'SUB.HEAD4>')
(By the way, notice that the regexp doesn’t match the starting ‘<’, so
it has to be left out of the replacement string as well.)
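The difference between the two patterns can be sketched as follows, assuming the sample line from the original question (variable names are illustrative):

```ruby
line = '<registrantName>Normandy Group LLC</registrantName>'

# With the leading "<", the pattern matches only the opening tag,
# because the closing tag begins with "</".
opening_only = line.gsub(/<registrantName>/, '<SUB.HEAD4>')

# Without it, the pattern matches the tail of both tags.
both_tags = line.gsub(/registrantName>/, 'SUB.HEAD4>')

puts opening_only
puts both_tags
```

The sketch uses gsub rather than gsub! so the original string stays intact; note that gsub! returns nil when nothing was replaced.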
Please note that regular expressions aren’t a very good way to parse
XML. The (.*) subgroup in the original expression is greedy, so it will
match everything between the first “<registrantName>” and the last
“</registrantName>”, which is probably not what you want.
You can use the non-greedy *? as a workaround in this case.
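A small sketch of the greedy versus non-greedy behaviour, assuming two elements end up on the same line (the sample data is made up for illustration):

```ruby
# Hypothetical input with two elements on one line.
two = '<registrantName>A</registrantName><registrantName>B</registrantName>'

# Greedy: (.*) runs to the last closing tag, so the capture swallows
# the inner closing and opening tags.
greedy = two.gsub(/<registrantName>(.*)<\/registrantName>/,
                  '<SUB.HEAD4>\1</SUB.HEAD4>')

# Non-greedy: (.*?) stops at the first closing tag, so each element
# is converted separately.
lazy = two.gsub(/<registrantName>(.*?)<\/registrantName>/,
                '<SUB.HEAD4>\1</SUB.HEAD4>')

puts greedy
puts lazy
```

Without the /m flag, “.” does not match a newline in Ruby, so the greedy match can only overreach within a single line.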
That will match the opening as well as the closing tag, but also any
other substring “registrantName>”. And well-formed XML won’t
guarantee that only “<registrantName>” and “</registrantName>” will
contain that.
gsub!(/(<\/?)registrantName>/, '\1SUB.HEAD4>') should do.
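To illustrate the difference, here is a sketch that compares the two patterns on input containing a second element whose name merely ends in “registrantName”. The <coregistrantName> element is hypothetical and only serves to show the effect.

```ruby
# Hypothetical input: the second element name only ends in "registrantName".
xml = '<registrantName>Normandy Group LLC</registrantName>' \
      '<coregistrantName>x</coregistrantName>'

# Matching the bare substring also rewrites the unrelated element.
loose = xml.gsub(/registrantName>/, 'SUB.HEAD4>')

# Requiring "<" or "</" directly before the name leaves it alone.
anchored = xml.gsub(/(<\/?)registrantName>/, '\1SUB.HEAD4>')

puts loose
puts anchored
```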
But again, CDATA sections and comments may well contain these strings.
I’d use XSLT, or some SAX library if it has to be Ruby.
mfg, simon … l
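The CDATA and comment caveat can be sketched like this. The sample document, including the <note> element, is invented for illustration; the point is only that a regexp cannot tell markup from text that looks like markup.

```ruby
# Hypothetical document: the tag name also appears inside a comment
# and inside a CDATA section, where it is plain text, not markup.
xml = <<~XML
  <!-- <registrantName> is renamed downstream -->
  <registrantName>Normandy Group LLC</registrantName>
  <note><![CDATA[literal <registrantName> text]]></note>
XML

converted = xml.gsub(/(<\/?)registrantName>/, '\1SUB.HEAD4>')

# All four occurrences are rewritten, including the two that were
# never real tags.
puts converted
```

For simple, consistent vendor data this may never come up, but it is the reason a real XML parser is usually the safer choice.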
Thank you, everyone. Yes, my XML is well-formed, but, it’s also pretty
simple, and, from what our vendor tells me, pretty consistent. I just
need to convert it to SGML for our company publishing system. XSLT is
probably better for this, I’m sure, but, it’s enough for me just to
learn Ruby. (-: Plus, I love Ruby.
Thanks again.