REXML element reading <br /> error

johnnybutler7 · August 31, 2007, 9:57pm

When reading in the Site element from my XML file using REXML, it seems
to be chopping off the rest of the text after the first <br />.

The value in the XML file is below:
<Site>123 street<br />amstown<br />amserland</Site>

element = REXML::XPath.first(doc, '//Site')

puts element.text #shows 123 Street

How can I get the full data? Once I have it, I can remove the <br />.

I can't find any information on this.

JB

johnnybutler7 · August 31, 2007, 10:19pm

On 8/31/07, John B. wrote:

When reading in the site element from my xml file using rexml it seems
to be chopping the rest of the text off after the first <br />

The value in the XML file is below
<Site>123 street<br />amstown<br />amserland</Site>

element = REXML::XPath.first(doc, '//Site')

I’d suggest using a bit more XPath: text() to select the text nodes
(which are distinct), and an each {} to iterate through them:

$ irb -r rexml/document --prompt xmp
a = REXML::Document.new("<Site>123 street<br />amstown<br />amserland</Site>")
=> <UNDEFINED> ... </>

REXML::XPath.first(a, '//Site').text
=> "123 street"

REXML::XPath.first(a, '//Site/text()').to_s
=> "123 street"

REXML::XPath.each(a, '//Site/text()') {|el| puts el}
123 street
amstown
amserland
=> ["123 street", "amstown", "amserland"]

HTH,
Keith

johnnybutler7 · September 1, 2007, 1:27am

Hi,

At Sat, 1 Sep 2007 05:18:48 +0900,
Keith F. wrote in [ruby-talk:266990]:

I’d suggest using a bit more XPath, both text() and a each {} to
iterate through the text nodes (which are distinct):

$ irb -r rexml/document --prompt xmp
a = REXML::Document.new("<Site>123 street<br />amstown<br />amserland</Site>")
=> <UNDEFINED> ... </>

REXML::XPath.first(a, '//Site').text
=> "123 street"

It seems that just calling REXML::XPath.first(a, '//Site').to_s
returns the whole content.
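
A small sketch of the difference, assuming the Site element from the original post (the sample string is reconstructed from the thread, not copied from the poster's file). It suggests that `to_s` serializes the whole element, markup included, while `text` stops at the first text node:

```ruby
require "rexml/document"

# Sample document reconstructed from the thread (an assumption).
doc  = REXML::Document.new("<Site>123 street<br />amstown<br />amserland</Site>")
site = REXML::XPath.first(doc, "//Site")

first_text = site.text  # only the first text node
markup     = site.to_s  # serialized element, including the Site and br tags

puts first_text
puts markup
```

So `to_s` does return everything, but the br tags come along with it and would still need to be stripped or replaced afterwards.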

johnnybutler7 · September 1, 2007, 1:50am

Keith F. wrote:

REXML::XPath.each(a, '//Site/text()') {|el| puts el}

The assert_xpath plugin wraps that up in this convenient method:

class REXML::Element
  def inner_text
    return self.each_element( './/text()' ){}.join( '' )
  end
end

…
  def test_absolve_breaks
    a = REXML::Document.new("<Site>123 street<br />amstown<br />amserland</Site>")
    assert_equal "123 streetamstownamserland", a.inner_text
  end

Come to think of it, that’s not terribly programmer-friendly! Let’s
upgrade it a little…

  assert_equal "123 streetamstownamserland", a.inner_text
  assert_equal "123 street\namstown\namserland", a.inner_text("\n")

…
  def inner_text(interstitial = '')
    return self.each_element( './/text()' ){}.join(interstitial)
  end
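
For anyone who wants to try this outside the plugin, here is a self-contained sketch of the same idea. It is a variant, not the plugin's code: it collects the text nodes with `REXML::XPath.match` instead of relying on the return value of `each_element`, and the sample document is reconstructed from the thread.

```ruby
require "rexml/document"

class REXML::Element
  # Join the values of all descendant text nodes, separated by
  # +interstitial+ (empty by default). Illustrative variant only.
  def inner_text(interstitial = "")
    REXML::XPath.match(self, ".//text()").map(&:value).join(interstitial)
  end
end

# Sample document reconstructed from the thread (an assumption).
doc = REXML::Document.new("<Site>123 street<br />amstown<br />amserland</Site>")

plain = doc.root.inner_text        # text nodes run together
lines = doc.root.inner_text("\n")  # one text node per line

puts plain
puts lines
```

Passing the separator as an argument keeps the default behaviour unchanged while letting the caller decide what should stand in for each br.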

johnnybutler7 · September 1, 2007, 7:21pm

On Sat, 01 Sep 2007 04:57:57 +0900, John B. wrote:

When reading in the site element from my xml file using rexml it seems
to be chopping the rest of the text off after the first <br />

Not quite. It gives you the first text element.

The value in the XML file is below
<Site>123 street<br />amstown<br />amserland</Site>

element = REXML::XPath.first(doc, '//Site')

puts element.text #shows 123 Street

How can i get the full data and once i have it i can remove the <br />
I cant find any information on this???

You can’t find any specific info because there isn’t anything specific.
You have an XML element that contains a text node, an empty element
named br, another text node, another empty element named br and another
text node. In the XML world, <br/> is a node like any other.
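
To make that structure visible, here is a minimal sketch that walks the children of the root element and labels each one. The sample string is reconstructed from the thread, and the `kinds` variable is just an illustrative name:

```ruby
require "rexml/document"

# Sample document reconstructed from the thread (an assumption).
doc = REXML::Document.new("<Site>123 street<br/>amstown<br/>amserland</Site>")

# Classify each direct child of the root as a text node or an element.
kinds = doc.root.children.map do |node|
  case node
  when REXML::Text    then [:text, node.value]
  when REXML::Element then [:element, node.name]
  else                     [:other, node.class.name]
  end
end

kinds.each { |kind, detail| puts "#{kind}: #{detail}" }
```

The text and br children alternate, which is why asking for the element's text only yields the first piece.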

The REXML::Element#texts method is what you are looking for:

$ irb
irb(main):001:0> require "rexml/document"
=> true

irb(main):002:0> doc=REXML::Document.new("<Site>123 street<br/>amstown<br/>amserland</Site>")
=> <UNDEFINED> ... </>

irb(main):003:0> doc.root.texts
=> ["123 street", "amstown", "amserland"]

irb(main):004:0> doc.root.texts.join " "
=> "123 street amstown amserland"

Enjoy!