Ruby and XML

Hi Everyone,

I am new to Ruby and am trying to use it to parse XML files so that I can
verify that name/value pairs in two (or more) XML files are defined
consistently.

I have been trying to use the rexml API
http://www.germane-software.com/software/rexml/

The website has been down a lot lately, and I haven't found many
other sources of information.

Here is a snippet of my code:

require "rexml/document"
include REXML

file1 = File.new( "test.xml" )
doc1 = REXML::Document.new file1
names1 = XPath.each(doc1, "//name") { |e| }
values1 = XPath.each(doc1, "//value") { |e| }
Now I can parse names1 and values1 and then do the same for the second
XML file.
This approach is not great because it does not ensure that the name/value
pairs are siblings.

I am wondering if there is a better way to do this.

Any help would be greatly appreciated. BTW - I would welcome articles on
using Ruby for configuration, release and deployment management on CM
Crossroads (www.cmcrossroads.com).

Bob Aiello
http://www.linkedin.com/in/BobAiello

REXML is in the standard library, but it is slow and awkward. Try
Nokogiri instead: http://nokogiri.org/

First of all, I'd recommend a different library. Personally, I
found REXML awkward to use; Nokogiri (gem install nokogiri) is much
better. (It also parses HTML.)

We’ll probably need the XML file to be able to help you.

– Matma R.

Welcome to Ruby. I concur with the other post that Nokogiri is even
better than REXML.

I don’t fully understand your goal or the problem you mentioned about
siblings. Can you post a small sample XML and the output you’d like to
get from it?



XML file.
This approach is not great because it does not ensure that the name/value
pairs are siblings.

Can you provide this test.xml file or a sample file like it, and what
the expected output is?
PS: Some docs for REXML, which is now part of the Ruby standard
library: http://furious-waterfall-55.heroku.com/yard_stdlib/REXML.html

Regards,
Chris W.
Twitter: http://www.twitter.com/cwgem

On Sep 5, 2011, at 1:28 AM, 7stud – wrote:

    puts "No <value> tag that is a sibling"
  end

  puts "-" * 20
end

For comparison, here's one way to write code with similar
functionality using Nokogiri, but ensuring that the <value> element is a
sibling of the <name> element. Unlike the above code, the following
allows any number of elements between the <name> and the <value>:

require 'nokogiri'

doc = Nokogiri::XML(IO.read("my.xml"))
doc.search('name').each do |name|
  puts "name: #{name.text}"
  # following-sibling::value matches any later <value> with the same parent
  if value = name.at_xpath('following-sibling::value')
    puts value.text
  else
    puts "No <value> tag that is a sibling"
  end
  puts "-" * 20
end

Output:
name: Tove
Tove's value is: 10
--------------------
name: Jani
Jani's value is: 20
--------------------
name: No sibling
No <value> tag that is a sibling
--------------------

Again using this XML:
<?xml version="1.0"?>
<note>
  <parent>
    <name>Tove</name>
    <value>Tove's value is: 10</value>
    <garbage>xxxx</garbage>
  </parent>

  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>

  <parent>
    <name>Jani</name>
    <value>Jani's value is: 20</value>
    <garbage>xxxx</garbage>
  </parent>

  <parent>
     <name>No sibling</name>
  </parent>
  <value>No sibling's value is: 30</value>
</note>
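<?xml version="1.0"?>
<note>
  <parent>
    <name>Tove</name>
    <value>Tove's value is: 10</value>
    <garbage>xxxx</garbage>
  </parent>

  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>

  <parent>
    <name>Jani</name>
    <value>Jani's value is: 20</value>
    <garbage>xxxx</garbage>
  </parent>

  <parent>
    <name>Diane</name>
  </parent>
  <value>Diane's value is: 30</value>
</note>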

require 'rexml/document'

f = File.new("xml.xml")
doc = REXML::Document.new(f)

REXML::XPath.each(doc, "//name") do |element|
  puts "name: #{element.text}"

  if sibling = element.next_element
    puts sibling.text
  else
    puts "Can't find a <value> tag that is a sibling"
  end

  puts "-" * 20
end

--output:--
name: Tove
Tove's value is: 10
--------------------
name: Jani
Jani's value is: 20
--------------------
name: Diane
Can't find a <value> tag that is a sibling
--------------------

Despite its name, next_element() only returns the next element if it is
a sibling.
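
To make the behavior of next_element concrete, here is a minimal sketch using an inline document. The sample XML is made up for illustration (it only mirrors the structure used in this thread). It suggests that next_element returns whichever element comes next under the same parent, whatever its tag, and nil when no element sibling follows.

```ruby
require 'rexml/document'

# Hypothetical sample: Jani's <name> and <value> are separated by a <garbage>
# sibling, and Diane's <name> has no element siblings at all.
xml = <<-XML
<note>
  <parent>
    <name>Jani</name>
    <garbage>xxxx</garbage>
    <value>Jani's value is: 20</value>
  </parent>
  <parent>
    <name>Diane</name>
  </parent>
  <value>Diane's value is: 30</value>
</note>
XML

doc = REXML::Document.new(xml)
names = REXML::XPath.match(doc, '//name')

# next_element skips whitespace text nodes and stops at the first element sibling
jani_next  = names[0].next_element
diane_next = names[1].next_element

puts jani_next.name      # the <garbage> element, not <value>
puts diane_next.inspect  # nil, because the <value> is outside Diane's <parent>
```

So next_element alone is only a safe pairing check when the <value> is known to follow its <name> immediately.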

You can even use an XPath expression to find names which do not have
proper values:

//name[not(following-sibling::value)]
//name[following-sibling::*[1][name()!="value"]]
//name[count(following-sibling::value)!=1]
//name[following-sibling::*[1][name()!="value"]]|//name[count(following-sibling::value)!=1]
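
For anyone who would rather debug the same checks in plain Ruby, here is a rough sketch that swaps the XPath predicates for REXML's element API. It is not a use of the expressions above: the helper name following_value and the sample XML are made up for illustration, and only the first two checks are mirrored.

```ruby
require 'rexml/document'

# Hypothetical sample data modeled on the XML used earlier in the thread.
xml = <<-XML
<note>
  <parent><name>Tove</name><value>10</value></parent>
  <parent><name>Jani</name><garbage>xxxx</garbage><value>20</value></parent>
  <parent><name>Diane</name></parent>
  <value>30</value>
</note>
XML

doc = REXML::Document.new(xml)
names = REXML::XPath.match(doc, '//name')

# Walk the following element siblings until a <value> is found (or none remain).
def following_value(name)
  el = name.next_element
  el = el.next_element while el && el.name != 'value'
  el
end

# Roughly //name[not(following-sibling::value)]
no_value = names.select { |n| following_value(n).nil? }.map(&:text)

# Roughly //name[following-sibling::*[1][name()!="value"]]
not_adjacent = names.select { |n| (nxt = n.next_element) && nxt.name != 'value' }.map(&:text)

puts "no <value> sibling: #{no_value.inspect}"
puts "<value> not immediately after: #{not_adjacent.inspect}"
```

With this sample, Diane lands in the first list and Jani in the second, while Tove passes both checks.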

attached and also here: Solutions for ruby-talk 387408 · GitHub

Kind regards

robert

Nice! I couldn’t figure out the syntax for xpath’s following-sibling,
but now that I see it in your code, here it is in REXML:

require 'rexml/document'

f = File.new("xml.xml")
doc = REXML::Document.new(f)

REXML::XPath.each(doc, "//name") do |element|
  puts "name: #{element.text}"

  if sibling = REXML::XPath.match(element, "following-sibling::value").first
    puts sibling.text
  else
    puts "Can't find a <value> tag that is a sibling"
  end

  puts "-" * 20
end

And here is some trickier XML that has siblings between the <name> and
<value> tags:

<?xml version="1.0"?>

Tove
xxxx
Tove’s value is: 10

Tove
Jani
Reminder

Don't forget me this weekend! Jani xxxx xxxx Jani's value is: 20 1200 Diane Diane's value is: 30

--output:--
name: Tove
Tove's value is: 10
--------------------
name: Jani
Jani's value is: 20
--------------------
name: Diane
Can't find a <value> tag that is a sibling
--------------------

Thanks for all of the excellent responses. I am going through them all.

I have attached a copy of one of the XML files (with some of the name/value
pairs deleted for brevity).

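<?xml version="1.0" encoding="UTF-8"?>
<product-state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="urn://mycompany.com/ia/product-state"
  xsi:type="product-state">
  <property>
    <name>core.was.home</name>
    <value>/usr/IBM/WebSphere/AppServer1/profiles/AppSrvQA</value>
  </property>
  <property>
    <name>core.was.username</name>
    <value>admin</value>
  </property>
  <property>
    <name>core.was.password</name>
    <value>password</value>
  </property>
  <property>
    <name>core.application.name</name>
    <value>myapp</value>
  </property>
  <property>
    <name>core.application.context.root</name>
    <value>myroot</value>
  </property>
</product-state>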

Essentially, I have hundreds of XML files like this one, each containing
many name/value pairs. I am concerned that a value is defined
differently in one or more of the XML files (I have actually seen this). I
want to take an XML file and parse its name/value pairs into a
list. Then I want to check that list against all of the other XML files in
the system that have the same name/value pairs.

So I might find that one XML file defines

<name>core.application.context.root</name>
<value>myroot</value>

and another XML file defines this name/value pair as

<name>core.application.context.root</name>
<value>bigroot</value>

Which would indicate a configuration management error that needs to be
corrected.
(Half the application is looking in the wrong place for the
core.application.context.root)

Ultimately, I want to implement this as part of an application
deployment framework possibly using Puppet or Chef.

Bob Aiello
http://www.linkedin.com/in/BobAiello


For your consideration, below is how I would write a script to handle
this. It creates a Hash that stores each name and its value as a key/value
pair; when the same key is seen again with a new value, it keeps track of
all values seen as an array. The "SourceFile" module attaches to each value
string the file that it was defined in, so that you can later see where the
values were defined. Using a module for this is tricky.
collider.rb

require 'nokogiri'

# Perhaps use Marshal to load this from a file if it exists,
# and save out the values seen so far at the end of the run.
$all_values = {}

# Find your file(s) to analyze however you want here
files = %w[ my.xml ]

module SourceFile
  attr_accessor :source_file
end

files.each do |file|
  doc = Nokogiri::XML(IO.read(file))
  doc.remove_namespaces!

  # Find every <name> that has a <value> sibling
  doc.xpath('//property/name[following-sibling::value]').each do |name|
    value = name.at_xpath('following-sibling::value').text
    # Record where this value came from
    value.extend(SourceFile); value.source_file = file

    name  = name.text
    if $all_values.key?(name)
      old = $all_values[name]
      unless old==value
        warn "#{name} is #{old.inspect} and #{value.inspect}"
        $all_values[name] = [*old,value]
      end
    else
      $all_values[name] = value
    end
  end
end
#=> core.app.root is "myroot" and "bigroot"
#=> core.app.root is ["myroot", "bigroot"] and "sarsaparilla root"

# Print any keys that point to an array of values...
$all_values.select{ |key,val| val.is_a?(Array) }.each do |key,values|
  puts "#{key}:"
  puts values.map{ |v| "%20s: '%s'" % [v.source_file,v] }
end
#=> core.app.root:
#=>               my.xml: 'myroot'
#=>               my.xml: 'bigroot'
#=>               my.xml: 'sarsaparilla root'

my.xml

<?xml version="1.0" encoding="UTF-8"?>
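<product-state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="urn://mycompany.com/ia/product-state"
  xsi:type="product-state">
  <property>
    <name>core.was.home</name>
    <value>/usr/IBM/WebSphere/AppServer1/profiles/AppSrvQA</value>
  </property>
  <property>
    <name>core.was.username</name>
    <value>admin</value>
  </property>
  <property>
    <name>core.was.password</name>
    <value>password</value>
  </property>
  <property>
    <name>core.application.name</name>
    <value>myapp</value>
  </property>
  <property>
    <name>core.app.root</name>
    <value>myroot</value>
  </property>
  <property>
    <name>core.app.root</name>
    <value>bigroot</value>
  </property>
  <property>
    <name>core.was.password</name>
    <value>password</value>
  </property>
  <property>
    <name>core.app.root</name>
    <value>sarsaparilla root</value>
  </property>
</product-state>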


On Sep 5, 2011, at 10:21 PM, Gavin K. wrote:

   value = name.at_xpath('following-sibling::value').text

Upon further reflection, I'd make one minor change to my code: instead
of requiring that the name key always come first in the XML, the
following tweak allows for the possibility of the <value> element
appearing before the <name>:

  # Find all name elements that are in properties that also have a
  # <value> element
  doc.xpath('//property[value]/name').each do |name|
    # Find the first 'value' child in the property (assumes only one)
    value = name.parent.at('value').text
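
For anyone who wants to stay with REXML from the standard library, here is a rough sketch of the same order-independent idea applied across files. It is not the script above: the file names, the inline XML, and the `sources`, `seen` and `conflicts` variables are all made up for illustration, and the sample documents deliberately carry no XML namespace.

```ruby
require 'rexml/document'

# Hypothetical stand-ins for two config files (no XML namespace, for simplicity).
# Real code would read each file from disk instead.
sources = {
  'a.xml' => <<-XML,
<product-state>
  <property><name>core.was.username</name><value>admin</value></property>
  <property><name>core.app.root</name><value>myroot</value></property>
</product-state>
XML
  'b.xml' => <<-XML
<product-state>
  <property><value>admin</value><name>core.was.username</name></property>
  <property><value>bigroot</value><name>core.app.root</name></property>
</product-state>
XML
}

# name => { value => [files that define it] }
seen = Hash.new { |h, name| h[name] = Hash.new { |v, value| v[value] = [] } }

sources.each do |file, xml|
  doc = REXML::Document.new(xml)
  REXML::XPath.each(doc, '//property') do |prop|
    # Look the children up by tag, so their order inside <property> is irrelevant
    name  = prop.elements['name']
    value = prop.elements['value']
    next unless name && value   # skip properties missing either child
    seen[name.text][value.text] << file
  end
end

# A name is in conflict when more than one distinct value was recorded for it
conflicts = seen.select { |_name, values| values.size > 1 }
conflicts.each do |name, values|
  puts "#{name}:"
  values.each { |value, files| puts "  #{value.inspect} in #{files.join(', ')}" }
end
```

Keying on the <property> parent rather than on <name> keeps each pair together whichever child comes first. Real files that declare a default namespace, like the one posted above, may need extra handling that this sketch does not attempt.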