Parsing XML into a complete domain object

Recently at work we've decided to attempt to build a basic XML-driven
automation framework to work with Watir (a web application testing
library for Ruby).

I can't figure out how to loop through each level of the REXML document
to extract the data needed to build the complete object.

It seems that if I try to use any iterators on a root.elements[] object,
it converts it to text, so I can't nest another iterator or loop to
access the innards.

My only recourse has been to resort to a ton of nested while loops, which
is ugly compared to most other Ruby loops.

e.g.:
i = 1
while root.elements['cases'].elements[i] != nil

  n = 1
  while root.elements['cases'].elements[i].elements[n] != nil

    # more loops here. continue down the chain until I can build the
    # object from the inside out.

    n += 1
  end

  i += 1
end

My XML looks like this:

Somehow I have to get THAT modeled into an object: the script object
contains test cases, test cases contain test steps, etc.

Any thoughts? (Sorry for the long post; it's kinda hard to explain
without showing EVERYTHING.)
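Editor's sketch: the nested while loops can usually be replaced by nesting REXML's each_element, which yields only child elements (not the whitespace text between them), one loop per XML level. The post's actual XML was not preserved, so the sample document and the Struct-based Script/TestCase/TestStep classes below are hypothetical, with element names borrowed from elsewhere in this thread.

```ruby
require 'rexml/document'

# Hypothetical sample document; the original XML from the post was not preserved.
SAMPLE = <<~XML
  <script>
    <project-name>My Project</project-name>
    <test-case id="1">
      <test-step id="1"/>
      <test-step id="2"/>
    </test-case>
    <test-case id="2"/>
  </script>
XML

# Hypothetical domain objects.
Script   = Struct.new(:name, :test_cases)
TestCase = Struct.new(:id, :steps)
TestStep = Struct.new(:id)

def build_script(xml)
  root = REXML::Document.new(xml).root
  cases = []
  # each_element yields only element children (no Text nodes),
  # so one nested block per XML level is enough.
  root.each_element('test-case') do |tc_el|
    steps = []
    tc_el.each_element('test-step') do |ts_el|
      steps << TestStep.new(ts_el.attributes['id'])
    end
    cases << TestCase.new(tc_el.attributes['id'], steps)
  end
  Script.new(root.elements['project-name'].text, cases)
end

script = build_script(SAMPLE)
```

Going deeper (tests, checks, elements) is just one more nested each_element per level.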


Hi Brian,

Have you given any thought to using YAML instead of XML?

If you’re comfortable with a data format that’s a little less
self-descriptive than XML, you may find that YAML’s ease of use could
work for you. It’s pretty nice to load up your YAML and have all of
your Ruby objects pieced together for you. I can send you a small
example if you’d like.

Regards,
Matthew
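A minimal sketch of the kind of YAML loading Matthew describes, assuming a recent Ruby with Psych: the !ruby/object tags name the class to instantiate and each key becomes an instance variable. The classes and data here are hypothetical (names borrowed from later in the thread), and safe_load only instantiates classes you explicitly permit.

```ruby
require 'yaml'

# Hypothetical domain classes.
class Script
  attr_accessor :name, :starturl, :testcases
end

class TestCase
  attr_accessor :id, :steps
end

# Hypothetical YAML contents; each !ruby/object tag names a class,
# and each key is assigned to the matching instance variable.
YAML_TEXT = <<~DOC
  --- !ruby/object:Script
  name: My Project
  starturl: http://localhost:3000/
  testcases:
  - !ruby/object:TestCase
    id: 1
    steps: []
  - !ruby/object:TestCase
    id: 2
    steps: []
DOC

# safe_load refuses to build arbitrary classes unless they are permitted.
script = YAML.safe_load(YAML_TEXT, permitted_classes: [Script, TestCase])
```

The object graph comes back already nested, which is the convenience being pointed at; the trade-off (raised later in the thread) is that the file is now coupled to Ruby class names.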

Matthew D. wrote:

Brian C. wrote:

Recently at work we've decided to attempt to build a basic XML-driven
automation framework to work with Watir (a web application testing
library for Ruby).

I can't figure out how to loop through each level of the REXML document
to extract the data needed to build the complete object.

Have you looked at REXML’s pull parser?
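A minimal pull-parser sketch, assuming the same kind of hypothetical document (the original XML was not preserved): REXML::Parsers::PullParser hands you one event at a time, so you keep your own stack of open elements instead of nesting loops. The TestCase struct and pull_test_cases method are illustrative names, not part of REXML.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical sample document.
SAMPLE_XML = <<~XML
  <script>
    <project-name>My Project</project-name>
    <test-case id="1">
      <test-step id="1"/>
      <test-step id="2"/>
    </test-case>
    <test-case id="2"/>
  </script>
XML

TestCase = Struct.new(:id, :steps)

# Streams through the document, tracking currently open elements on a stack.
def pull_test_cases(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  open_elements = []
  project_name = nil
  cases = []

  while parser.has_next?
    event = parser.pull
    if event.start_element?
      name, attrs = event[0], event[1]   # element name and attribute hash
      open_elements.push(name)
      case name
      when 'test-case' then cases << TestCase.new(attrs['id'], [])
      when 'test-step' then cases.last.steps << attrs['id']
      end
    elsif event.end_element?
      open_elements.pop
    elsif event.text? && open_elements.last == 'project-name'
      project_name = event[0]            # raw text of the element
    end
  end

  [project_name, cases]
end

name, cases = pull_test_cases(SAMPLE_XML)
```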


Have you given any thought to using YAML instead of XML?

Why not just use Ruby to describe the data?
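One way to read this suggestion, as a hypothetical sketch: a tiny internal DSL in which each block is instance_eval'd against a builder object, so the nesting of the Ruby code mirrors script, test case, step. None of these method names come from Watir or from the thread; the data values are taken from the example output posted later.

```ruby
# Hypothetical builder classes for a Ruby-based test description.
class TestCaseBuilder
  attr_reader :id, :steps

  def initialize(id)
    @id = id
    @steps = []
  end

  # Records one interaction against a named page element.
  def step(interaction, element_name)
    @steps << { interaction: interaction, element: element_name }
  end
end

class ScriptBuilder
  attr_reader :name, :start_url, :test_cases

  def initialize(name, start_url)
    @name = name
    @start_url = start_url
    @test_cases = []
  end

  # Evaluates the block with the new TestCaseBuilder as self.
  def test_case(id, &block)
    tc = TestCaseBuilder.new(id)
    tc.instance_eval(&block) if block
    @test_cases << tc
  end
end

# Entry point: everything inside the block runs against a ScriptBuilder.
def script(name, start_url:, &block)
  builder = ScriptBuilder.new(name, start_url)
  builder.instance_eval(&block)
  builder
end

my_script = script('My Project', start_url: 'http://localhost:3000/') do
  test_case(1) do
    step 'double click', 'button 1'
  end
  test_case(2)
end
```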


James B.

http://www.ruby-doc.org - Ruby Help & Documentation
Ruby Code & Style - The Journal By & For Rubyists
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.30secondrule.com - Building Better Tools

Matthew D. wrote:

YAML buys you a small amount of language independence. I’ve chosen YAML
before because I like how well it plays with Ruby. I’ve been able to
choose YAML because of how well it plays with other languages.

Perhaps, though more and more I run into YAML files with custom
object-specific serializations (e.g. the YAML files used in Ruby’s ri
system); XML tends to do better on that count, with far less coupling of
data and types.

Still, if one is using WATIR, then I suspect that cross-language
configuration is not a concern. (And if it becomes a requirement, then
the Ruby used to define the tests can be exported as XML or YAML or
whatever works best.)
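A small sketch of that export step, assuming the tests are held as plain Ruby hashes (a hypothetical layout): to_yaml gives YAML directly, and REXML's add_element/add_text can build the equivalent XML.

```ruby
require 'yaml'
require 'rexml/document'

# Hypothetical test description held as plain Ruby data.
tests = {
  'project-name' => 'My Project',
  'test-cases'   => [{ 'id' => '1' }, { 'id' => '2' }]
}

# YAML export: plain hashes, arrays and strings need no custom tags.
yaml_text = tests.to_yaml

# XML export: build the tree element by element with REXML.
doc  = REXML::Document.new
root = doc.add_element('script')
root.add_element('project-name').add_text(tests['project-name'])
tests['test-cases'].each do |tc|
  root.add_element('test-case', 'id' => tc['id'])
end

xml_text = doc.to_s
```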


James B.

On Sat, 2006-05-20 at 12:14 +0900, James B. wrote:

If you just want to walk the tree from any entry point, through all
the sub-levels, you can use the standard each_recursive method.

	#Recurse end to end, printing the tags
	@doc.elements.each("definitions/src") do |element|
		print "<", element.name.to_s, ">"
		element.each_recursive do |childElement|
			print "<", childElement.name.to_s, ">"
		end
	end

If you just want the next level of children, but no deeper, I'm not
sure what to call. I did this when I played with REXML, and the
obvious each_child doesn't give you only REXML::Element objects. It gives
a REXML::Text node at the first iteration, then the next
REXML::Element, then another REXML::Text node, etc. Not quite what
you want. But adding this to your code will work.

module REXML
  class Element
    # Visit all child elements of this node, but don't recurse to their children
    def each_child_element(&block)
      self.elements.each { |node|
        block.call(node)
      }
    end
  end
end

It probably exists in some form in the REXML module, but I can't
find it, so I recreated it (by a little hacking of the module's
each_recursive).

You can then print the tags of the immediate children:

require 'rexml/document'

# Print the tags of the immediate children only.
@doc = REXML::Document.new(File.new(format_file))
@doc.elements.each("definitions") do |element|
  element.each_child_element do |childElement|
    print "<", childElement.name.to_s, ">"
  end
end

or recursively walk the tree by calling each_child_element for each
returned childElement (as with the first example)

def recurse(the_element)
		the_element.each_child_element do |childElement|
			print "<", childElement.name.to_s, ">"
			recurse(childElement)
		end
end

@doc.elements.each("definitions/src") do |element|
	recurse(element)
end
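For reference, current REXML releases do provide this: Element#each_element (a synonym for elements.each) yields only child elements, while each_child yields every child node, text included. A small comparison on a made-up document:

```ruby
require 'rexml/document'

doc  = REXML::Document.new('<definitions><src a="1"/> text <src b="2"/></definitions>')
root = doc.root

# each_child yields every child node, including the text between elements.
child_classes = []
root.each_child { |node| child_classes << node.class }

# each_element yields only REXML::Element children, one level deep.
element_names = []
root.each_element { |el| element_names << el.name }
```

Here child_classes holds Element, Text, Element, while element_names holds just the two src names.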

Oops, small bugfix:

On Sat, 2006-05-20 at 20:35 +0900, I wrote:

  d.add_call_method('/script/test-case/test-step/test/interaction', :interaction=)
  d.add_call_param('/script/test-case/test-step/test/interaction')

  d.add_object_create('/script/test-case/test-step/check', Check)
- d.add_link('/script/test-case/test-step/test') { |ts, t| ts.checks << t }
+ d.add_link('/script/test-case/test-step/check') { |ts, t| ts.checks << t }

On May 20, 2006, at 12:19 AM, James B. wrote:

Perhaps, though more and more I run into YAML files with custom
object-specific serializations (e.g. the YAML files used in Ruby’s
ri system); XML tends to do better on that count, with far less
coupling of data and types.

Right, and if it’s something I need to hand edit, I find my brain can
remember XML syntax easier than YAML’s myriad of choices.

James Edward G. II

On Sat, 2006-05-20 at 06:06 +0900, Brian C. wrote:

 <name>button 1</type>

somehow i have to get THAT modeled into an object
script object contains test-cases, test-cases contain test-steps etc…

Looks like an ideal DigestR opportunity, if you're able to get
libxml-ruby installed too(*):

#!/usr/local/bin/ruby
require 'xml/digestr'
require 'pp'

class Script
  attr_accessor :name, :starturl, :testcases
  def initialize; @testcases = []; end
end

class TestCase
  attr_accessor :id, :steps
  def initialize; @steps = []; end
end

class TestStep
  attr_accessor :id, :tests, :checks
  def initialize; @tests, @checks = [], []; end
end

class Check
  attr_accessor :elements
  def initialize; @elements = []; end
end

class Test < Check
  attr_accessor :interaction
end

class Element
  attr_accessor :name, :type
end

d = XML::Digester.new(true)
d.add_object_create('/script', Script)

d.add_call_method('/script/project-name', :name=)
d.add_call_param('/script/project-name')
d.add_call_method('/script/start-url', :starturl=)
d.add_call_param('/script/start-url')

d.add_object_create('/script/test-case', TestCase)
d.add_set_properties('/script/test-case')
d.add_link('/script/test-case') { |sc, tc| sc.testcases << tc }

d.add_object_create('/script/test-case/test-step', TestStep)
d.add_set_properties('/script/test-case/test-step')
d.add_link('/script/test-case/test-step') { |tc, ts| tc.steps << ts }

d.add_object_create('/script/test-case/test-step/test', Test)
d.add_link('/script/test-case/test-step/test') { |ts, t| ts.tests << t }
d.add_call_method('/script/test-case/test-step/test/interaction', :interaction=)
d.add_call_param('/script/test-case/test-step/test/interaction')

d.add_object_create('/script/test-case/test-step/check', Check)
# As posted; the follow-up "small bugfix" in this thread changes this
# pattern to '/script/test-case/test-step/check'.
d.add_link('/script/test-case/test-step/test') { |ts, t| ts.checks << t }

# The start of the '/element' patterns below appears to have been lost in
# the archive; the output shows <element> being matched under <test>, so a
# pattern matching <element> at any depth was presumably intended.
d.add_object_create('/element', Element)
d.add_link('/element') { |p, ele| p.elements << ele }

d.add_call_method('/element/name', :name=)
d.add_call_param('/element/name')
d.add_call_method('/element/type', :type=)
d.add_call_param('/element/type')

script = d.parse_file('watir.xml')

pp script
__END__

This outputs (with the data you posted, with some mismatched close tags
fixed up):

#<Script:0xb7e8d64c
 @name="My Project",
 @starturl="http://localhost:3000/",
 @testcases=
  [#<TestCase:0xb7edcaec
    @id="1",
    @steps=
     [#<TestStep:0xb7edae90
       @checks=
        [#<Test:0xb7ed9978
          @elements=[#<Element:0xb7ed7dbc @name="button 1", @type="button">],
          @interaction="double click">],
       @id="1",
       @tests=
        [#<Test:0xb7ed9978
          @elements=[#<Element:0xb7ed7dbc @name="button 1", @type="button">],
          @interaction="double click">]>,
      #<TestStep:0xb7e63ff0 @checks=[], @id="2", @tests=[]>]>,
   #<TestCase:0xb7e63a14 @id="2", @steps=[]>]>

Which I think is what you’re after?

(*): If you can’t/won’t install native extensions, DigestR’s API is
intended to be mostly compatible with an older, REXML-based (IIRC)
digester at http://rubyforge.org/projects/xmldigester

On Fri, 2006-05-19 at 23:06, Brian C. wrote:

Recently at work we've decided to attempt to build a basic XML-driven
automation framework to work with Watir (a web application testing
library for Ruby).

I can't figure out how to loop through each level of the REXML document
to extract the data needed to build the complete object.

You might find a treewalker useful:

require 'rexml/document'

module XmlUtil

  class TreeWalker

    def initialize(strategy)
      @strategy = strategy
    end

    # Depth-first walk: calls execute_before on the way down and
    # execute_after on the way back up, if the strategy defines them.
    def walk(node)
      @strategy.execute_before(node) if @strategy.respond_to?(:execute_before)
      if node.instance_of?(REXML::Document)
        walk(node.root)
      elsif node.instance_of?(REXML::Element)
        node.children.each { |child|
          walk(child)
        }
      end
      @strategy.execute_after(node) if @strategy.respond_to?(:execute_after)
    end
  end
end

The treewalker will walk the XML document, calling the execute_before
and execute_after methods of a strategy object.

You also need a strategy object. The strategy object looks something
like this:

class MyStrategy

  def execute_before(node)
    # Process start tags
    case node
    when REXML::Document
      # Do nothing with Document nodes.
      # Necessary because Document inherits Element
    when REXML::Element
      # Do something with the element
    end
  end

  def execute_after(node)
    # Process end tags
  end
end
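As an illustrative sketch (not from the original post), here is a concrete strategy that records an indented outline of element names, tracking depth in execute_before and execute_after. A condensed copy of the walker is included so the block runs on its own; OutlineStrategy is a hypothetical name.

```ruby
require 'rexml/document'

# Condensed copy of the TreeWalker shown above.
class TreeWalker
  def initialize(strategy)
    @strategy = strategy
  end

  def walk(node)
    @strategy.execute_before(node) if @strategy.respond_to?(:execute_before)
    if node.instance_of?(REXML::Document)
      walk(node.root)
    elsif node.instance_of?(REXML::Element)
      node.children.each { |child| walk(child) }
    end
    @strategy.execute_after(node) if @strategy.respond_to?(:execute_after)
  end
end

# Hypothetical strategy: one outline line per element, indented by depth.
class OutlineStrategy
  attr_reader :lines

  def initialize
    @lines = []
    @depth = 0
  end

  def execute_before(node)
    return unless node.instance_of?(REXML::Element)  # skip Document and Text nodes
    @lines << ('  ' * @depth) + node.name
    @depth += 1
  end

  def execute_after(node)
    @depth -= 1 if node.instance_of?(REXML::Element)
  end
end

doc = REXML::Document.new('<script><test-case><test-step/></test-case><test-case/></script>')
strategy = OutlineStrategy.new
TreeWalker.new(strategy).walk(doc)
```

The same before/after pairing is what lets a strategy build nested objects: push a new object in execute_before and attach it to its parent in execute_after.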

If the treewalker does not suit your needs, a node iterator (Java Xerces
style) might do the trick. Let me know if you need one. I’ve got working
code, but the implementation could be more elegant. (One of my first
Ruby classes.)
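Henrik's node iterator isn't shown in the thread. As a rough, hypothetical stand-in, Ruby's Enumerator can provide Xerces-style external iteration (calling next repeatedly) over nodes in document order; node_iterator is an illustrative name, not a REXML method.

```ruby
require 'rexml/document'

# Hypothetical helper: an Enumerator over a node and all its descendants,
# in document order (pre-order, depth first).
def node_iterator(node)
  Enumerator.new do |yielder|
    visit = lambda do |n|
      yielder << n
      # Only Parent nodes (Document, Element) have children; Text does not.
      n.children.each { |child| visit.call(child) } if n.is_a?(REXML::Parent)
    end
    visit.call(node)
  end
end

doc = REXML::Document.new('<a><b>hi</b><c/></a>')
iter = node_iterator(doc.root)
first  = iter.next   # the <a> element
second = iter.next   # the <b> element
```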

/Henrik


http://kallokain.blogspot.com/ - Blogging from the trenches of software
development
http://www.henrikmartensson.org/ - Reflections on software development
http://tocsim.rubyforge.com/ - Process simulation
http://testunitxml.rubyforge.org/ - XML test framework
http://declan.rubyforge.org/ - Declarative XML processing