Process xml in a page

Hi all,

I would like to process the XML at the URL below, but I am unsure how.
I’ve always had APIs to help me out before, so this is new.

http://clinicaltrials.gov/show/NCT00001372?displayxml=true

I want to load this XML into an object that I can then parse to extract
the portions I need.

Thank you for your help, as always.

Hunter W. wrote:

Thank you for your help, as always.
Use open-uri to fetch the XML, then load it into a REXML::Document
object.

Then use REXML’s XPath to grab what you need.

http://www.ruby-doc.org/stdlib/libdoc/open-uri/rdoc/
http://www.ruby-doc.org/stdlib/libdoc/rexml/rdoc/index.html
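As a rough illustration of the XPath step, here is a minimal sketch that parses an inline string instead of the live page. The element names (clinical_study, brief_title, condition) are assumptions made up for the example and may not match what clinicaltrials.gov actually returns.

```ruby
require "rexml/document"

# Hypothetical sample standing in for the fetched page; the element names
# are assumptions, not the real clinicaltrials.gov schema.
xml = <<~XML
  <clinical_study>
    <brief_title>Example Study</brief_title>
    <condition>Condition A</condition>
    <condition>Condition B</condition>
  </clinical_study>
XML

doc = REXML::Document.new(xml)

# XPath.first returns the first matching node (or nil if nothing matches).
title = REXML::XPath.first(doc, "//brief_title").text

# XPath.match returns an array of every matching node.
conditions = REXML::XPath.match(doc, "//condition").map { |el| el.text }

puts title
puts conditions.join(", ")
```

With the real document you would swap the inline string for the fetched XML and adjust the paths to whatever elements it contains.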


James B.

http://www.ruby-doc.org - Ruby Help & Documentation
Ruby Code & Style - The Journal By & For Rubyists
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
http://www.30secondrule.com - Building Better Tools

Thank you, James. I got a step further, but still need help.

This code outputs exactly what I want to see to the debug window:

require 'open-uri'
require 'rexml/document'
include REXML

url = "http://clinicaltrials.gov/show/NCT00001372?displayxml=true"

open(url) { |page| print page.read() }

However, I am having trouble when using the REXML tutorial. Should I
somehow save the output to an .xml file locally? Or should I add the
output to a string? I am not sure how to do either as I am having
trouble directing the “print” output anywhere.

Thank you and sorry for the noobness. The API I used in a Ruby project
before made everything a bit too easy, so I am still learning.

-Hunter

Hunter W. wrote:

open(url) { |page| print page.read() }

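require 'open-uri'
require 'rexml/document'
include REXML

url = "http://clinicaltrials.gov/show/NCT00001372?displayxml=true"

# Read the whole response body into a String
# (on Ruby 3.0+, call URI.open instead of open)
xml = open(url).read
p xml

# Parse the String into a document object
doc = REXML::Document.new(xml)

p doc.root.name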
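On the string-versus-file question: a local file should not be needed, since REXML::Document.new accepts either a String or an IO-like object. The sketch below uses a StringIO as a stand-in for the handle open-uri returns, so it runs without network access; the XML content is made up for the example.

```ruby
require "rexml/document"
require "stringio"

# StringIO stands in for the handle that open-uri returns, so this sketch
# runs without network access. The XML content is made up.
page = StringIO.new("<clinical_study><id>NCT00001372</id></clinical_study>")

# Option 1: read everything into a String, then parse the String.
xml = page.read
doc_from_string = REXML::Document.new(xml)

# Option 2: rewind and hand the IO-like object straight to REXML.
page.rewind
doc_from_io = REXML::Document.new(page)

puts xml.class
puts doc_from_string.root.name
puts doc_from_io.root.elements["id"].text
```

Keeping the String around (option 1) is handy if you also want to print or inspect the raw XML; option 2 skips the intermediate variable.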


James B.

“I never dispute another person’s delusions, just their facts.”

  • Len Bullard

A guy in the comments on the page below had the same problem. Someone
supplied a fix for Windows (2nd link). It was all good after that.

http://redhanded.hobix.com/inspect/noXpathOnMessyHtmlIsJustAsEasyInRuby.html

http://www.dave.burt.id.au/ruby/iconv.zip

Thanks again, James!

I get the error below after a successful XML output to the debug window.
Thanks!

No such file to load -- rexml/encodings/ASCII.rb
No decoder found for encoding ASCII. Please install iconv.
c:/ruby/lib/ruby/1.8/rexml/encoding.rb:33:in `encoding='
c:/ruby/lib/ruby/1.8/rexml/source.rb:40:in `encoding='
c:/ruby/lib/ruby/1.8/rexml/parsers/baseparser.rb:202:in `pull'
c:/ruby/lib/ruby/1.8/rexml/parsers/treeparser.rb:21:in `parse'
c:/ruby/lib/ruby/1.8/rexml/document.rb:176:in `build'
c:/ruby/lib/ruby/1.8/rexml/document.rb:45:in `initialize'
C:/Documents and Settings/hwalker/Desktop/Ruby-1.rb:13:in `new'
C:/Documents and Settings/hwalker/Desktop/Ruby-1.rb:13

c:/ruby/lib/ruby/1.8/rexml/encoding.rb:33:in `encoding=': No decoder found for encoding ASCII. Please install iconv. (Exception)
from c:/ruby/lib/ruby/1.8/rexml/source.rb:40:in `encoding='
from c:/ruby/lib/ruby/1.8/rexml/parsers/baseparser.rb:202:in `pull'
from c:/ruby/lib/ruby/1.8/rexml/parsers/treeparser.rb:21:in `parse'
from c:/ruby/lib/ruby/1.8/rexml/document.rb:176:in `build'
from c:/ruby/lib/ruby/1.8/rexml/document.rb:45:in `initialize'
from C:/Documents and Settings/hwalker/Desktop/Ruby-1.rb:13:in `new'
from C:/Documents and Settings/hwalker/Desktop/Ruby-1.rb:13
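
For anyone hitting the same error without iconv available, one possible workaround is to relabel the declared encoding before parsing. This is only a sketch under assumptions: it presumes the page's XML declaration says encoding="ASCII" (which the error message suggests but does not prove), that the body really is plain ASCII, and that the parser handles UTF-8 natively, which is untested on the Ruby 1.8 setup in the trace. The sample XML is made up.

```ruby
require "rexml/document"

# Made-up sample whose prolog declares ASCII, as the error message suggests
# the real page does.
xml = %(<?xml version="1.0" encoding="ASCII"?><clinical_study><id>NCT00001372</id></clinical_study>)

# ASCII is a strict subset of UTF-8, so relabeling the declaration is safe
# as long as the body really contains only ASCII bytes.
relabeled = xml.sub(/encoding=(["'])ASCII\1/i, 'encoding="UTF-8"')

doc = REXML::Document.new(relabeled)

puts doc.root.name
puts doc.root.elements["id"].text
```

Installing iconv, as in the fix linked above, remains the more direct route on that old Windows install; the relabeling trick only sidesteps the lookup for the ASCII decoder.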