Hello and thank you to all the wonderful and helpful people at this
forum. I am trying to figure out how to search through an XML file and
grab information from it. I have been reading the REXML tutorials
(http://www.germane-software.com/software/rexml/docs/tutorial.html)
but could not find an answer to my problem in them.
The problem is I need to search by an attribute (in this case the ref)
and cannot figure out how. Here is a snippet of the XML I am trying to
extract information from:
117.4
119.7
0.
So basically I have to start with IfcWallStandardCase and from there
work my way through the “ref” attributes until I get to the three IfcLengthMeasure elements.
I know how to grab the first ref “i1671” using:
XPath.match(doc,"/IfcWallStandardCase/ObjectPlacement/IfcLocalPlacement")
and some additional code.
My problem is I cannot figure out how to use this “i1671” to search the
xml and grab the next ref. This ref is the only thing linking the items
together, so it is the only thing that I can use.
Is it possible to search a document by attribute, and if so, how? In
this case, I want to use the ref “i1671” to find the element where it
appears as id=“i1671”, so I can grab the next ref from there, and so
on. Any help would be greatly appreciated.
XPath.match(doc, "//*[@id = 'i1671']")
finds all tags with an id attribute whose value is ‘i1671’. You might
want to check out an XPath tutorial to get specifics on XPath (rather
than the REXML docs), e.g.:
It is not entirely clear what you want. Do you want to look for all
“ref” instances and find elements they are referring to? Or do you
want to do some kind of graph traversal where you start with a
particular element and follow every ref attribute?
If the latter, you can, for example, do a breadth-first search (BFS):
10:11:30 Temp$ ./rx.rb
--- VISIT:
--- VISIT:
117.4
119.7
0.
10:11:43 Temp$ cat rx.rb
#!/bin/env ruby19

require 'rexml/document'

doc = REXML::Document.new(DATA.read)

# BFS: start at id i1671 and follow every ref attribute found below it
queue = %w{i1671}

until queue.empty?
  id = queue.shift

  # Visit every element whose id attribute matches the current id
  REXML::XPath.each(doc, "//*[@id='#{id}']") do |e|
    puts "--- VISIT:", e

    # Queue the ref value of every descendant that has a ref attribute
    REXML::XPath.each(e, ".//*[@ref]") do |child|
      queue.push(child.attributes['ref'])
    end
  end
end

__END__
[XML snippet from the first post; only the values 117.4, 119.7 and 0. survived]
10:11:47 Temp$
Hi, and thank you for your help. I am sorry if what I wrote was unclear.
My goal is to start at a given location (in this case, the
IfcWallStandardCase element) and eventually grab the three
IfcLengthMeasure text values associated with it, and put them into an array.
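Here is a minimal sketch of that goal, adapting the BFS above. It follows id/ref links from a starting id and remembers which ids it has already visited, so a cycle cannot make it loop forever. Along the way it collects every IfcLengthMeasure value into an array of Floats. The XML is a hypothetical stand-in for the real file: only i1671 and the three values come from this thread, and the other ids and the RelativePlacement/IfcAxis2Placement3D/Location layout are invented. collect_measures is a made-up helper name.

```ruby
require 'rexml/document'
require 'set'

# Hypothetical stand-in for the real file; ids other than i1671 are invented.
XML = <<~XML
  <doc>
    <IfcWallStandardCase id="i1665">
      <ObjectPlacement>
        <IfcLocalPlacement ref="i1671"/>
      </ObjectPlacement>
    </IfcWallStandardCase>
    <IfcLocalPlacement id="i1671">
      <RelativePlacement>
        <IfcAxis2Placement3D ref="i1672"/>
      </RelativePlacement>
    </IfcLocalPlacement>
    <IfcAxis2Placement3D id="i1672">
      <Location>
        <IfcCartesianPoint ref="i1673"/>
      </Location>
    </IfcAxis2Placement3D>
    <IfcCartesianPoint id="i1673">
      <Coordinates>
        <IfcLengthMeasure>117.4</IfcLengthMeasure>
        <IfcLengthMeasure>119.7</IfcLengthMeasure>
        <IfcLengthMeasure>0.</IfcLengthMeasure>
      </Coordinates>
    </IfcCartesianPoint>
  </doc>
XML

# Hypothetical helper: breadth-first walk over id/ref links from +start_id+.
# Returns the IfcLengthMeasure values (as Floats) in the order they are visited.
def collect_measures(doc, start_id)
  queue  = [start_id]
  seen   = Set.new
  values = []

  until queue.empty?
    id = queue.shift
    next unless seen.add?(id) # Set#add? returns nil if id was already visited

    REXML::XPath.each(doc, "//*[@id='#{id}']") do |element|
      REXML::XPath.each(element, './/IfcLengthMeasure') { |m| values << m.text.to_f }
      REXML::XPath.each(element, './/*[@ref]') { |child| queue << child.attributes['ref'] }
    end
  end

  values
end

doc = REXML::Document.new(XML)
measures = collect_measures(doc, 'i1665') # start at the (hypothetical) wall id
p measures # => [117.4, 119.7, 0.0]
```

If the real file nests id-carrying elements inside each other, the same IfcLengthMeasure could be collected more than once. Narrowing the .//IfcLengthMeasure path would avoid that.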
I will give this a try.
Kind regards
robert
Dear 7stud. Using the XPath command:
doc = Document.new xml
target = XPath.match(doc, "//*[@id = 'i1671']")
p target
It produces the following output, as you said it would.
--output:--
[ … </>]
But I cannot figure out how to do anything with this from here to get to
the next point, and eventually be able to grab the three values.
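From that match, one way to reach the next point is to read the ref attribute inside the matched element and look that value up as an id. You then repeat until an element has no ref below it. Below is a minimal REXML sketch of this approach. It assumes a hypothetical, trimmed-down version of the XML in which only i1671 and the three values come from this thread. find_by_id and next_ref are made-up helper names.

```ruby
require 'rexml/document'

# Hypothetical, trimmed-down IFC-style XML; ids other than i1671 are invented.
XML = <<~XML
  <doc>
    <IfcLocalPlacement id="i1671">
      <RelativePlacement>
        <IfcAxis2Placement3D ref="i1672"/>
      </RelativePlacement>
    </IfcLocalPlacement>
    <IfcAxis2Placement3D id="i1672">
      <Location>
        <IfcCartesianPoint ref="i1673"/>
      </Location>
    </IfcAxis2Placement3D>
    <IfcCartesianPoint id="i1673">
      <Coordinates>
        <IfcLengthMeasure>117.4</IfcLengthMeasure>
        <IfcLengthMeasure>119.7</IfcLengthMeasure>
        <IfcLengthMeasure>0.</IfcLengthMeasure>
      </Coordinates>
    </IfcCartesianPoint>
  </doc>
XML

# Hypothetical helper: the element whose id attribute equals +id+, or nil.
def find_by_id(doc, id)
  REXML::XPath.first(doc, "//*[@id='#{id}']")
end

# Hypothetical helper: the first ref value found below +element+, or nil.
def next_ref(element)
  child = REXML::XPath.first(element, './/*[@ref]')
  child && child.attributes['ref']
end

doc = REXML::Document.new(XML)
element = find_by_id(doc, 'i1671')

# Follow the ref chain until an element has no ref below it.
# Assumes every ref resolves to an element and the chain has no cycles.
while (ref = next_ref(element))
  element = find_by_id(doc, ref)
end

values = REXML::XPath.match(element, './/IfcLengthMeasure').map { |m| m.text.to_f }
p values # => [117.4, 119.7, 0.0]
```

For arbitrary files, a breadth-first walk with a set of visited ids is safer than this simple chain.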
Thank you for the help. I had been using REXML and it was working fine,
with one exception: it can be very slow. So now I am trying to use
Nokogiri, and am running into a very simple error: I cannot load XML
files. I have been trying what is in the tutorial on the Nokogiri website:
f = File.open("blossom.xml")
doc = Nokogiri::XML(f)
f.close
But regardless of what I put inside the ("…"), it returns:
Error: #<Errno::EINVAL: C:/Program Files (x86)/Google/Google SketchUp 8/Plugins/examples/auto.rb:11:in `read': Invalid argument - c:ourwalls.xml>
or
Error: #<Errno::ENOENT: C:/Program Files (x86)/Google/Google SketchUp 8/Plugins/examples/auto.rb:11:in `read': No such file or directory - fourwalls.xml>
If I simply want to open an XML file named file.xml located in C:,
what would I put to open it? I have tried many things, such as
f = File.open("C:\file.xml"), with no luck. What do I need to do to
open this? Do you simply need to change the \ to /?
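The error output above is consistent with Ruby's string escapes. Inside double quotes, \f is the form-feed character, so "C:\file.xml" contains no backslash at all. If the string was written as "c:\fourwalls.xml", the f would disappear into an invisible control character, which would explain "c:ourwalls.xml" in the message. The short sketch below shows the pitfall and three spellings that keep the path intact. C:/file.xml is just the example path from the question.

```ruby
# Inside double quotes, "\f" is an escape for the form-feed character,
# so this string contains no backslash at all.
broken = "C:\file.xml"
p broken.include?("\\") # => false
p broken.length         # => 10 ("C:", form feed, "ile.xml")

# Three spellings that keep the path intact:
forward = "C:/file.xml"  # Ruby on Windows accepts forward slashes
escaped = "C:\\file.xml" # escaped backslash inside double quotes
single  = 'C:\file.xml'  # single quotes do not treat \f as an escape

p escaped == single                         # => true
p File.join('C:', 'file.xml') == forward    # => true
```

Forward slashes are the least error-prone choice, which matches what the thread later finds with "/one.xml".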
But if I leave a space before IfcCartesianPoint in the call to the css
method, I get a parser error (`on_error': unexpected ' ' after ''
(Nokogiri::CSS::SyntaxError)). This is my file one.xml:
Thank you for your reply. When I try to read my own file, it keeps
returning nil values, so it doesn’t work. But when I copy and paste the
XML you have written over the file I am trying to read, it does work. I
understand that the path is slightly different, but using the XPath command-
It should skip ahead to the first appearance of IfcCartesianPoint, much
the same as it does with REXML's XPath, no? The same string
IfcCartesianPoint/Coordinates/IfcLengthMeasure appears in this file.
Based on the documentation here -
I think it should be working, but it always returns nil.
I have attached the xml file I am trying to read and was wondering if
you could see where my error is occurring. The first instance of
IfcCartesianPoint/Coordinates/IfcLengthMeasure appears on line 225.
Maybe it’s the way you are passing the file to Nokogiri::XML? Also, in
your version you are not closing the file handle. If you want to
pass Nokogiri the file instead of reading it yourself, you can do:
Is there an advantage to using .open vs .read? The program I am writing
has to grab lots of information from the XML, maybe 300 items; would it
make a difference in speed to use one vs the other? Also, to read a file
at, say, c:\one.xml, I have to write:
doc = File.open("/one.xml") {|f| Nokogiri::XML(f)}
Other forms, including “\one.xml”, will not read it.
Thank you again for your time, you have been most helpful and it is
greatly appreciated.
The difference is that you have namespaces in your file. Check this URL:
(I added a line that shows all namespaces in the document). All nodes
under the uos node inherit the namespace referenced by the url you see
in the code, so in order to search for nodes within the uos node, you
need to specify the namespace.
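Here is a small sketch of the same rule using REXML, the library from the start of this thread. It is swapped in here so the example runs without Nokogiri. The children of uos inherit its default namespace. The query maps a prefix of your own choosing to that URL, the same way uosNS is used with Nokogiri. The XML is a hypothetical reduction of the real file; only the IFC URL comes from this thread.

```ruby
require 'rexml/document'

IFC_NS = 'http://www.iai-tech.org/ifcXML/IFC2x3/FINAL'

# Hypothetical reduction of the real file: doc is the root, and uos
# declares the IFC URL as its default namespace.
XML = <<~XML
  <doc>
    <uos xmlns="#{IFC_NS}">
      <IfcCartesianPoint id="i1673">
        <Coordinates>
          <IfcLengthMeasure>117.4</IfcLengthMeasure>
        </Coordinates>
      </IfcCartesianPoint>
    </uos>
  </doc>
XML

doc = REXML::Document.new(XML)

# The prefix ("u") is our own choice; only the URL it maps to matters.
ns = { 'u' => IFC_NS }

point = REXML::XPath.first(doc, '//u:IfcCartesianPoint', ns)
p point.namespace # inherited from uos: the IFC URL

values = REXML::XPath.match(doc, '//u:IfcCartesianPoint/u:Coordinates/u:IfcLengthMeasure', ns).map(&:text)
p values # => ["117.4"]
```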
Is there an advantage to using .open vs .read?
File.read reads the whole file into memory. Passing a file handle to
Nokogiri will probably make no difference, because most likely it’s
reading the full file into memory too.
The program I am writing
has to grab lots of information from the XML, maybe 300 items; would it
make a difference in speed to use one vs the other?
The only answer to this question is to benchmark.
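Below is a minimal benchmark sketch using Ruby's standard Benchmark module. It uses REXML so that it runs anywhere. The same two loading styles apply to Nokogiri::XML, which, as the File.open form quoted in this thread shows, also accepts an open File. The synthetic file and its size are arbitrary assumptions; for a meaningful answer, point it at the real file. The block form of File.open also closes the handle when the block returns.

```ruby
require 'benchmark'
require 'rexml/document'
require 'tempfile'

# Synthetic test file; 3_000 elements is an arbitrary size.
file = Tempfile.new(['bench', '.xml'])
file.write('<doc>' + '<IfcLengthMeasure>1.0</IfcLengthMeasure>' * 3_000 + '</doc>')
file.close

from_read = from_open = nil
Benchmark.bm(10) do |x|
  # Read the whole file into a String first, then parse the String.
  x.report('File.read') { from_read = REXML::Document.new(File.read(file.path)) }
  # Hand the parser the File itself; the block closes it afterwards.
  x.report('File.open') { from_open = File.open(file.path) { |f| REXML::Document.new(f) } }
end

file.unlink
```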
Also, to read a file
at, say, c:\one.xml, I have to write:
doc = File.open("/one.xml") {|f| Nokogiri::XML(f)}
Other forms, including “\one.xml”, will not read it.
I have no experience in Windows, but I think forward slashes should
always work (no idea about the drive letter, though).
It will have to read the whole file, but it may make a crucial
difference whether it does so in one go or in chunks. Large files
might not even be readable with the File.read approach. If you pass
the file as a single String there is no choice, but if you pass the
File instance, Nokogiri can decide what to do. This is more efficient.
Note also that, because of buffering, small files will need just one
(or a few) IO operations anyway.
I don’t think the file loading influences access speed. Once the file
is loaded into an object structure, IO is over and all operations are in
memory; plus, the model of the file will be the same regardless of whether
you read it in one big chunk or in smaller ones.
The two approaches to loading the file most likely do have different
performance characteristics, though.
doc.css("uosNS|IfcCartesianPoint uosNS|Coordinates
any XML file read using Nokogiri, if it has a namespace, you must include
that namespace each time you are trying to grab information from it,
correct (the namespace is the URL in xmlns="…", correct?)?
Yes.
I tried this using the .xml I posted and it does not work. Is this
because the xmlns is not on the first line immediately following <?xml
version="1.0"?>, which in turn makes it necessary for every query to
include {"uosNS" => "http://www.iai-tech.org/ifcXML/IFC2x3/FINAL"}?
Correct. The root node of your XML is the doc tag, which declares
namespaces, but the uos tag has its own namespace too. Nokogiri will
register the ones present in the root node, as the article says. The
child nodes of uos inherit the namespace declared in the uos tag,
so this is what you have to use to search. I have not checked how the
automatic registration of namespaces behaves, but this is how I
understand the article.
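To check which namespace each element actually ends up in (the doc tag versus the uos tag), a short REXML sketch can list the namespace URI of every element in document order. The xsi declaration on the root and the overall layout are hypothetical; only the IFC URL comes from this thread. This shows REXML's view of the document, not Nokogiri's automatic registration.

```ruby
require 'rexml/document'

IFC_NS = 'http://www.iai-tech.org/ifcXML/IFC2x3/FINAL'

# Hypothetical layout: the root (doc) declares only a prefixed namespace,
# while uos declares the default namespace that its children inherit.
XML = <<~XML
  <doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <uos xmlns="#{IFC_NS}">
      <IfcCartesianPoint id="i1673"/>
    </uos>
  </doc>
XML

doc = REXML::Document.new(XML)

# Print each element's name next to the namespace URI it belongs to.
REXML::XPath.each(doc, '//*') do |element|
  puts "#{element.name}: #{element.namespace.inspect}"
end

# The distinct namespace URIs, in document order.
uris = REXML::XPath.match(doc, '//*').map(&:namespace).uniq
p uris
```

The doc element should report an empty namespace, while uos and IfcCartesianPoint report the IFC URL. This is why the prefix has to be mapped to the URL declared on uos.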
Hello, Nokogiri has been going well for me but recently I have been
having trouble trying to read some xml, and from my reading online I
cannot find the proper way to write it using Nokogiri. Here are the two
lines I am having trouble with:
I am trying to get the reference for exp:pos=“1”, and I had this working
using REXML with the following:
With Nokogiri I can get it to read both pos 0 and 1, using .css and
.xpath:
$doc_noko.css("uosNS|IfcWallStandardCase uosNS|IfcShapeRepresentation",
{"uosNS" => $http})
and
$doc_noko.xpath("//uosNS:IfcWallStandardCase//uosNS:IfcShapeRepresentation",
{"uosNS" => $http})
But I cannot figure out how to get it to read only pos="1" with either
method; I keep getting an error or nil.
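In XPath, an attribute condition goes in a predicate such as [@exp:pos='1'], and the exp prefix needs a namespace mapping just like uosNS. As far as I know, Nokogiri only falls back to the root node's declarations when no hash is passed. Once you pass {"uosNS" => $http} yourself, exp has to be in that hash too, with the URL from the xmlns:exp declaration in your file. The runnable sketch below shows the predicate with REXML. The exp URL, the ids and the layout are hypothetical placeholders.

```ruby
require 'rexml/document'

IFC_NS = 'http://www.iai-tech.org/ifcXML/IFC2x3/FINAL'
EXP_NS = 'http://example.com/exp' # hypothetical: use the xmlns:exp URL from your file

# Hypothetical layout with two IfcShapeRepresentation entries.
XML = <<~XML
  <doc xmlns:exp="#{EXP_NS}">
    <uos xmlns="#{IFC_NS}">
      <IfcWallStandardCase id="i1665">
        <Representations>
          <IfcShapeRepresentation exp:pos="0" ref="i2000"/>
          <IfcShapeRepresentation exp:pos="1" ref="i2001"/>
        </Representations>
      </IfcWallStandardCase>
    </uos>
  </doc>
XML

# Every prefix used in the query needs an entry, including exp.
NS = { 'uosNS' => IFC_NS, 'exp' => EXP_NS }

doc = REXML::Document.new(XML)

# The predicate [@exp:pos='1'] keeps only the entry whose exp:pos is "1".
rep = REXML::XPath.first(
  doc, "//uosNS:IfcWallStandardCase//uosNS:IfcShapeRepresentation[@exp:pos='1']", NS
)
p rep.attributes['ref'] # => "i2001"
```

An untested guess at the Nokogiri equivalent would be $doc_noko.xpath("//uosNS:IfcWallStandardCase//uosNS:IfcShapeRepresentation[@exp:pos='1']", {"uosNS" => $http, "exp" => exp_url}), where exp_url is a hypothetical variable holding the xmlns:exp URL.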
1.
0.
0.
The issue I am having here is that I am reading this with Nokogiri using
.xpath, and the colon in exp:double is giving me trouble, since the XPath
is written:
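The colon itself is ordinary XPath syntax. exp:double means the element double in whatever namespace the exp prefix is bound to, so the query should work once exp is mapped alongside uosNS. With Nokogiri, that mapping goes in the same hash you already pass. The sketch below shows this with REXML. The IfcDirection/DirectionRatios layout and the exp URL are hypothetical, and the three values mirror the 1., 0., 0. shown above.

```ruby
require 'rexml/document'

IFC_NS = 'http://www.iai-tech.org/ifcXML/IFC2x3/FINAL'
EXP_NS = 'http://example.com/exp' # hypothetical: use the xmlns:exp URL from your file

# Hypothetical layout: three exp:double children holding direction ratios.
XML = <<~XML
  <doc xmlns:exp="#{EXP_NS}">
    <uos xmlns="#{IFC_NS}">
      <IfcDirection id="i1680">
        <DirectionRatios>
          <exp:double exp:pos="0">1.</exp:double>
          <exp:double exp:pos="1">0.</exp:double>
          <exp:double exp:pos="2">0.</exp:double>
        </DirectionRatios>
      </IfcDirection>
    </uos>
  </doc>
XML

NS = { 'uosNS' => IFC_NS, 'exp' => EXP_NS }

doc = REXML::Document.new(XML)

# exp:double = element "double" in the namespace bound to "exp" in NS.
doubles = REXML::XPath.match(doc, '//uosNS:IfcDirection/uosNS:DirectionRatios/exp:double', NS)
ratios  = doubles.map { |d| d.text.to_f }
p ratios # => [1.0, 0.0, 0.0]
```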
Thank you for the response. After reading the link you provided: to make
Nokogiri read any XML file that has a namespace, you must include that
namespace each time you are trying to grab information from it, correct
(the namespace is the URL in xmlns="…", correct?)? Like you did here:
In the link it says that, “Even though using namespaces is essential
when searching an XML document, Nokogiri tries to help out. If there are
namespaces declared on the root node of a document, Nokogiri will
automatically register those for you. You will still have to use the
prefix when searching the document, but the URL registration is done for
you.”
I will try both and see how they perform against each other and whether
both work properly.
Thank you both for your replies. Your help with this has been
invaluable.
This has sufficiently deviated from the title of the thread, so to make
the information more relevant for future searchers I am going to make a
new post and end this one. I hope this is acceptable.
This forum is not affiliated to the Ruby language, Ruby on Rails framework, nor any Ruby applications discussed here.