REXML and Excel XML :)

I set myself the task of parsing an arbitrary XML file generated by
Excel 2003, and I’m having odd problems which I can’t really
understand.

---- Question Setup ----
Excel defines two elements, each with its own namespace. I’ll paste the
XML file at the bottom of this message.

** notice the top two elements: and

When those two elements with their namespace attributes are present, I
can’t access any of the lower elements like or .
When I manually delete the namespace attributes, everything works fine.
In other words:

@doc.root.elements.each('Worksheet') {} returns three elements when I
remove the namespace attribute, but nothing when I put it back in.

---- Actual Question ----
Is this just a weird instance of XML which REXML doesn’t support or do I
need to be doing some fancy XPath footwork to make it work?

cheers and thanks! :)

Andrew

----- Sample Excel file -----

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">

Gibson, Andrew

agibson

2006-06-26T12:48:00Z

2006-06-26T12:50:55Z

11.6360

8835

15180

120

105

False

False

<Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="17" x:FullColumns="1"
 x:FullRows="1">

<Cell><Data ss:Type="String">Column1</Data></Cell>

<Cell><Data ss:Type="String">Column2</Data></Cell>

<Cell><Data ss:Type="String">Column3</Data></Cell>
<Cell><Data ss:Type="Number">1</Data></Cell>

<Cell><Data ss:Type="Number">2</Data></Cell>

<Cell><Data ss:Type="Number">3</Data></Cell>
<Cell><Data ss:Type="Number">4</Data></Cell>

<Cell><Data ss:Type="Number">5</Data></Cell>

<Cell><Data ss:Type="Number">6</Data></Cell>
<Cell><Data ss:Type="Number">7</Data></Cell>

<Cell><Data ss:Type="Number">8</Data></Cell>

<Cell><Data ss:Type="Number">9</Data></Cell>
<Cell><Data ss:Type="Number">10</Data></Cell>

<Cell><Data ss:Type="Number">11</Data></Cell>

<Cell><Data ss:Type="Number">12</Data></Cell>
<Cell><Data ss:Type="Number">13</Data></Cell>

<Cell><Data ss:Type="Number">14</Data></Cell>

<Cell><Data ss:Type="Number">15</Data></Cell>
<Cell><Data ss:Type="Number">16</Data></Cell>

<Cell><Data ss:Type="Number">17</Data></Cell>

<Cell><Data ss:Type="Number">18</Data></Cell>
<Cell><Data ss:Type="Number">19</Data></Cell>

<Cell><Data ss:Type="Number">20</Data></Cell>

<Cell><Data ss:Type="Number">21</Data></Cell>
<Cell><Data ss:Type="Number">22</Data></Cell>

<Cell><Data ss:Type="Number">23</Data></Cell>

<Cell><Data ss:Type="Number">24</Data></Cell>
<Cell><Data ss:Type="Number">25</Data></Cell>

<Cell><Data ss:Type="Number">26</Data></Cell>

<Cell><Data ss:Type="Number">27</Data></Cell>
<Cell><Data ss:Type="Number">28</Data></Cell>

<Cell><Data ss:Type="Number">29</Data></Cell>

<Cell><Data ss:Type="Number">30</Data></Cell>
<Cell><Data ss:Type="Number">31</Data></Cell>

<Cell><Data ss:Type="Number">32</Data></Cell>

<Cell><Data ss:Type="Number">33</Data></Cell>
<Cell><Data ss:Type="Number">34</Data></Cell>

<Cell><Data ss:Type="Number">35</Data></Cell>

<Cell><Data ss:Type="Number">36</Data></Cell>
<Cell><Data ss:Type="Number">37</Data></Cell>

<Cell><Data ss:Type="Number">38</Data></Cell>

<Cell><Data ss:Type="Number">39</Data></Cell>
<Cell><Data ss:Type="Number">40</Data></Cell>

<Cell><Data ss:Type="Number">41</Data></Cell>

<Cell><Data ss:Type="Number">42</Data></Cell>
<Cell><Data ss:Type="Number">43</Data></Cell>

<Cell><Data ss:Type="Number">44</Data></Cell>

<Cell><Data ss:Type="Number">45</Data></Cell>
<Cell><Data ss:Type="Number">46</Data></Cell>

<Cell><Data ss:Type="Number">47</Data></Cell>

<Cell><Data ss:Type="Number">48</Data></Cell>
<Pane>

 <Number>3</Number>

 <ActiveRow>17</ActiveRow>

 <ActiveCol>4</ActiveCol>

</Pane>

False

False

False

False

False

False


Hi,

On Jun 27, 2006, at 9:06 PM, Andrew G. wrote:

@doc.root.elements.each('Worksheet') {} returns three elements when I
remove the namespace attribute, but nothing when I put it back in.

---- Actual Question ----
Is this just a weird instance of XML which REXML doesn’t support or do I
need to be doing some fancy XPath footwork to make it work?

XPath footwork. Write:

@doc.root.elements.each('ss:Worksheet') {}

The prefix 'ss' is declared on the root Workbook element and bound to
the same URI as the default namespace, so 'ss:Worksheet' names the
Worksheet elements in that default namespace.
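
To make the suggestion concrete, here is a minimal sketch of the prefixed lookup. The workbook below is a hypothetical cut-down document, not the real Excel file: only the two namespace declarations are taken from the sample in this thread, and the ss:Name attributes and sheet names are assumptions added so there is something to print.

```ruby
require 'rexml/document'

# Hypothetical cut-down workbook. Only the xmlns and xmlns:ss declarations
# mirror the sample posted in this thread; the ss:Name values are made up.
xml = <<~XML
  <?xml version="1.0"?>
  <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
            xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
    <Worksheet ss:Name="Sheet1"/>
    <Worksheet ss:Name="Sheet2"/>
    <Worksheet ss:Name="Sheet3"/>
  </Workbook>
XML

doc = REXML::Document.new(xml)

# 'ss' is resolved against the declarations in the document itself, and it
# is bound to the same URI as the default namespace of the Worksheet elements.
sheet_names = []
doc.root.elements.each('ss:Worksheet') do |sheet|
  sheet_names << sheet.attributes['ss:Name']
end

puts sheet_names.inspect
```

The point of the sketch is only that the prefix in the XPath has to map to the same namespace URI as the elements you want; the prefix text itself is arbitrary.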

Cheers,
Bob

Bob H. – blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. – http://www.recursive.ca/
Raconteur – http://www.raconteur.info/
xampl for Ruby – http://rubyforge.org/projects/xampl/

BAM! :) Thanks, I never noticed that. Is that an XML standard thing? If
you have a default namespace, do you define it twice: once with no prefix,
and again with the prefix you’ll need to access it?

On Jun 28, 2006, at 1:16 PM, Andrew G. wrote:

BAM! :) thanks, I never noticed that. Is that an XML standard thing? if
you have a default namespace, you define it twice? once with no prefix
and again with the prefix you’ll need to access it?

XML definitely does not require this, though I never thought much about
it before I saw your example. Assuming that XPath as implemented in
REXML and in the Ruby libxml wrapper is not broken in the same way in
both, you do need the prefix defined when using XPath (unless there is
some other way). So I’d think an apparently redundant namespace
declaration would be a fairly common thing to do, since it isn’t really
redundant. But I’m not an XPath expert (I don’t really use it at all,
actually).
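
One candidate for the "other way" is to supply the prefix mapping from the calling code instead of relying on the document to declare it. REXML::XPath.match, .each and .first accept an optional hash of prefix => namespace URI as their third argument. The sketch below assumes a hypothetical workbook that declares only the default namespace; the prefix name 'sheet' and the cell contents are made up for illustration.

```ruby
require 'rexml/document'
require 'rexml/xpath'

SPREADSHEET_NS = 'urn:schemas-microsoft-com:office:spreadsheet'

# Hypothetical workbook with only a default namespace: no 'ss' prefix is
# declared anywhere in the document.
xml = <<~XML
  <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet">
    <Worksheet><Table><Row><Cell><Data>1</Data></Cell></Row></Table></Worksheet>
    <Worksheet><Table><Row><Cell><Data>2</Data></Cell></Row></Table></Worksheet>
  </Workbook>
XML

doc = REXML::Document.new(xml)

# The prefix 'sheet' exists only in this hash, not in the XML.
namespaces = { 'sheet' => SPREADSHEET_NS }

worksheets  = REXML::XPath.match(doc.root, 'sheet:Worksheet', namespaces)
cell_values = REXML::XPath.match(doc, '//sheet:Data', namespaces).map(&:text)

puts worksheets.size
puts cell_values.inspect
```

With a mapping like this, the redundant prefixed declaration in the file is a convenience rather than a requirement, at least as far as this sketch goes.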

Cheers,
Bob
