REXML StreamListener: Where am I in the XML?

Hi,

I'm using REXML::StreamListener to analyze a big XML file. My Ruby
code looks like this:

--- code starts here ---

require 'rexml/document'
require 'rexml/streamlistener'

class MyListener
  include REXML::StreamListener

  def tag_start(name, attrs)
    # do something ...
  end

  def text(text)
    # do something ...
  end
end

# xmlfile: path of the XML file to analyze (assumed here to be the
# first command-line argument)
xmlfile = ARGV.fetch(0)
File.open(xmlfile) do |file|
  REXML::Document.parse_stream(file, MyListener.new)
end

--- code ends here ---

In the "tag_start" method I sometimes need to know where I am in the
XML: what my parent tag is, and so on. Is there a method to get this
information at that point?

Regards

Michael

Michael wrote:

> In the "tag_start" method I sometimes need to know where I am in the
> XML: what my parent tag is, and so on. Is there a method to get this
> information at that point?

In general, the big value of a stream parser is that it is not holding
onto much state, so memory needs stay small regardless of the size of
the XML. State tracking is left to the application developer.
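A minimal sketch of doing that tracking yourself, assuming the only context you need is the chain of open element names (the class name `PathTrackingListener` and the sample XML are made up for illustration). `tag_start` pushes and `tag_end` pops, so the parent is whatever was on top of the stack just before the push:

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# StreamListener callbacks only describe the current event, so this
# listener keeps its own stack of open element names.
class PathTrackingListener
  include REXML::StreamListener

  attr_reader :seen

  def initialize
    @stack = []  # open element names, outermost first
    @seen  = []  # [path, parent] recorded for every start tag
  end

  def tag_start(name, attrs)
    parent = @stack.last  # nil when name is the root element
    @stack.push(name)
    @seen << [@stack.join('/'), parent]
  end

  def tag_end(name)
    @stack.pop
  end
end

xml = '<users><user><firstname>Ann</firstname></user></users>'
listener = PathTrackingListener.new
REXML::Document.parse_stream(xml, listener)
p listener.seen
# => [["users", nil], ["users/user", "users"], ["users/user/firstname", "user"]]
```

The stack only ever holds one entry per nesting level, so memory stays proportional to the depth of the document, not its size.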

The REXML pull parser lets you peek at the next event; I'm not sure
offhand whether it can go the other way. But I suspect that with both
the stream and pull parsers (they sit on the same base parser under the
hood, so they behave much the same), once an event is off the stack, it
is gone.
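To illustrate the look-ahead: a small sketch, with made-up sample XML, that consumes the `<user>` start tag and then peeks at the next event. The peeked event stays buffered, so the following `pull` hands back that same event rather than skipping it:

```ruby
require 'rexml/parsers/pullparser'

parser = REXML::Parsers::PullParser.new('<user><firstname>Ann</firstname></user>')

# Pull events until the <user> start tag has been consumed.
event = parser.pull
event = parser.pull until event.start_element? && event[0] == 'user'

ahead = parser.peek  # the next event, left in place
nxt   = parser.pull  # pull now returns the event that peek buffered

puts ahead.start_element?  # => true
puts ahead[0]              # => firstname
puts nxt[0]                # => firstname
```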

Stream parsing works really well when you have a large source of
regularly structured data (e.g., XML dump of a database table), such
that you can grab and stash in memory just what you need, work with it
(perhaps as a transient DOM), then discard it and move on.
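A rough sketch of that pattern with StreamListener, assuming each record is a `<user>` element (the `RecordListener` class, the record name, and the sample XML are hypothetical). Each record is built into a detached REXML::Element, handed to a block, and then dropped:

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Builds a transient REXML::Element for each record element, passes it to
# a block, then lets it go, so only one record is held in memory at once.
class RecordListener
  include REXML::StreamListener

  def initialize(record_name, &on_record)
    @record_name = record_name
    @on_record   = on_record
    @current     = nil  # innermost open element of the record being built
  end

  def tag_start(name, attrs)
    if @current
      @current = @current.add_element(name, attrs)  # nested element
    elsif name == @record_name
      @current = REXML::Element.new(name)           # a new record begins
      @current.add_attributes(attrs)
    end
  end

  def text(text)
    @current.add_text(text) if @current
  end

  def tag_end(name)
    return unless @current
    if @current.parent
      @current = @current.parent  # closed a nested element; climb back up
    else
      @on_record.call(@current)   # closed the record itself
      @current = nil
    end
  end
end

xml = '<users><user id="1"><name>Ann</name></user>' \
      '<user id="2"><name>Bob</name></user></users>'
names = []
listener = RecordListener.new('user') do |user|
  names << "#{user.attributes['id']}:#{user.elements['name'].text}"
end
REXML::Document.parse_stream(xml, listener)
p names  # => ["1:Ann", "2:Bob"]
```

Inside the block you have a normal small DOM (XPath, attributes and so on); memory use tracks the size of one record, as long as the block doesn't hold on to the elements.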


James B.

“Trying to port the desktop metaphor to the Web is like working
on how to fuel your car with hay because that is what horses eat.”
- Dare Obasanjo

On 2/21/07, Michael wrote:

> In the "tag_start" method I sometimes need to know where I am in the
> XML: what my parent tag is, and so on. Is there a method to get this
> information at that point?

Hi,

I've used REXML::Parsers::PullParser instead of stream parsing (same
general idea). Here's an example of a function that waits until it
sees a tag matching element_name and then pulls the text from it:

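require 'rexml/parsers/pullparser'

# Returns the text right after the first <element_name> start tag in
# filename, or false if there is no such element (or it has no text).
def self.get_element_text(filename, element_name)
  File.open(filename) do |file|
    parser = REXML::Parsers::PullParser.new(file)
    while parser.has_next?
      el = parser.pull
      # For a start_element event, el[0] is the tag name.
      if el.start_element? && el[0] == element_name
        nxt = parser.peek  # look ahead without consuming the event
        return nxt.text? ? nxt[0] : false
      end
    end
  end
  false
end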

So, while the above certainly won't work as-is for your application,
you could experiment with parser.peek to see whether you can find the
child node (or next node, whatever) that you're looking for.
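For the parent question specifically, a pull loop can keep its own stack just as a listener would. A minimal sketch; `each_start_tag_with_ancestors` is a hypothetical helper name, not part of REXML:

```ruby
require 'rexml/parsers/pullparser'

# Yields (name, ancestors) for every start tag; ancestors lists the
# currently open elements, outermost first.
def each_start_tag_with_ancestors(source)
  parser = REXML::Parsers::PullParser.new(source)
  stack = []
  while parser.has_next?
    event = parser.pull
    if event.start_element?
      yield event[0], stack.dup
      stack.push(event[0])
    elsif event.end_element?
      stack.pop
    end
  end
end

xml = '<users><user><firstname>Ann</firstname></user></users>'
seen = []
each_start_tag_with_ancestors(xml) do |name, ancestors|
  seen << [name, ancestors]
  puts "#{name} (parent: #{ancestors.last || 'none'})"
end
```

Passing a copy of the stack (`stack.dup`) keeps the caller from accidentally modifying the parser's bookkeeping.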

HTH,
Keith

@James: I know a SAX parser in another language that is used much like
REXML::StreamListener, but there you also get the information that,
for example, a FirstName tag is nested inside a User tag, and so on.
You're right: with REXML::StreamListener I can track the stack myself.
I just wanted to make sure I hadn't overlooked the right method in
REXML::StreamListener :)
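If the same bookkeeping is needed in several listeners, one option (a sketch, not something REXML provides) is a small module prepended to the listener class, so `tag_start`/`tag_end` maintain the stack around the class's own callbacks. `ElementStack`, `FirstNameListener`, and the sample XML are hypothetical; the example keeps FirstName text only when it sits directly inside User:

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical mixin: prepended to a StreamListener class, it maintains
# the stack of open elements around the class's own callbacks.
module ElementStack
  def tag_start(name, attrs)
    element_stack.push(name)
    super
  end

  def tag_end(name)
    super
    element_stack.pop
  end

  # Open element names, outermost first. Inside tag_start the last entry
  # is the element just started; inside text it is the enclosing element.
  def element_stack
    @element_stack ||= []
  end
end

class FirstNameListener
  include REXML::StreamListener
  prepend ElementStack

  attr_reader :first_names

  def initialize
    @first_names = []
  end

  # Collect FirstName text only when FirstName sits directly inside User.
  def text(text)
    @first_names << text if element_stack.last(2) == %w[User FirstName]
  end
end

xml = '<Users><User><FirstName>Ann</FirstName></User>' \
      '<Admin><FirstName>Root</FirstName></Admin></Users>'
listener = FirstNameListener.new
REXML::Document.parse_stream(xml, listener)
p listener.first_names  # => ["Ann"]
```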

@Keith: I'll take a look at REXML::Parsers::PullParser. Thanks for the
info :)

Michael