Memory considerations when parsing an XML file

Hi,

I need to quickly parse a large XML file. I don’t want to use DOM
parsing, since that way the whole file ends up in memory. I looked
into writing my own SAX-style parser and came across this post:
http://www.janvereecken.com/2007/4/11/event-driven-xml-parser-in-ruby.

My question is about how the file gets from disk into the
parse_stream method:
REXML::Document.parse_stream(File.open(filename).read, MyListener.new)
Won’t File.open…read read the whole file into memory? If so, nothing
is gained, and I might just as well read and parse the document using
DOM, since that is easier.

Thanks,
Tiberiu

Sorry for posting in a rush. I just read that parse_stream also
accepts an IO object, so I don’t have to call read on the file.
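
For anyone finding this later, here is a minimal sketch of what passing
an IO object looks like. CountingListener and count_elements are
made-up names for illustration, and StringIO only stands in for a real
File so the snippet runs on its own. As far as I understand, REXML
pulls from the IO as it parses instead of needing the whole document
as one string up front.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
require 'stringio'

# Hypothetical listener: counts start tags per element name.
class CountingListener
  include REXML::StreamListener

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Called by REXML for every opening tag it encounters.
  def tag_start(name, _attrs)
    @counts[name] += 1
  end
end

# Hypothetical helper: accepts any IO-like source (File, StringIO, ...).
def count_elements(io)
  listener = CountingListener.new
  REXML::Document.parse_stream(io, listener)
  listener.counts
end

# StringIO stands in for an open File here.
counts = count_elements(StringIO.new('<root><item/><item/></root>'))
puts counts.inspect
```

The listener only keeps the counters, so nothing like a DOM tree is
built while the document is read.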

Tiberiu

On 31.01.2008 20:56, Mr_Tibs wrote:

> Sorry for posting in a rush. I just read that parse_stream also
> accepts an IO object, so I don’t have to call read on the file.

:-)

And the idiom should rather be:

File.open(filename, 'rb') do |io|
  REXML::Document.parse_stream(io, MyListener.new)
end

i.e. use the block form of File.open, which closes the file
automatically when the block exits.
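
To illustrate why the block form matters, here is a rough sketch
showing that the handle is closed even when the listener raises
halfway through. AbortingListener and parse_and_capture_handle are
hypothetical names used only for this demonstration, and the rescue is
there just so the script can go on to inspect the handle.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
require 'tempfile'

# Hypothetical listener that aborts on the first start tag.
class AbortingListener
  include REXML::StreamListener

  def tag_start(name, _attrs)
    raise ArgumentError, "unexpected element: #{name}"
  end
end

# Hypothetical helper: parses the file and returns the IO handle
# so its state can be inspected after the block has exited.
def parse_and_capture_handle(path, listener)
  handle = nil
  begin
    File.open(path, 'rb') do |io|
      handle = io
      REXML::Document.parse_stream(io, listener)
    end
  rescue StandardError
    # Swallowed only for the demonstration; real code should handle it.
  end
  handle
end

# Write a small sample document to a temporary file.
sample = Tempfile.new(['sample', '.xml'])
sample.write('<root><item/></root>')
sample.close

handle = parse_and_capture_handle(sample.path, AbortingListener.new)
puts handle.closed?
sample.unlink
```

With the non-block form you would need an explicit ensure/close to get
the same guarantee.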

robert