XML parser with file names and line numbers


#1

Howdy,

Is there a Ruby XML parser that includes the file name and line number for elements?

Thanks,

David


#2

2006/4/21, David P. removed_email_address@domain.invalid:

Howdy,

Is there a Ruby XML parser that includes the file name and line number for
elements?

What exactly do you mean by that? AFAIK there is no place to store
this info in DOM so…

robert


#3


Robert K. wrote:

2006/4/21, David P. removed_email_address@domain.invalid:

Howdy,

Is there a Ruby XML parser that includes the file name and line number for
elements?

What exactly do you mean by that? AFAIK there is no place to store
this info in DOM so…

It seems like this would be possible with a SAX parser: while scanning
the document, you could grab the line number an element is on when the
parser reports the start of that element.
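Zach's idea can be illustrated without any XML library. This is a minimal sketch, not a real parser: a toy SAX-style scanner (LineTrackingScanner and each_start_tag are hypothetical names) that counts newlines as it consumes input and reports each start tag with the line on which its "<" appears. It assumes simple markup; comments, CDATA sections, DOCTYPEs and ">" inside attribute values are not handled.

```ruby
require "strscan"

# Hypothetical toy scanner, NOT a conforming XML parser: it does not handle
# comments, CDATA sections, DOCTYPEs, or '>' inside attribute values.
# It reports each start (or empty-element) tag with the 1-based line where
# the tag's '<' appears, the way a SAX handler could on "start element".
class LineTrackingScanner
  StartTag = Struct.new(:name, :line)

  def initialize(xml)
    @xml = xml
  end

  def each_start_tag
    return enum_for(:each_start_tag) unless block_given?

    scanner = StringScanner.new(@xml)
    line = 1
    until scanner.eos?
      # Character data up to the next '<': only its newlines matter here.
      text = scanner.scan(/[^<]+/)
      line += text.count("\n") if text
      break if scanner.eos?

      if scanner.scan(/<([A-Za-z_][\w.:-]*)/)
        # Start tag: report it at the line where it begins...
        yield StartTag.new(scanner[1], line)
        # ...then consume its attributes, which may span several lines.
        line += scanner.scan(/[^>]*>?/).count("\n")
      else
        # End tag, XML declaration, or other markup: skip through the next '>'.
        line += scanner.scan(/<[^>]*>?/).count("\n")
      end
    end
  end
end

xml = "<site>\n  <page name=\"home\"/>\n  <page\n    name=\"about\"/>\n</site>\n"
p LineTrackingScanner.new(xml).each_start_tag.map { |t| [t.name, t.line] }
# => [["site", 1], ["page", 2], ["page", 3]]
```

Reporting the line at the "<" rather than at the closing ">" matches what a start-element callback would naturally want when attributes wrap onto later lines.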

Zach


#4

It is possible with a SAX parser in Java, but the SAX parser in REXML does
not include file/line information in the “PullEvent” as far as I can tell.

I guess REXML is the only XML parser currently under development for Ruby.
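David's observation about REXML can be seen with its pull parser. In this sketch (the sample document is made up), each start_element event exposes the element name as event[0] and the attribute hash as event[1]; nothing in the event itself says which line it came from.

```ruby
require "rexml/parsers/pullparser"

xml = "<site>\n  <page name='home'/>\n</site>"
parser = REXML::Parsers::PullParser.new(xml)

starts = []
while parser.has_next?
  event = parser.pull
  next unless event.start_element?
  # For a start_element event, event[0] is the element name and event[1]
  # its attribute hash; there is no file or line slot in the event.
  starts << [event[0], event[1]]
end

p starts
```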


#5

David P. wrote:

It is possible with a SAX parser in Java, but the SAX parser in REXML does
not include file/line information in the “PullEvent” as far as I can tell.

I guess REXML is the only XML parser currently under development for Ruby.

No, ruby-libxml is also in development. I know because I am an active
“talker” on the ruby-libxml mailing list.

http://rubyforge.org/projects/libxml/

I switched from REXML to libxml because libxml is blazing fast.

Zach


#6

Zach,

Yep… libxml does the trick with:
XML::Parser.default_line_numbers = true

element.line_number
element.doc.filename

Thanks,

David


#7

On Fri, 2006-04-21 at 00:01, David P. wrote:

Howdy,

Is there a Ruby XML parser that includes the file name and line number for
elements?

XML does not have the concept of a line. XML deals only with describing
data and structure, not formatting. Carriage returns and line feeds are
considered to be whitespace.

That is why true XML editors use separate style sheets, like CSS,
XSL-FO, or FOSI, to format XML documents.

If you process an XML document in some way, for example just by parsing
it and saving it, any carriage returns and line feeds may be removed. The
parser may even add new ones. Whitespace is guaranteed to be preserved in
CDATA sections only.

You might find it more useful to count the elements themselves. That way
the numbers won’t change just because you open a file in an editor and
look at it.
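Henrik's suggestion to count elements instead of lines can be sketched with REXML, which ships with Ruby. The helper element_ordinals is hypothetical: it numbers elements in document order, so re-indenting or re-wrapping the file leaves the numbers unchanged, whereas line numbers would shift.

```ruby
require "rexml/document"

# Appends [ordinal, name] for +element+ and, recursively, its child elements
# in document (pre-)order. Text and whitespace nodes are ignored.
def collect_ordinals(element, acc)
  acc << [acc.size + 1, element.name]
  element.elements.each { |child| collect_ordinals(child, acc) }
  acc
end

# Hypothetical helper: 1-based element ordinals for a whole document.
def element_ordinals(xml)
  collect_ordinals(REXML::Document.new(xml).root, [])
end

compact = "<site><page name='home'/><page name='about'/></site>"
pretty  = "<site>\n  <page name='home'/>\n\n  <page name='about'/>\n</site>\n"

p element_ordinals(compact)                              # => [[1, "site"], [2, "page"], [3, "page"]]
p element_ordinals(compact) == element_ordinals(pretty)  # => true
```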

Elements do not have file names either.

What is it you want to do?

/Henrik


http://kallokain.blogspot.com/ - Blogging from the trenches of software development
http://www.henrikmartensson.org/ - Reflections on software development
http://tocsim.rubyforge.com/ - Process simulation
http://testunitxml.rubyforge.org/ - XML test framework
http://declan.rubyforge.org/ - Declarative XML processing


#8

Henrik,

I’ve already got a solution to the issue. libxml-ruby has the
functionality I need.

One of my projects is SiteMap (http://rubyforge.org/projects/sitemap),
a Domain Specific Language that describes web site navigation, access
control, link names, etc. SiteMap allows the designer to embed Ruby code
(e.g., to test access control). It would be nice to have the file/line
of the generated methods in stack traces, etc.
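Ruby itself supports this part: Module#class_eval, like Kernel#eval, accepts a file name and a starting line number, and those are what show up in backtraces. A minimal sketch, assuming the XML parser has already supplied the file name and the element's line; SiteMapPage, define_from_xml and the snippet are hypothetical and not part of SiteMap.

```ruby
# Hypothetical target class for methods generated from the site map.
class SiteMapPage; end

# Hypothetical helper: compile a Ruby snippet taken from an XML element,
# attributing it to the XML file and the element's starting line.
def define_from_xml(klass, code, file, line)
  klass.class_eval(code, file, line)
end

# Pretend this snippet was found in sitemap.xml, starting on line 12.
snippet = <<~RUBY
  def allowed?(user)
    raise ArgumentError, "no user" if user.nil?
    true
  end
RUBY

define_from_xml(SiteMapPage, snippet, "sitemap.xml", 12)

begin
  SiteMapPage.new.allowed?(nil)
rescue ArgumentError => e
  # Starts with "sitemap.xml:13:" (the rest of the format varies by Ruby version).
  puts e.backtrace.first
end
```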

Thanks,

David


#9

On Apr 21, 2006, at 11:21 AM, David P. wrote:

It is possible with a SAX parser in Java, but the SAX parser in REXML does
not include file/line information in the “PullEvent” as far as I can tell.

I guess REXML is the only XML parser currently under development for Ruby.

Well, there is xampl-pp, which I wrote. It doesn’t change often, that’s
true, but I use it a lot. The most recent version is bundled with xampl
(see my signature for where). Xampl-pp is a pull parser, and it keeps
track of line and column as best it can. (It can get the column wrong,
especially when UTF encodings are involved, because the column is
sometimes more of a byte count than a character count, but the line
count is normally pretty good.)

The instance variable @input in the parser is a bit of a funny thing,
but if it is a file, you can use the File methods (e.g. path) on it and
that’ll work. The trouble is that @input isn’t exposed, so…

There are two ways the pull parser can be used: as an object that you
drive by calling methods on it, or by extending the parser with actions.
I use both techniques, sometimes at the same time, which can be
interesting. Anyway, you can either re-open Xampl_PP and define an
accessor for @input, or you can extend the parser, and @input is right
there for you. I’ve never exposed a reader for @input because mucking
with it would be a very, very bad idea.
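Bob's two options can be sketched with a stand-in class, since Xampl_PP's code isn't shown in this thread. ToyPullParser and LocatingParser are hypothetical, and "extend" is taken here to mean subclassing; the point is only that re-opening a class to add attr_reader, or subclassing it, gives access to @input. Only a reader is added, in line with the warning against mucking with @input.

```ruby
require "tempfile"

# Hypothetical stand-in for a parser like Xampl_PP: it keeps its source in
# @input but deliberately offers no public reader for it.
class ToyPullParser
  def initialize(input)
    @input = input
  end
end

# Option 1: re-open the class and add a read-only accessor (no writer).
class ToyPullParser
  attr_reader :input
end

# Option 2: subclass it; @input is directly visible in the subclass's methods.
class LocatingParser < ToyPullParser
  # Returns the file path when @input is a File (or anything with #path).
  def source_name
    @input.respond_to?(:path) ? @input.path : "(unknown)"
  end
end

Tempfile.create(["sitemap", ".xml"]) do |file|
  file.write("<site/>")
  file.flush
  puts ToyPullParser.new(file).input.path   # via the re-opened class
  puts LocatingParser.new(file).source_name  # via the subclass
end
```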

Have you looked at the libxml wrapper? It probably provides that
information.

Cheers,
Bob



Bob H. – blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. – http://www.recursive.ca/
Raconteur – http://www.raconteur.info/
xampl for Ruby – http://rubyforge.org/projects/xampl/