REXML exception rescuing


#1

Hi all,

I use the following to parse an XML file with REXML:

require 'rexml/document'

begin
  doc = REXML::Document.new File.new('test_cfg.xml')
rescue REXML::ParseException => msg
  puts "Failed: #{msg}"
end

When REXML throws ParseException, a whole load of information is spilled
into msg. Although this is useful for debugging, it is hardly an option
for a “real-life” application that just has to notify the user of an
error in his .xml file in the friendliest way possible.

How can I get just the “malformed XML: missing tag start” part (plus the
line in the file) without all the stack trace and the information that
comes after it, which for some reason is repeated twice? Is this
possible without deconstructing the raised msg?

P.S. Generally, REXML’s error handling seems abysmal. Even the simplest
error (forgetting a = after an attribute name, forgetting an opening
element) is reported as an error on the last line of the .xml file.

Thanks,
Eli


#2

On May 7, 2006, at 2:21 PM, Eli B. wrote:

When REXML throws ParseException, a whole load of information is spilled

puts "Failed: #{msg.message}"


#3

Logan C. wrote:

On May 7, 2006, at 2:21 PM, Eli B. wrote:

When REXML throws ParseException, a whole load of information is spilled

puts "Failed: #{msg.message}"

Logan,

This is the same as just #{msg}, try it.

Completely useless as an error to display to a user, for 2 reasons:

  1. Full of debugging information (inside-REXML backtrace)
  2. Inaccurate positioning, pointing at the last line of the document for
    practically all errors
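
The equivalence noted above follows from plain Ruby behaviour: `Exception#message` returns whatever `to_s` returns, and string interpolation also calls `to_s`. A minimal sketch, assuming a hypothetical `NoisyError` class (not part of REXML) whose `to_s` appends extra context to the original text:

```ruby
# Hypothetical exception class, used only for illustration.
# Its to_s appends extra context to the text given to the constructor.
class NoisyError < RuntimeError
  def to_s
    "#{super}\nLine: 12\nPosition: 340"
  end
end

# Raises the hypothetical error and returns both renderings of it.
def noisy_outputs
  raise NoisyError, "missing tag start"
rescue NoisyError => e
  # Exception#message delegates to to_s, and so does interpolation,
  # so both strings carry the appended context.
  [e.message, "#{e}"]
end

via_message, via_interpolation = noisy_outputs
puts via_message == via_interpolation
```

Any exception class that overrides `to_s` this way will show the same text through `message`, which would explain why switching from `#{msg}` to `#{msg.message}` changes nothing.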

#4

On Mon, 2006-05-08 at 03:21 +0900, Eli B. wrote:

When REXML throws ParseException, a whole load of information is spilled
into msg. Although this is useful for debugging, it is hardly an option
for a “real-life” application that just has to notify the user of an
error in his .xml file in the friendliest way possible.

How can I get just the “malformed XML: missing tag start” part (plus the
line in the file) without all the stack trace and the information that
comes after it, which for some reason is repeated twice? Is this
possible without deconstructing the raised msg?

This is probably a bit naive, but it may be a start:

begin
  doc = REXML::Document.new File.new('bad.xml')
rescue REXML::ParseException => ex
  puts "Failed: #{ex.message[/^.*$/]} (#{ex.message[/Line:\s\d+/]})"
end


#5

Ross B. wrote:

On Mon, 2006-05-08 at 03:21 +0900, Eli B. wrote:

When REXML throws ParseException, a whole load of information is spilled
into msg. Although this is useful for debugging, it is hardly an option
for a “real-life” application that just has to notify the user of an
error in his .xml file in the friendliest way possible.

How can I get just the “malformed XML: missing tag start” part (plus the
line in the file) without all the stack trace and the information that
comes after it, which for some reason is repeated twice? Is this
possible without deconstructing the raised msg?

This is probably a bit naive, but it may be a start:

begin
  doc = REXML::Document.new File.new('bad.xml')
rescue REXML::ParseException => ex
  puts "Failed: #{ex.message[/^.*$/]} (#{ex.message[/Line:\s\d+/]})"
end

Ross,

Thanks for your suggestion. It was clear to me that this is possible,
but I am looking for a more ‘sane’ way of doing it. What if REXML changes
the format of the message slightly in the next version? The regexes won’t
match any longer. I don’t see why there can’t be a short way to print a
message that is useful to an end user.

REXML is praised as the XML parsing library of Ruby and is part of the
standard library; surely someone has noticed its useless error reporting?!
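
If message text has to be picked apart anyway, the worry about format changes can at least be softened by making every extracted piece optional. The following is only a sketch: `short_parse_error` is a hypothetical helper, and the `Line: N` marker is an assumption taken from the regex quoted above, not a documented format.

```ruby
# Hypothetical helper: builds a short, user-facing summary from the text of
# a parse error. The "Line: N" marker is an assumption based on the regex
# quoted in this thread; if it is absent, the line number is simply omitted.
def short_parse_error(text)
  text = text.to_s
  lines = text.lines.map(&:strip).reject(&:empty?)
  description = lines.first || "unknown parse error"
  line_no = text[/^Line:\s*(\d+)/, 1]
  line_no ? "#{description} (line #{line_no})" : description
end

# Made-up sample text imitating the layout discussed in this thread.
sample = "malformed XML: missing tag start\nLine: 7\nPosition: 120\n"
puts short_parse_error(sample)
puts short_parse_error("something else entirely")
```

If the format changes, the helper degrades to printing the first non-empty line rather than printing nothing, which is usually the less bad failure for an end user.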


#6

On May 7, 2006, at 5:42 PM, Eli B. wrote:


Oops, I thought message was the message without the stack trace.


#7

Eli B. writes:

possible without deconstructing the raised msg?

Thanks for your suggestion. It was clear to me that this is possible,
but I am looking for a more ‘sane’ way of doing it. What if REXML changes
the format of the message slightly in the next version? The regexes won’t
match any longer. I don’t see why there can’t be a short way to print a
message that is useful to an end user.

REXML is praised as the XML parsing library of Ruby and is part of the
standard library; surely someone has noticed its useless error reporting?!

Why didn’t you bother to just look in the code? :-)

[~/src/ruby-1.8.4/lib/rexml/parseexception.rb:]
module REXML
  class ParseException < RuntimeError

    def position
      @source.current_line[0] if @source and defined? @source.current_line and
        @source.current_line
    end

    def line
      @source.current_line[2] if @source and defined? @source.current_line and
        @source.current_line
    end

    def context
      @source.current_line
    end

  end
end


#8

Eli B. wrote:

Thanks for your suggestion. It was clear to me that this is possible,
but I am looking for a more ‘sane’ way of doing it. What if REXML changes
the format of the message slightly in the next version? The regexes won’t
match any longer. I don’t see why there can’t be a short way to print a
message that is useful to an end user.

Since I found no better way around it, and REXML is pretty much the only
powerful pure-Ruby XML lib out there, I had to turn to the ugly side,
handcrafting the sanest explanation possible out of REXML parse errors:

# Load REXML first so the class below is reopened, not defined from scratch
require 'rexml/document'

module REXML
  class ParseException < RuntimeError

    # Try to make the sanest possible explanation of a parse error,
    # suitable for display to users
    #
    def explain
      str = self.to_s

      # If it's the broken 'missing tag start' error,
      # output the "unconsumed chars" as they serve as
      # a good hint to the location of the problem
      #
      if str.match(/missing tag start/)
        if str.match(/Last \d+ unconsumed characters:\n([^\n]+)\n/)
          "near #{$1}"
        else
          "(unknown)"
        end
      # Otherwise it's a normal error, so return the description
      #
      else
        if str.match(/REXML::ParseException: (.*)$/)
          "#{$1}     (Line #{self.line})"
        else
          "(unknown)"
        end
      end
    end

  end
end

Tell me what you think. Could I have extracted more information? My
knowledge of REXML is still basic, so I’m surely missing some quirks.


#9

Christian N. wrote:

Eli B. writes:

possible without deconstructing the raised msg?

Thanks for your suggestion. It was clear to me that this is possible,
but I am looking for a more ‘sane’ way of doing it. What if REXML changes
the format of the message slightly in the next version? The regexes won’t
match any longer. I don’t see why there can’t be a short way to print a
message that is useful to an end user.

REXML is praised as the XML parsing library of Ruby and is part of the
standard library; surely someone has noticed its useless error reporting?!

Why didn’t you bother to just look in the code? :-)

[~/src/ruby-1.8.4/lib/rexml/parseexception.rb:]
module REXML
  class ParseException < RuntimeError

    def position
      @source.current_line[0] if @source and defined? @source.current_line and
        @source.current_line
    end

    def line
      @source.current_line[2] if @source and defined? @source.current_line and
        @source.current_line
    end

    def context
      @source.current_line
    end

  end
end

I did look at the code before posting, and it doesn’t help. Position and
line can be printed separately, true (although they are completely
useless most of the time), but the message cannot be printed
separately without the stack trace, for some truly obscure reason.

See the ‘to_s’ method of ParseException.
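
One way to sidestep an overridden `to_s` in plain Ruby is to call the base `Exception#to_s` directly, which returns the text originally handed to the constructor. The sketch below uses a hypothetical `VerboseParseError` class as a stand-in, and `bare_message` is a made-up helper name. Whether the result is a clean, user-friendly string for REXML depends on what REXML passes to the base constructor, which this thread does not establish.

```ruby
# Hypothetical stand-in for an exception whose to_s adds context,
# similar in spirit to the pattern discussed in this thread.
class VerboseParseError < RuntimeError
  def to_s
    "#{super}\nLine: 3\nLast 80 unconsumed characters:\n<broken ..."
  end
end

# Returns the text originally passed to the exception's constructor by
# calling Exception#to_s directly, skipping any subclass override.
def bare_message(exception)
  Exception.instance_method(:to_s).bind(exception).call
end

begin
  raise VerboseParseError, "missing tag start"
rescue VerboseParseError => e
  puts "Failed: #{bare_message(e)}"
end
```

Unlike the regex approaches, this does not depend on the layout of the decorated message, only on the subclass calling `super` with a meaningful string.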