Forum: Ruby REXML exception rescuing

Eli B. (Guest)
on 2006-05-07 22:20
Hi all,

I use the following to parse an XML file with REXML:

begin
	doc = REXML::Document.new File.new('test_cfg.xml')
rescue REXML::ParseException => msg
	puts "Failed: #{msg}"
end

When REXML throws ParseException, a whole load of information is spilled
into msg. Although this is useful for debugging, it is hardly an option
for a "real-life" application that just has to notify the user of an
error in his .xml file in the friendliest way possible.

How can I get just the "malformed XML: missing tag start" part (plus the
line in the file) without the stack trace and the information that
comes after it, which for some reason is repeated twice? Is this
possible without deconstructing the raised msg?

P.S. Generally, REXML's error handling seems abysmal. Even the simplest error
(forgetting a = after an attribute name, forgetting an opening element)
is reported against the last line of the .xml file.


Thanks,
Eli
Logan C. (Guest)
on 2006-05-07 22:32
(Received via mailing list)
On May 7, 2006, at 2:21 PM, Eli B. wrote:

> When REXML throws ParseException, a whole load information is spilled

puts "Failed: #{msg.message}"
Ross B. (Guest)
on 2006-05-07 22:57
(Received via mailing list)
On Mon, 2006-05-08 at 03:21 +0900, Eli B. wrote:
> When REXML throws ParseException, a whole load information is spilled
> into msg. Although this is useful for debugging, it hardly is an option
> for a "real-life" application that just has to notify the user of an
> error in his .xml file in the friendliest way possible.
>
> How can I get just the "malformed XML: missing tag start" part (plus the
> line in the file) without all the stack trace and the information that
> comes after it, which for some reason is repeated twice. Is this
> possible without deconstructing the raised msg ?

This is probably a bit naive but maybe will be a start:

begin
  doc = REXML::Document.new File.new('bad.xml')
rescue REXML::ParseException => ex
  puts "Failed: #{ex.message[/^.*$/]} (#{ex.message[/Line:\s\d+/]})"
end
Eli B. (Guest)
on 2006-05-08 01:41
Logan C. wrote:
> On May 7, 2006, at 2:21 PM, Eli B. wrote:
>
>> When REXML throws ParseException, a whole load information is spilled
>
> puts "Failed: #{msg.message}"

Logan,

This is the same as just #{msg}, try it.

Completely useless as an error to display to a user, for 2 reasons:

1) Full of debugging information (inside-REXML backtrace)
2) Inaccurate positioning, pointing at the last line of the document for
practically all errors
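
A minimal sketch of why msg.message and #{msg} print the same thing: in Ruby, Exception#message returns the result of to_s, and string interpolation also calls to_s, so a class that overrides to_s changes both. VerboseError and the text it appends are made up for illustration; this is not REXML's code.

```ruby
# Hypothetical exception class that, like the one discussed in this thread,
# overrides to_s to append extra diagnostic text.
class VerboseError < RuntimeError
  def to_s
    "#{super}\nLine: 7\nPosition: 120"
  end
end

# Collects the two ways of printing an exception that were compared above.
def describe(error)
  { interpolated: "#{error}", message: error.message }
end

begin
  raise VerboseError, "malformed XML: missing tag start"
rescue VerboseError => e
  result = describe(e)
  # Both entries carry the appended diagnostics, because message delegates to to_s.
  puts result[:interpolated] == result[:message]
end
```

So switching from #{msg} to #{msg.message} cannot shorten the output on its own; the shorter text has to come from somewhere other than to_s.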
Eli B. (Guest)
on 2006-05-08 01:45
Ross B. wrote:
> On Mon, 2006-05-08 at 03:21 +0900, Eli B. wrote:
>> When REXML throws ParseException, a whole load information is spilled
>> into msg. Although this is useful for debugging, it hardly is an option
>> for a "real-life" application that just has to notify the user of an
>> error in his .xml file in the friendliest way possible.
>>
>> How can I get just the "malformed XML: missing tag start" part (plus the
>> line in the file) without all the stack trace and the information that
>> comes after it, which for some reason is repeated twice. Is this
>> possible without deconstructing the raised msg ?
>
> This is probably a bit naive but maybe will be a start:
>
> begin
>   doc = REXML::Document.new File.new('bad.xml')
> rescue REXML::ParseException => ex
>   puts "Failed: #{ex.message[/^.*$/]} (#{ex.message[/Line:\s\d+/]})"
> end

Ross,

Thanks for your suggestion, it was clear to me that this is possible,
but I seek a more 'sane' way of doing it. What if REXML changes the
format of the message slightly in the next version - the regexes won't
match any longer. I don't see why there can't be a short way to print a
message that is useful to an end user.


REXML is praised as *the* XML parsing library of Ruby and is part of the
standard library, surely someone noticed its useless error reporting ?!
Logan C. (Guest)
on 2006-05-08 02:41
(Received via mailing list)
On May 7, 2006, at 5:42 PM, Eli B. wrote:


Oops, I thought message was the message without the stack trace.
Christian N. (Guest)
on 2006-05-08 18:00
(Received via mailing list)
Eli B. <removed_email_address@domain.invalid> writes:

>>> possible without deconstructing the raised msg ?
>
> Thanks for your suggestion, it was clear to me that this is possible,
> but I seek a more 'sane' way of doing it. What if REXML changes the
> format of the message slightly in the next version - the regexes won't
> match any longer. I don't see why there can't be a short way to print a
> message that is useful to an end user.
>
>
> REXML is praised as *the* XML parsing library of Ruby and is part of the
> standard library, surely someone noticed its useless error reporting ?!

Why didn't you bother to just look in the code? :-)

[~/src/ruby-1.8.4/lib/rexml/parseexception.rb:]
module REXML
  class ParseException < RuntimeError
    ...
    def position
      @source.current_line[0] if @source and defined? @source.current_line and
      @source.current_line
    end

    def line
      @source.current_line[2] if @source and defined? @source.current_line and
      @source.current_line
    end

    def context
      @source.current_line
    end
  end
end
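
A rough sketch of how the line and position accessors quoted above could be read in a rescue block, rather than parsing them out of the message text. The helper name parse_error_location and the sample XML are assumptions for illustration, and a StringIO stands in for the File used earlier in the thread. The values returned depend on the REXML version and on how far the parser had read, so no particular output is claimed here.

```ruby
require 'rexml/document'
require 'stringio'

# Hypothetical helper: returns nil when the XML parses cleanly, otherwise a
# Hash built from the exception's accessors instead of its message text.
def parse_error_location(io)
  REXML::Document.new(io)
  nil
rescue REXML::ParseException => ex
  { line: ex.line, position: ex.position }
end

# The closing tag does not match the open element, so parsing should fail.
p parse_error_location(StringIO.new("<a>\n<b>\n</a>\n"))
```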
Eli B. (Guest)
on 2006-05-08 19:24
Christian N. wrote:
> Eli B. <removed_email_address@domain.invalid> writes:
>
>>>> possible without deconstructing the raised msg ?
>>
>> Thanks for your suggestion, it was clear to me that this is possible,
>> but I seek a more 'sane' way of doing it. What if REXML changes the
>> format of the message slightly in the next version - the regexes won't
>> match any longer. I don't see why there can't be a short way to print a
>> message that is useful to an end user.
>>
>>
>> REXML is praised as *the* XML parsing library of Ruby and is part of the
>> standard library, surely someone noticed its useless error reporting ?!
>
> Why didn't you bother to just look in the code? :-)
>
> [~/src/ruby-1.8.4/lib/rexml/parseexception.rb:]
> module REXML
>   class ParseException < RuntimeError
>     ...
>     def position
>       @source.current_line[0] if @source and defined? @source.current_line and
>       @source.current_line
>     end
>
>     def line
>       @source.current_line[2] if @source and defined? @source.current_line and
>       @source.current_line
>     end
>
>     def context
>       @source.current_line
>     end
>   end
> end

I did look at the code before posting, and it doesn't help. Position and
line can be printed separately, true (although they are completely
useless most of the time), but the message cannot be printed
separately without the stack trace, for some truly obscure reason.

See the 'to_s' method of ParseException.
Eli B. (Guest)
on 2006-05-09 00:04
Eli B. wrote:
>>> Thanks for your suggestion, it was clear to me that this is possible,
>>> but I seek a more 'sane' way of doing it. What if REXML changes the
>>> format of the message slightly in the next version - the regexes won't
>>> match any longer. I don't see why there can't be a short way to print a
>>> message that is useful to an end user.

Since I found no better way around it, and REXML is pretty much the only
powerful pure-Ruby XML lib out there, I had to turn to the ugly side,
handcrafting the sanest explanation possible out of REXML parse errors:


module REXML
  class ParseException < RuntimeError

    # Try to make the sanest possible explanation of a parse error,
    # suitable for display to users
    #
    def explain
      str = self.to_s

      # If it's the broken 'missing tag start' error,
      # output the "unconsumed chars" as they serve as
      # a good hint to the location of the problem
      #
      if str.match(/missing tag start/)
        if str.match(/Last \d+ unconsumed characters:\n([^\n]+)\n/)
          "near #{$1}"
        else
          "(unknown)"
        end
      # Otherwise it's a normal error, so return the description
      #
      else
        if str.match(/REXML::ParseException: (.*)$/)
          "#{$1}     (Line #{self.line})"
        else
          "(unknown)"
        end
      end
    end
  end
end

Tell me what you think. Could I have extracted more information? My
knowledge of REXML is still basic, so I'm surely missing some quirks.
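
For comparison, here is a sketch of the same idea as a standalone function that works on the error text, without reopening the REXML module. The name summarize_parse_error is hypothetical, and the sample text is invented to resemble the layout described in this thread (a "REXML::ParseException:" prefix and a "Line:" field); the real output may differ between REXML versions, so this shares the fragility of any regex-based approach.

```ruby
# Hypothetical helper: reduces verbose parser error text to one short line.
def summarize_parse_error(text)
  text = text.to_s
  first_line = text.each_line.map(&:strip).find { |l| !l.empty? }
  return "(unknown)" if first_line.nil?

  # Drop the inspect-style prefix if present, keeping only the description.
  description = first_line.sub(/\A#<REXML::ParseException:\s*/, "")
  # Assumed field layout: a line of the form "Line: <number>".
  line_no = text[/^Line:\s*(\d+)/, 1]
  line_no ? "#{description} (line #{line_no})" : description
end

# Invented sample text, not captured from a real run.
sample = <<~MSG
  #<REXML::ParseException: malformed XML: missing tag start
  (backtrace lines would appear here)
  Line: 12
  Position: 340
MSG

puts summarize_parse_error(sample)
```

Keeping the logic in a plain function means the fallback to "(unknown)" or to the bare description is explicit when the text does not match the assumed layout.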