Tidy segfault on Linux

Hi–

I’m using the Ruby tidy gem to clean some user-input HTML. It works
splendidly on my Mac development machine, but segfaults on a CentOS
Linux box.

I’ve traced through the code, and the crash occurs in Tidybuf.rb’s
to_s method. The “struct.bp” method returns a non-nil pointer (one
that reports a size of zero), but struct.size is some huge number that
varies from run to run.

I’ve googled a ton, and there are a lot of people who have hit
segfaults using Ruby and tidy. Some of the issues seem to have been
caused by a namespace conflict between Graphics/ImageMagick and Tidy,
but we’ve fixed that (by renaming tidy’s GetToken function and
recompiling), and are still hitting a segfault.

More detail:

Using a fresh Rails 1.2.5 app, I’ve stepped through the parts of
Tidyobj.rb’s clean method in the console, like so:

require 'tidy'
tidy = Tidyobj.new
@doc = Tidylib.create
@outbuf = Tidybuf.new
str = 'hi there!'
rc = -1
rc = Tidylib.parse_string(@doc, str)
rc = Tidylib.clean_and_repair(@doc) if rc >= 0
rc = (Tidylib.opt_parse_value(@doc, :force_output, true) == 1 ? rc : -1) if rc > 1
rc = Tidylib.save_buffer(@doc, @outbuf.struct) if rc >= 0

At this point:

@outbuf.struct.size
=> 154846656

@outbuf.struct.bp
=> #<DL::PtrData:0x0x949aa38 ptr=0x0x29c4d0 size=0 free=0x(nil)>

Then:

@outbuf.to_s
/usr/lib/ruby/site_ruby/1.8/tidy/tidybuf.rb:39: [BUG] Segmentation fault
ruby 1.8.4 (2005-12-24) [i386-linux]
Aborted (core dumped)

The shorter way to reproduce this is:

tidy = Tidyobj.new
tidy.clean 'hi'
/usr/lib/ruby/site_ruby/1.8/tidy/tidybuf.rb:39: [BUG] Segmentation fault
ruby 1.8.4 (2005-12-24) [i386-linux]
Aborted (core dumped)

If anyone has a clue, please help!

Thanks,
Lee

Lee,

Check out this URL for a patch:

http://rubyforge.org/tracker/index.php?func=detail&aid=10007&group_id=435&atid=1744

It seems to have fixed the problem for me.

Good luck,

Scott

Excellent, thanks, I’ll try it out.

Lee

Hi Lee & Scott,

We ran into the exact same issue Lee described in
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/282246

We tried the patch as suggested by Scott in
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/282454

Unfortunately, no luck… we still segfault.

Just wondering if you were able to solve it… and if so, how?

We have Ruby 1.8.5, Tidy Gem 1.1.2, libtidy 0.99 on CentOS 5.2 (x86_64).

One thing we found is that Tidy seems to only segfault if one feeds it
valid HTML. If one feeds it bad HTML, it doesn’t crash (see example
below).

Thanks!

Bob

--8<-------------------------------------------------------------------------

require "rubygems"
require "tidy"

Tidy.path = "/usr/lib64/libtidy.so"

Tidy.open do |t|
  puts "*** BAD SAMPLE"
  t.clean "I am bad HTML!"
  puts t.errors
  puts t.diagnostics
end

Tidy.open do |t|
  puts "*** GOOD SAMPLE"
  t.clean '' +
'foo

bar


  puts t.errors
  puts t.diagnostics
end

--8<-------------------------------------------------------------------------

Outputs the following:

*** BAD SAMPLE
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 7 - Warning: plain text isn't allowed in <head> elements
line 1 column 7 - Info: <head> previously mentioned
line 1 column 7 - Warning: inserting implicit <body>
line 1 column 7 - Warning: inserting missing 'title' element
Info: Document content looks like HTML 3.2
4 warnings, 0 errors were found!

*** GOOD SAMPLE
(eval):5: [BUG] Segmentation fault
ruby 1.8.5 (2006-08-25) [x86_64-linux]

Aborted
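
When comparing inputs like the two samples above, it can help to run each call in a forked child process, so that a crash ends only the child and the parent can report how it died. The following is a minimal sketch, not part of the tidy gem: `run_isolated` is a hypothetical helper name, and it assumes a platform where `fork` is available (Linux and Mac OS X both are).

```ruby
# Hypothetical helper (not part of the tidy gem): run a block in a forked
# child so that a crash in native code cannot take down the parent process.
#
# Returns :ok if the child exited with status 0,
#         [:exit, code] if it exited with a non-zero status,
#         [:signal, signo] if it was terminated by a signal.
def run_isolated
  pid = fork do
    yield
    exit!(0) # leave the child immediately, skipping at_exit handlers
  end

  _, status = Process.wait2(pid)

  if status.signaled?
    [:signal, status.termsig]
  elsif status.exitstatus == 0
    :ok
  else
    [:exit, status.exitstatus]
  end
end
```

With the tidy gem loaded, each sample could then be wrapped as `run_isolated { Tidy.open { |t| t.clean(sample) } }`, where `sample` stands for whichever HTML string is being tested. Since the interpreter aborts after printing its [BUG] report (the "Aborted" line above), a crashing input would be expected to show up as a `[:signal, ...]` result rather than `:ok`.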
