Hpricot 0.5 -- a fast, forgiving HTML reader

Hi, here’s Hpricot 0.5.

gem install hpricot --source http://code.whytheluckystiff.net

Hpricot reads HTML pages and works hard to fix them up and give you
everything you need to wind your way around them and hack them up!
Inspired by John Resig’s jQuery and Tanaka A.'s HTree.

  • Hpricot is standalone. It’s dependent on no other libs, just
    Ruby.
  • Hpricot is fast: its parser is written in C with the help of the
    wonderful Ragel state machine compiler.
  • However, Hpricot also works hard to fix up HTML and pays a small
    penalty to get it right.
  • How hard does Hpricot work? My rule is: if Firefox parses it,
    Hpricot should too.

This release has a number of really nice features. The new
to_original_html method will try to preserve as much of the
original HTML as possible (including its mistakes) while still
merging in your changes. Also, you can test text nodes with syntax
like: //a[text()='Click Me!'].

Should appear on RubyForge soon enough. Thank you to all the
ticketeers and patchistadores out there, especially Leslie Wu who’s
been punching that commit button like she’s doin the turtle trap!!

_why

On Jan 31, 10:33 pm, _why wrote:

[…]

Just downloaded Hpricot … no warnings when I run my script.

However, the RDoc flag is still turned off … documentation please!!!

C:…\Owner>gem query -n hpricot -r -s http://code.whytheluckystiff.net

*** REMOTE GEMS ***
Need to update 2 gems from http://code.whytheluckystiff.net

complete

hpricot (0.5, 0.4.99, 0.4.92, 0.4.90, 0.4.86, 0.4.76, 0.4.59, 0.4.52, 0.4.47, 0.4.43, 0.4, 0.3.32, 0.3, 0.2, 0.1)
a swift, liberal HTML parser with a fantastic library

C:…\Owner>gem install hpricot --source http://code.whytheluckystiff.net
Select which gem to install for your platform (i386-mswin32)

  1. hpricot 0.5 (ruby)
  2. hpricot 0.5 (mswin32)
  3. hpricot 0.5 (ruby)
  4. hpricot 0.5 (mswin32)
  5. Skip this gem
  6. Cancel installation

2
Successfully installed hpricot-0.5-mswin32

C:\Documents and Settings\Owner>

RubyGems Documentation Index

hpricot 0.4 [rdoc] [www]
a swift, liberal HTML parser with a fantastic library

hpricot 0.4.99 [rdoc] [www]
a swift, liberal HTML parser with a fantastic library

hpricot 0.5 [rdoc] [www]
a swift, liberal HTML parser with a fantastic library

In all of the listings, only “www” has an active link.

On Fri, Feb 02, 2007 at 01:40:06AM +0900, bbiker wrote:

however, the Rdoc flag is still turned off … documentation please!!!
[…]
Successfully installed hpricot-0.5-mswin32

Oh, I see. This was for the Windows one. Well, in the meantime you
can also use: http://code.whytheluckystiff.net/doc/hpricot/. Or
I’ve updated the gem on my personal repository.

_why

On Feb 1, 11:51 am, _why wrote:

Oh, I see. This was for the Windows one. Well, in the meantime you
can also use: http://code.whytheluckystiff.net/doc/hpricot/. Or

Thanks for the link to the rdoc documentation.

I’ve updated the gem on my personal repository.
http://code.whytheluckystiff.net … isn’t this your personal
repository?

Just uninstalled and reinstalled Hpricot; still no RDoc.

I don’t mean to be a pain in the a.

Thanks for your patience

Bernard K.

Thanks for allowing us to change the buffer size. I had stopped using Hpricot
because it kept crashing on me; last night I updated and now it works great :)

This script:

%W[rubygems open-uri hpricot].each{|x| require x}
# open-uri lets Kernel#open fetch the page over HTTP
Hpricot.parse(open("http://www.opentable.com/rest_profile.aspx?rid=3292").read)

crashes Hpricot for me with this error:

=================
/usr/local/lib/ruby/gems/1.8/gems/hpricot-0.5/lib/hpricot/parse.rb:44:in `scan': ran out of buffer space on element <input>, starting on line 23. (Hpricot::ParseError)
    from /usr/local/lib/ruby/gems/1.8/gems/hpricot-0.5/lib/hpricot/parse.rb:44:in `make'
    from /usr/local/lib/ruby/gems/1.8/gems/hpricot-0.5/lib/hpricot/parse.rb:15:in `parse'
    from (irb):3:in `irb_binding'
    from /usr/local/lib/ruby/1.8/irb/workspace.rb:52:in `irb_binding'
    from /usr/local/lib/ruby/1.8/irb/workspace.rb:52

It’s probably because there’s some insanely large attribute on one of
the HTML elements.

============

Could Hpricot die more gracefully and still parse the document,
leaving only that element invalid, when it sees such very large
attributes?
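To check the guess that one oversized attribute is to blame, a rough diagnostic can scan the raw HTML before handing it to Hpricot. This is a plain-Ruby sketch, not part of Hpricot: `longest_attribute` is a hypothetical helper name, and its regexes are an assumption that only counts quoted attribute values, so treat the result as an estimate rather than what Hpricot's scanner actually measures.

```ruby
# Hypothetical helper (not part of Hpricot): find the longest quoted attribute
# value in an HTML string and report its tag, attribute name and length.
# Rough regex scan only; unquoted attribute values are ignored.
def longest_attribute(html)
  best = nil
  # Each start tag: a tag name, then quoted strings or any non-quote, non-'>' chars,
  # so a '>' inside a quoted value does not end the tag early.
  html.scan(/<([a-zA-Z][\w:-]*)((?:"[^"]*"|'[^']*'|[^'">])*)>/) do |tag, attrs|
    # Each name="value" or name='value' pair inside that start tag.
    attrs.scan(/([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/) do |name, dq, sq|
      value = dq || sq
      if best.nil? || value.length > best[:length]
        best = { :tag => tag, :name => name, :length => value.length }
      end
    end
  end
  best # nil when no quoted attributes were found
end
```

Run against the fetched page (for example `longest_attribute(open(url).read)` with open-uri loaded), it would point at the element and attribute carrying the bulk of the data, which may hint at how large a buffer the page needs.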

On Fri, May 11, 2007 at 09:33:07AM +0900, Ron M wrote:

/usr/local/lib/ruby/gems/1.8/gems/hpricot-0.5/lib/hpricot/parse.rb:44:in `scan': ran out of buffer space on element <input>, starting on line 23. (Hpricot::ParseError)

Oh, you can increase the buffer size with: Hpricot.buffer_size = 262144

_why
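Building on the `Hpricot.buffer_size` tip, a caller could retry with a larger buffer instead of guessing one number up front. This is a workaround sketch, not the graceful per-element recovery asked about above: `with_growing_buffer` is a hypothetical helper, the doubling policy and the ceiling are assumptions, and this thread does not say what Hpricot's default buffer size is.

```ruby
# Hypothetical helper (not part of Hpricot): run the block with a buffer size,
# doubling the size and retrying whenever error_class is raised.
def with_growing_buffer(initial_size, max_size, error_class)
  size = initial_size
  begin
    yield size
  rescue error_class
    # Re-raise the original error once the ceiling has been tried.
    raise if size >= max_size
    size *= 2
    retry
  end
end

# Assumed usage with Hpricot 0.5, based on the reply above (not run here):
#   doc = with_growing_buffer(262_144, 4 * 1024 * 1024, Hpricot::ParseError) do |size|
#     Hpricot.buffer_size = size
#     Hpricot.parse(html)
#   end
```

Because the helper would rescue every `Hpricot::ParseError`, not only buffer overflows, the ceiling keeps a document that fails for some other reason from being retried with ever larger buffers.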

Ron M wrote:

Could hpricot die more gracefully and still parse the document
leaving only that element invalid when it sees such very large
attributes?

What did _why say when you posted this problem to the Hpricot mailing
list?
