Hpricot 0.6 -- the swift, delightful HTML parser

_why (Guest)
on 2007-06-16 02:02
(Received via mailing list)
Hpricot 0.6 is up on Rubyforge.  Or get it from the development gem

  gem install hpricot --source

Hpricot is a flexible HTML parser written in C.  Nestled in a nice
Ruby wrapper.  But Hpricot takes a lot of extra steps to help you

Hpricot is great for both scraping web sites and altering HTML
safely.  With plenty of options for either cleaning up HTML or
leaving unmodified areas untouched.




Loading an HTML page:

  require 'open-uri'
  require 'hpricot'

  doc = Hpricot(open(""))

Fixing HTML into XHTML:

  doc = Hpricot(open(""), :fixup_tags => true)

Placing a number next to each link on a page, preserving the
original HTML as much as possible:

  doc = Hpricot(open(""))
  num = 0
  (doc/"a").append do
    strong " [#{num += 1}]"
  end
  puts doc.to_original_html

(Notice how you can use a simple Ruby syntax for adding HTML tags
inside the block attached to the `append` method!)


  * Hpricot for JRuby -- nice work Ola Bini!
  * Inline Markaby for Hpricot documents.
  * XML tags and attributes are no longer downcased like HTML is.
  * New syntax for grabbing everything between two elements using a
Range in the search method: (doc/("font".."font/br")) or in nodes_at
like so: (doc/"font").nodes_at("*".."br"). Only works with either a pair
of siblings or a set of a parent and a sibling.
  * Ignore self-closing endings on tags (such as form) which are
containers. Treat them like open parent tags. Reported by Jonathan
Nichols on the hpricot list.
  * Escaping of attributes, yanked from Jim Weirich and Sam Ruby's work
in Builder.
  * Element#raw_attributes gives unescaped data.  Element#attributes
gives escaped.
  * Added: Elements#attr, Elements#remove_attr, Elements#remove_class.
  * Added: Traverse#preceding, Traverse#following, Traverse#previous,
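
To make two of the parsing changes in the list above concrete (XML names keeping their case, and self-closing endings being ignored on container tags), here is a minimal plain-Ruby sketch. It is not Hpricot's code and does not use the Hpricot API: `tag_kind`, `node_name`, `TAG_PATTERN` and `CONTAINER_TAGS` are hypothetical names made up for illustration, and apart from `form` the container list is an assumption.

```ruby
# Hypothetical container list; only `form` is named in the release notes.
CONTAINER_TAGS = %w[form div p table].freeze

# Hypothetical pattern for a single tag token such as <br/> or </form>.
TAG_PATTERN = %r{\A<(/)?([A-Za-z][\w:-]*)[^>]*?(/)?>\z}

# Classify an HTML tag token as :open, :close or :empty.
# A self-closing ending on a container tag is ignored, so the
# tag is treated like an open parent tag.
def tag_kind(token)
  m = TAG_PATTERN.match(token)
  raise ArgumentError, "not a tag: #{token.inspect}" unless m

  return :close if m[1]       # leading slash: closing tag
  return :open unless m[3]    # no trailing slash: ordinary open tag

  CONTAINER_TAGS.include?(m[2].downcase) ? :open : :empty
end

# HTML names are downcased; XML names are left exactly as written.
def node_name(raw, xml: false)
  xml ? raw : raw.downcase
end

p tag_kind('<form action="/x" />')   # container, so the "/" is ignored
p tag_kind('<br/>')                  # not a container, stays empty
p node_name('DIV')                   # HTML mode
p node_name('rdf:RDF', xml: true)    # XML mode keeps case
```

The two rules are kept in separate helpers here because the notes do not say how they interact in XML mode.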

Okay, good enough,
