Nokogiri 1.0.6 Released

nokogiri version 1.0.6 has been released!

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser.

Changes:

1.0.6

  • 5 Bugfixes

    • XPath Parser raises a SyntaxError on parse failure
    • CSS Parser raises a SyntaxError on parse failure
    • filter() and not() Hpricot compatibility added
    • CSS searches via Node#search are now always relative
    • CSS to XPath conversion is now cached

FEATURES:

  • XPath support for document searching
  • CSS3 selector support for document searching
  • XML/HTML builder
  • Drop in replacement for Hpricot (though not bug for bug)

Nokogiri parses and searches XML/HTML very quickly, and also has
correctly implemented CSS3 selector support as well as XPath support.

Here is a speed test:

Nokogiri also features an Hpricot compatibility layer to help ease the
change to using correct CSS and XPath.

SUPPORT:

The Nokogiri mailing list is available here:

The bug tracker is available here:

SYNOPSIS:

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(open('tenderlove - Google Search'))

# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end

# Search for nodes by xpath
doc.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

# Or mix and match.
doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
  puts link.content
end

Can the Reader interface do stream parsing à la StAX? I couldn't tell
from the docs.

thanks,
– Mark.

On Tue, Nov 18, 2008 at 01:16:51AM +0900, Mark T. wrote:

> Can the Reader interface do stream parsing à la StAX? I couldn't tell
> from the docs.

Not yet. The normal doc parser will do streams right now. SAX/Reader
stream parsing is next on my list. ;-)