Nokogiri 1.1.1 Released


#1

nokogiri version 1.1.1 has been released!

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser.

Changes:

1.1.1

  • New features

    • Added XML::Node#elem?
    • Added XML::Node#attribute_nodes
    • Added XML::Attr
    • XML::Node#delete added.
    • XML::NodeSet#inner_html added.
  • Bugfixes

    • Not including an HTML entity for \r for HTML nodes.
    • Removed CSS::SelectorHandler and XML::XPathHandler
    • XML::Node#attributes returns an Attr node for the value.
    • XML::NodeSet implements to_xml

FEATURES:

  • XPath support for document searching
  • CSS3 selector support for document searching
  • XML/HTML builder
  • Drop-in replacement for Hpricot (though not bug for bug)

Nokogiri parses and searches XML/HTML very quickly, and also has
correctly implemented CSS3 selector support as well as XPath support.

Here is a speed test:

Nokogiri also features an Hpricot compatibility layer to help ease the
change to using correct CSS and XPath.

SUPPORT:

The Nokogiri mailing list is available here:

The bug tracker is available here:

SYNOPSIS:

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

Search for nodes by CSS

doc.css('h3.r a.l').each do |link|
  puts link.content
end

Search for nodes by XPath

doc.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

Or mix and match.

doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
  puts link.content
end

INSTALL:


#2

Thanks Aaron for your work on Nokogiri.

I noticed what looked like JRuby support so I tried installing the gem
(worked) and then an example that failed:

irb(main):001:0> require 'nokogiri'
=> true
irb(main):002:0> require 'open-uri'
=> true
irb(main):003:0> doc = Nokogiri::HTML(open('http://markwatson.com'))
NoMethodError: undefined method `read_memory' for Nokogiri::HTML::Document:Class
from /Users/markw/bin/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.1.1-java/lib/nokogiri/html.rb:36:in `parse'
from /Users/markw/bin/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.1.1-java/lib/nokogiri/html.rb:15:in `HTML'
from (irb):4
from /Users/markw/bin/jruby/lib/ruby/1.8/irb.rb:150:in `eval_input'

I am using version jruby 1.1.5 - will a later version of JRuby make
this work?

Thanks,
Mark


#3

On Thu, Jan 15, 2009 at 05:44:05AM +0900, Mark W. wrote:

NoMethodError: undefined method `read_memory' for Nokogiri::HTML::Document:Class
from /Users/markw/bin/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.1.1-java/lib/nokogiri/html.rb:36:in `parse'
from /Users/markw/bin/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.1.1-java/lib/nokogiri/html.rb:15:in `HTML'
from (irb):4
from /Users/markw/bin/jruby/lib/ruby/1.8/irb.rb:150:in `eval_input'

I am using version jruby 1.1.5 - will a later version of JRuby make
this work?

No. Unfortunately the JRuby release is sort of a lie… It doesn’t
actually work on JRuby. I’ve been releasing a JRuby version so that
webrat can use its CSS to XPath conversion code, then fall back on
REXML.

We’re working on a better JRuby solution though. We’ve got a branch
that uses FFI, and Charles Nutter has a branch with a Java
implementation.

http://github.com/headius/nokogiri/tree/master

I’m not sure what the status is on his branch.


#4

On Jan 14, 2:00 pm, Aaron P. removed_email_address@domain.invalid
wrote:

http://github.com/headius/nokogiri/tree/master

I’m not sure what the status is on his branch.


Thanks Aaron for the update on Charles’ and your branches. I am using
Nokogiri in two examples in a new Ruby book that I am writing for
Apress. I had a warning about JRuby incompatibility (and have a little
code using a pure Ruby alternative), but by the time the book is
published (5 months) it looks like we will have a working version for
JRuby. I could be of more help with the pure Java version, so I will
ask Charles if he wants help.