nokogiri version 1.1.1 has been released!
Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser.
Changes:
1.1.1
New features:
- Added XML::Node#elem?
- Added XML::Node#attribute_nodes
- Added XML::Attr
- XML::Node#delete added.
- XML::NodeSet#inner_html added.
Bugfixes:
- Not including an HTML entity for \r for HTML nodes.
- Removed CSS::SelectorHandler and XML::XPathHandler
- XML::Node#attributes returns an Attr node for the value.
- XML::NodeSet implements to_xml
FEATURES:
- XPath support for document searching
- CSS3 selector support for document searching
- XML/HTML builder
- Drop in replacement for Hpricot (though not bug for bug)
Nokogiri parses and searches XML/HTML very quickly, and also has
correctly implemented CSS3 selector support as well as XPath support.
Here is a speed test:
Nokogiri also features an Hpricot compatibility layer to help ease the
change to using correct CSS and XPath.
SUPPORT:
The Nokogiri mailing list is available here:
The bug tracker is available here:
SYNOPSIS:
require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end

# Search for nodes by xpath
doc.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

# Or mix and match.
doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
  puts link.content
end
INSTALL:
Thanks Aaron for your work on Nokogiri.
I noticed what looked like JRuby support so I tried installing the gem
(worked) and then an example that failed:
irb(main):001:0> require 'nokogiri'
=> true
irb(main):002:0> require 'open-uri'
=> true
irb(main):003:0> doc = Nokogiri::HTML(open('http://markwatson.com'))
NoMethodError: undefined method `read_memory' for Nokogiri::HTML::Document:Class
  from /Users/markw/bin/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.1.1-java/lib/nokogiri/html.rb:36:in `parse'
  from /Users/markw/bin/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.1.1-java/lib/nokogiri/html.rb:15:in `HTML'
  from (irb):4
  from /Users/markw/bin/jruby/lib/ruby/1.8/irb.rb:150:in `eval_input'
I am using version jruby 1.1.5 - will a later version of JRuby make
this work?
Thanks,
Mark
On Thu, Jan 15, 2009 at 05:44:05AM +0900, Mark W. wrote:
> NoMethodError: undefined method `read_memory' for Nokogiri::HTML::Document:Class
>   from /Users/markw/bin/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.1.1-java/lib/nokogiri/html.rb:36:in `parse'
>   from /Users/markw/bin/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.1.1-java/lib/nokogiri/html.rb:15:in `HTML'
>   from (irb):4
>   from /Users/markw/bin/jruby/lib/ruby/1.8/irb.rb:150:in `eval_input'
> I am using version jruby 1.1.5 - will a later version of JRuby make
> this work?
No. Unfortunately the jruby release is sort of a lie… It doesn't
actually work on jruby. I've been releasing a jruby version so that
webrat can use its CSS to XPath conversion code, then fall back on
REXML.
We’re working on a better jruby solution though. We’ve got a branch
that uses FFI, and Charles Nutter has a branch with a Java
implementation.
http://github.com/headius/nokogiri/tree/master
I’m not sure what the status is on his branch.
On Jan 14, 2:00 pm, Aaron P. wrote:
> http://github.com/headius/nokogiri/tree/master
> I'm not sure what the status is on his branch.
Thanks Aaron for the update on Charles' and your branches. I am using
nokogiri in 2 examples in a new Ruby book that I am writing for
Apress; I had a warning about JRuby incompatibility (and have a little
code using a pure Ruby alternative), but by the time the book is
published (5 months) it looks like we will have a working version for
JRuby. I could be of more help with the pure Java version, so I will
ask Charles if he wants help.