Nokogiri 1.2.0 Released

aaronpowell · February 23, 2009, 5:23am

nokogiri version 1.2.0 has been released!

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser.

Changes:

1.2.0 / 2009-02-22

New features
- CSS search now supports CSS3 namespace queries
- Namespaces on the root node are automatically registered
- CSS queries use the default namespace
- Nokogiri::XML::Document#encoding gets the encoding used for this document
- Nokogiri::XML::Document#url gets the document URL
- Nokogiri::XML::Node#add_namespace adds a namespace to the node (LH #38)
- Nokogiri::XML::Node#each iterates over attribute name/value pairs
- Nokogiri::XML::Node#keys gets all attribute names
- Nokogiri::XML::Node#line gets the line number for a node (Thanks
  Dirkjan Bussink!)
- Nokogiri::XML::Node#serialize now takes an optional encoding
  parameter
- Nokogiri::XML::Node#to_html, to_xml, and to_xhtml take an optional
  encoding
- Nokogiri::XML::Node#to_str
- Nokogiri::XML::Node#to_xhtml to produce XHTML documents
- Nokogiri::XML::Node#values gets all attribute values
- Nokogiri::XML::Node#write_to writes the node to an IO object with
  optional encoding
- Nokogiri::XML::ProcessingInstruction.new
- Nokogiri::XML::SAX::PushParser for all your push parsing needs.
Bugfixes
- Fixed Nokogiri::XML::Document#dup
- Fixed header detection. Thanks rubikitch!
- Fixed a problem where invalid CSS would cause the parser to hang
Deprecations
- Nokogiri::XML::Node.new_from_str will be deprecated in 1.3.0
API Changes
- Nokogiri::HTML.fragment now returns an XML::DocumentFragment (LH #32)

FEATURES:

XPath support for document searching
CSS3 selector support for document searching
XML/HTML builder
Drop-in replacement for Hpricot (though not bug for bug)

Nokogiri parses and searches XML/HTML very quickly, and it also has
correctly implemented CSS3 selector and XPath support.

Here is a speed test:

Result · GitHub

Nokogiri also features an Hpricot compatibility layer to help ease the
change to using correct CSS and XPath.

SUPPORT:

The Nokogiri mailing list is available here:

http://rubyforge.org/mailman/listinfo/nokogiri-talk

The bug tracker is available here:

Lighthouse - Beautifully Simple Issue Tracking

SYNOPSIS:

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

Search for nodes by css

doc.css('h3.r a.l').each do |link|
  puts link.content
end

Search for nodes by xpath

doc.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

Or mix and match.

doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
  puts link.content
end

REQUIREMENTS:

ruby 1.8 or 1.9
libxml
libxslt

INSTALL:

aaronpowell · February 23, 2009, 5:32am

Aaron P. wrote:

nokogiri version 1.2.0 has been released!

grabbed! Tx!

aaronpowell · February 23, 2009, 5:50am

Aaron P. wrote:

nokogiri version 1.2.0 has been released!

The following is a question, if I got it wrong, and a code snippet if I
didn’t.

To add nokogiri to a Merb project’s Cucumber featurizer, I…

Added it to features/support/env.rb:

require "merb-core"
require "spec"
require "merb_cucumber/world/simple"
require 'nokogiri' # <-- with the correct style of quote 'ticks'!

Added its calls to features/steps/result_steps.rb

When /^you go to (.*)/ do |text|
  @response = request(text)
  @xdoc = Nokogiri::HTML(@response.body.to_s)
end

Then /^you should see an? (.+) element$/ do |searcher|
  @xdoc.css(searcher).should_not be_nil
end

Called its steps from features/comics.feature:

Feature: serve webcomics

 Scenario: root page
   When you go to /
   Then you should see an img.comic element

The result is the usual bunch of pale green. (But note that I personally
suck at writing customer-facing verbiage. They don't care if they see an
img with a class of comic! They want to see a comic image! More verbiage
tuning is in order…)

Also note that I suck at writing RSpec matchers. The point of all this is
clear error messages at fault time, and my .should_not be_nil is also not
particularly exemplary!

So thanks for the lib! It's going to the top of my list from now on…

aaronpowell · February 23, 2009, 7:54am

On Feb 22, 2009, at 20:49, Phlip wrote:

require 'nokogiri' # <-- with the correct style of quote 'ticks'!

huh?

aaronpowell · February 23, 2009, 6:31am

Then /^you should see an? (.+) element$/ do |searcher|
  @xdoc.css(searcher).should_not be_nil
end

Nooop. I forgot to test that in the negative, by changing the Then
commandment and seeing whether it fails cleanly. It did not, because a
CSS query that matches nothing returns an empty NodeSet rather than nil.

After changing to should_not be_blank, I then upgraded the verbiage in
the featurizer:

Then /^you should see an? (.+) (.+)$/ do |style, element|
  element = { 'image' => 'img' }.fetch(element, element)
  @xdoc.css("#{element}.#{style}").should_not be_blank
end

…

 When you go to /
 Then you should see a comic image

The {} is speculative coding; if I had a real client asking for these
features, they might write 'panel', which I must then translate to

…