Nokogiri 1.4.1 Released

Hey everyone! Have you finished your holiday shopping yet? I know I
haven’t.
Fortunately for you guys, Mike and I like programming a lot more than
shopping. I mean, don’t get me wrong. I love shopping for myself, I
just find shopping for other people to be, well, difficult.

Anyway, let’s get down to business:

nokogiri version 1.4.1 has been released!

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri’s
many features is the ability to search documents via XPath or CSS3
selectors.

XML is like violence - if it doesn’t solve your problems, you are not
using enough of it.

Changes:

1.4.1 / 2009/12/10

  • New Features

    • Added Nokogiri::LIBXML_ICONV_ENABLED
    • Alias Node#[] to Node#attr
    • XML::Node#next_element added
    • XML::Node#> added for searching a node’s immediate children
    • XML::NodeSet#reverse added
    • Added fragment support to Node#add_child, Node#add_next_sibling,
      Node#add_previous_sibling, and Node#replace.
    • XML::Node#previous_element implemented
    • Rubinius support
    • The CSS selector engine now supports :has()
    • XML::NodeSet#filter() was added
    • XML::Node.next= and .previous= are aliases for add_next_sibling and
      add_previous_sibling. GH #183
  • Bugfixes

    • XML fragments with namespaces do not raise an exception
      (regression in 1.4.0)
    • Node#matches? works in nodes contained by a DocumentFragment. GH
      #158
    • Document should not define add_namespace() method. GH #169
    • XPath queries returning namespace declarations do not segfault.
    • Node#replace works with nodes from different documents. GH #162
    • Adding XML::Document#collect_namespaces
    • Fixed bugs in the SOAP4R adapter
    • Fixed bug in XML::Node#next_element for certain edge cases
    • Fixed load path issue with JRuby under Windows. GH #160.
    • XSLT#apply_to will honor the “output method”. Thanks richardlehane!
    • Fragments containing leading text nodes with newlines now parse
      properly. GH #178.

FEATURES:

  • XPath support for document searching
  • CSS3 selector support for document searching
  • XML/HTML builder

Nokogiri parses and searches XML/HTML very quickly, and also has
correctly implemented CSS3 selector support as well as XPath support.

Here is a speed test:

SUPPORT:

The Nokogiri mailing list is available here:

  • http://groups.google.com/group/nokogiri-talk

The bug tracker is available here:

  • http://github.com/tenderlove/nokogiri/issues

The IRC channel is #nokogiri on freenode.

SYNOPSIS:

  require 'nokogiri'
  require 'open-uri'

  # Get a Nokogiri::HTML::Document for the page we’re interested in…

  doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

  # Do funky things with it using Nokogiri::XML::Node methods…

  # Search for nodes by css
  doc.css('h3.r a.l').each do |link|
    puts link.content
  end

  # Search for nodes by xpath
  doc.xpath('//h3/a[@class="l"]').each do |link|
    puts link.content
  end

  # Or mix and match.
  doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
    puts link.content
  end

REQUIREMENTS:

  • ruby 1.8 or 1.9
  • libxml2
  • libxml2-dev
  • libxslt
  • libxslt-dev

INSTALL:

  • sudo gem install nokogiri

Good to hear mate – and I have to admit, Nokogiri is quite possibly
the most elegant XML/HTML parser I’ve ever used.

Just letting you know your work is thoroughly appreciated.

On 2009-12-11, Bapabooiee wrote:

> Good to hear mate – and I have to admit, Nokogiri is quite possibly
> the most elegant XML/HTML parser I’ve ever used.
>
> Just letting you know your work is thoroughly appreciated.

+1

I recently needed to script up a web scraper and Nokogiri was the
hammer that hit the nail of the problem squarely on the head. Thank
you for an excellent piece of work.

Regards,

Jeremy H.

On Dec 11, 2009, at 13:25, Jeremy H. wrote:

> you for an excellent piece of work.

next time I’d suggest using mechanize.

On Sat, Dec 12, 2009 at 06:25:05AM +0900, Jeremy H. wrote:

> you for an excellent piece of work.

Thanks guys! It’s good to hear nice things once in a while. :smiley:

Thanks for using nokogiri, and if you run into bugs, make sure to
report them!

On 2009-12-11, Ryan D. wrote:

> On Dec 11, 2009, at 13:25, Jeremy H. wrote:
>
>> I recently needed to script up a web scraper and Nokogiri was the
>> hammer that hit the nail of the problem squarely on the head. Thank
>> you for an excellent piece of work.
>
> next time I’d suggest using mechanize.

I’ll keep it in mind, thanks. sigh, so many toys, so little time! :slight_smile:

Jeremy H.

On Dec 12, 4:26 am, Jeremy H. wrote:

>> next time I’d suggest using mechanize.
>
> I’ll keep it in mind, thanks. sigh, so many toys, so little time! :slight_smile:
>
> Jeremy H.

You would actually probably want to use a combination of both,
depending on what you’re doing. You could use Mechanize for crawling &
scraping the site, and then you use Nokogiri to pry the information
you want out of the markup.

On Dec 14, 2009, at 14:00, Bapabooiee wrote:

>>> next time I’d suggest using mechanize.
>>
>> I’ll keep it in mind, thanks. sigh, so many toys, so little time! :slight_smile:
>>
>> Jeremy H.
>
> You would actually probably want to use a combination of both,
> depending on what you’re doing. You could use Mechanize for crawling &
> scraping the site, and then you use Nokogiri to pry the information
> you want out of the markup.

mechanize already uses nokogiri.