Nokogiri 1.4.1 Released

aaronpowell · December 11, 2009, 6:32am

Hey everyone! Have you finished your holiday shopping yet? I know I
haven’t.
Fortunately for you guys, Mike and I like programming a lot more than
shopping. I mean, don’t get me wrong. I love shopping for myself, I
just find shopping for other people to be, well, difficult.

Anyway, let’s get down to business:

nokogiri version 1.4.1 has been released!

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri’s
many features is the ability to search documents via XPath or CSS3
selectors.

XML is like violence - if it doesn’t solve your problems, you are not
using enough of it.

Changes:

1.4.1 / 2009/12/10

New Features
- Added Nokogiri::LIBXML_ICONV_ENABLED
- Alias Node#[] to Node#attr
- XML::Node#next_element added
- XML::Node#> added for searching a node’s immediate children
- XML::NodeSet#reverse added
- Added fragment support to Node#add_child, Node#add_next_sibling,
  Node#add_previous_sibling, and Node#replace.
- XML::Node#previous_element implemented
- Rubinius support
- The CSS selector engine now supports :has()
- XML::NodeSet#filter() was added
- XML::Node.next= and .previous= are aliases for add_next_sibling and
  add_previous_sibling. GH #183

Bugfixes
- XML fragments with namespaces do not raise an exception
  (regression in 1.4.0)
- Node#matches? works in nodes contained by a DocumentFragment.
  GH #158
- Document should not define add_namespace() method. GH #169
- XPath queries returning namespace declarations do not segfault.
- Node#replace works with nodes from different documents. GH #162
- Adding XML::Document#collect_namespaces
- Fixed bugs in the SOAP4R adapter
- Fixed bug in XML::Node#next_element for certain edge cases
- Fixed load path issue with JRuby under Windows. GH #160.
- XSLT#apply_to will honor the “output method”. Thanks richardlehane!
- Fragments containing leading text nodes with newlines now parse
  properly. GH #178.

FEATURES:

XPath support for document searching
CSS3 selector support for document searching
XML/HTML builder

Nokogiri parses and searches XML/HTML very quickly, and also has
correctly implemented CSS3 selector support as well as XPath support.

Here is a speed test:

Result · GitHub

SUPPORT:

The Nokogiri {mailing list}[http://groups.google.com/group/nokogiri-talk]
is available here:

http://groups.google.com/group/nokogiri-talk

The {bug tracker}[http://github.com/tenderlove/nokogiri/issues]
is available here:

http://github.com/tenderlove/nokogiri/issues

The IRC channel is #nokogiri on freenode.

SYNOPSIS:

require 'nokogiri'
require 'open-uri'

# Get a Nokogiri::HTML::Document for the page we're interested in...

doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

# Do funky things with it using Nokogiri::XML::Node methods...

####
# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end

####
# Search for nodes by xpath
doc.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

####
# Or mix and match.
doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
  puts link.content
end

REQUIREMENTS:

ruby 1.8 or 1.9
libxml2
libxml2-dev
libxslt
libxslt-dev

INSTALL:

sudo gem install nokogiri

aaronpowell · December 11, 2009, 8:55pm

Good to hear mate – and I have to admit, Nokogiri is quite possibly
the most elegant XML/HTML parser I’ve ever used.

Just letting you know your work is thoroughly appreciated.

aaronpowell · December 11, 2009, 10:25pm

On 2009-12-11, Bapabooiee wrote:

Good to hear mate – and I have to admit, Nokogiri is quite possibly
the most elegant XML/HTML parser I’ve ever used.

Just letting you know your work is thoroughly appreciated.

+1

I recently needed to script up a web scraper and Nokogiri was the
hammer that hit the nail of the problem squarely on the head. Thank
you for an excellent piece of work.

Regards,

Jeremy H.

aaronpowell · December 11, 2009, 11:12pm

On Dec 11, 2009, at 13:25 , Jeremy H. wrote:

you for an excellent piece of work.

next time I’d suggest using mechanize.

aaronpowell · December 12, 2009, 7:34pm

On Sat, Dec 12, 2009 at 06:25:05AM +0900, Jeremy H. wrote:

you for an excellent piece of work.

Thanks guys! It’s good to hear nice things once in a while.

Thanks for using nokogiri, and if you run into bugs, make sure to
report them!

aaronpowell · December 12, 2009, 12:33pm

On 2009-12-11, Ryan D. wrote:

On Dec 11, 2009, at 13:25 , Jeremy H. wrote:

I recently needed to script up a web scraper and Nokogiri was the
hammer that hit the nail of the problem squarely on the head. Thank
you for an excellent piece of work.

next time I’d suggest using mechanize.

I’ll keep it in mind, thanks. sigh, so many toys, so little time!

Jeremy H.

aaronpowell · December 14, 2009, 11:00pm

On Dec 12, 4:26 am, Jeremy H. wrote:

next time I’d suggest using mechanize.

I’ll keep it in mind, thanks. sigh, so many toys, so little time!

Jeremy H.

You would actually probably want to use a combination of both,
depending on what you’re doing. You could use Mechanize for crawling &
scraping the site, and then you use Nokogiri to pry the information
you want out of the markup.

aaronpowell · December 14, 2009, 11:37pm

On Dec 14, 2009, at 14:00 , Bapabooiee wrote:

next time I’d suggest using mechanize.

I’ll keep it in mind, thanks. sigh, so many toys, so little time!

Jeremy H.

You would actually probably want to use a combination of both,
depending on what you’re doing. You could use Mechanize for crawling &
scraping the site, and then you use Nokogiri to pry the information
you want out of the markup.

mechanize already uses nokogiri.