WWW::Mechanize 0.6.0 (Rufus)

Hi,

I would like to announce that my Mechpricot pie is done baking and is
ready to eat. The main feature of this release is that Mechanize uses
Hpricot as its internal HTML parser and that you can now treat a page
object returned from mechanize as an Hpricot object. This makes screen
scraping using mechanize much easier.

You can download it through gems:
gem install mechanize -y

or get it here:
http://rubyforge.org/projects/mechanize/

Check out the release notes and changelog for more cool stuff.

–Aaron

Aaron P. wrote:

Hi,

I would like to announce that my Mechpricot pie is done baking and is
ready to eat. The main feature of this release is that Mechanize uses
Hpricot as its internal HTML parser and that you can now treat a page
object returned from mechanize as an Hpricot object. This makes screen
scraping using mechanize much easier.

Currently, I use mechanize to grab nodes based on a watch list. These
are REXML Element nodes, and code that works with them expects the REXML
API.

Has this changed?


James B.

“I can see them saying something like ‘OMG Three Wizards Awesome’”

Aaron P. wrote:

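Yes. You will get back Hpricot nodes in 0.6.0. I plan on having a pluggable
parser in 0.6.1 that will return REXML nodes for you. Hpricot seems to
support some methods similar to REXML, so depending on how complicated your
logic is, you may be able to use Hpricot just fine. Otherwise, don’t
upgrade until 0.6.1.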

Ah, thanks. My code takes these nodes and uses them to instantiate
assorted domain objects, using REXML’s XPath and element methods to
populate interval variables. That might be simple enough to replace
with Hpath, but I’ll wait to upgrade until I’m sure.
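
One way to keep a parser change like this contained is to put the node access behind a small adapter, so the domain objects never call the parser's API directly. The following is only a minimal sketch under stated assumptions: RexmlNodeAdapter, WatchItem, text_at, and attribute are hypothetical names made up for illustration, and the sample markup is invented. Only the REXML calls themselves (REXML::Document.new, REXML::XPath.first, Element#text, Element#attributes) are real library API.

```ruby
require 'rexml/document'

# Hypothetical adapter: the only class that knows the nodes are REXML elements.
class RexmlNodeAdapter
  def initialize(element)
    @element = element
  end

  # Text of the first node matching the XPath (relative to the wrapped
  # element), or nil when nothing matches.
  def text_at(xpath)
    found = REXML::XPath.first(@element, xpath)
    found && found.text
  end

  # Value of an attribute on the wrapped element, or nil if absent.
  def attribute(name)
    @element.attributes[name]
  end
end

# Hypothetical domain object: it talks only to the adapter interface,
# never to the parser library.
class WatchItem
  attr_reader :title, :url

  def initialize(node)
    @title = node.text_at('span')
    @url   = node.attribute('href')
  end
end

# Invented sample markup standing in for a node grabbed from a page.
doc  = REXML::Document.new('<a href="/items/1"><span>Rufus</span></a>')
item = WatchItem.new(RexmlNodeAdapter.new(doc.root))
puts "#{item.title} -> #{item.url}"
```

With this layout, moving to a different parser would mean writing a second adapter that answers text_at and attribute, while WatchItem stays unchanged. Whether the other parser offers close enough equivalents is something to verify against its own documentation.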

James

On Thu, Sep 07, 2006 at 06:30:10AM +0900, James B. wrote:

Currently, I use mechanize to grab nodes based on a watch list. These
are REXML Element nodes, and code that works with them expects the REXML
API.

Has this changed?

Yes. You will get back Hpricot nodes in 0.6.0. I plan on having a pluggable
parser in 0.6.1 that will return REXML nodes for you. Hpricot seems to
support some methods similar to REXML, so depending on how complicated your
logic is, you may be able to use Hpricot just fine. Otherwise, don’t
upgrade until 0.6.1.

–Aaron

On 9/6/06, Aaron P. wrote:

or get it here:
http://rubyforge.org/projects/mechanize/

Check out the release notes and changelog for more cool stuff.

–Aaron

I’m noticing some issues with the changed behavior of
WWW::Mechanize::Page#links.text

I used to just be able to grab a link using
page.links.text(/pattern/).first and it would work even if the link had
children. It doesn’t seem to work anymore. I’m working on pinning
the issue down, but you likely have more insight. Is there a new way
to do this that’s more hpricot friendly?

Hpricot integration seems like a fine idea though, glad to see you
making use of it. Thanks for all the hard work.
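
To make the symptom concrete, here is a small pure-Ruby sketch of how matching on link text can differ when the text sits inside child elements. This is an assumption about one way such a difference could arise, not a diagnosis of Mechanize or Hpricot: Node, own_text, and inner_text are stand-ins invented for the illustration and are not classes or methods from either library.

```ruby
# Stand-in for a parsed element: a tag name plus children that are either
# Strings (text nodes) or further Node objects. Not a Mechanize/Hpricot class.
Node = Struct.new(:name, :children) do
  # All text beneath this node, descending into child elements.
  def inner_text
    children.map { |c| c.is_a?(Node) ? c.inner_text : c.to_s }.join
  end

  # Only the text nodes directly under this node; text held by child
  # elements is ignored.
  def own_text
    children.reject { |c| c.is_a?(Node) }.join
  end
end

links = [
  Node.new('a', ['Home']),                           # <a>Home</a>
  Node.new('a', [Node.new('b', ['Release notes'])])  # <a><b>Release notes</b></a>
]

pattern = /release/i

# Matching on direct text misses the link whose text is wrapped in <b>.
by_own_text = links.select { |link| link.own_text =~ pattern }

# Matching on the flattened text finds it.
by_inner_text = links.select { |link| link.inner_text =~ pattern }

puts "own_text matches:   #{by_own_text.size}"
puts "inner_text matches: #{by_inner_text.size}"
```

A reduced case along these lines, with one link whose text is nested in a child element, is the kind of sample HTML that would help pin the behavior down.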

On Fri, Sep 08, 2006 at 05:20:38AM +0900, Mat S. wrote:

I’m noticing some issues with the changed behavior of
WWW::Mechanize::Page#links.text

I used to just be able to grab a link using
page.links.text(/pattern/).first and it would work even if the link had
children. It doesn’t seem to work anymore. I’m working on pinning
the issue down, but you likely have more insight. Is there a new way
to do this that’s more hpricot friendly?

This may be a bug in hpricot. That functionality should have remained
the same. The only difference is the parser being used. Could you possibly
send sample code or sample html to one of the mechanize mailing lists:

http://rubyforge.org/mail/?group_id=1453

I don’t want to clutter ruby-talk with mechanize support stuff. :-)

Hpricot integration seems like a fine idea though, glad to see you
making use of it. Thanks for all the hard work.

No problem. Hopefully I can help you out!

–Aaron