Mechanize 2.0

Mechanize version 2.0 has been released!

The Mechanize library is used for automating interaction with websites.
Mechanize automatically stores and sends cookies, follows redirects,
and can follow links and submit forms. Form fields can be populated
before a form is submitted. Mechanize also keeps track of the sites
that you have visited as a history.

Changes:

2.0 / 2011-06-27

Mechanize is now under the MIT license.

  • API changes

    • WWW::Mechanize has been removed. Use Mechanize.
    • Pre connect hooks are now called with the agent and the request.
      See Mechanize#pre_connect_hooks.
    • Post connect hooks are now called with the agent and the response.
      See Mechanize#post_connect_hooks.
    • Mechanize::Chain is gone. As it was an internal API, this should
      cause no problems.
    • Mechanize#fetch_page no longer accepts an options Hash.
    • Mechanize#put now accepts headers instead of an options Hash as the
      last argument.
    • Mechanize#delete now accepts headers instead of an options Hash as
      the last argument.
    • Mechanize#request_with_entity now accepts headers instead of an
      options Hash as the last argument.
    • Mechanize no longer raises RuntimeError directly. Mechanize::Error
      or ArgumentError is raised instead.
    • The User-Agent header has changed. It no longer includes the
      WWW- prefix and now includes the ruby version. The URL has been
      updated as well.
    • Mechanize now requires ruby 1.8.7 or newer.
    • Hpricot support has been removed as webrobots requires nokogiri.
    • Mechanize#get no longer accepts the referer as the second argument.
    • Mechanize#get no longer allows the HTTP method to be changed (:verb
      option).
    • Mechanize::Page::Meta is now Mechanize::Page::MetaRefresh to
      accurately depict its responsibilities.
    • Mechanize::Page#meta is now Mechanize::Page#meta_refresh as it only
      contains meta elements with http-equiv of “refresh”.
    • Mechanize::Page#charset is now Mechanize::Page::charset. GH #112,
      patch by Godfrey Chan.
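
The hook and header changes listed above can be sketched roughly as follows. This is a minimal illustration, not code from the Mechanize sources: the helper names install_trace_hooks and put_json, the X-Trace header, and the log array are all hypothetical, and the helpers only assume an agent object that responds to pre_connect_hooks, post_connect_hooks and put as described in these notes.

```ruby
# Hypothetical helper: registers example hooks on a Mechanize-like agent.
# Per the 2.0 notes, pre connect hooks receive (agent, request) and
# post connect hooks receive (agent, response).
def install_trace_hooks(agent, log = [])
  agent.pre_connect_hooks << lambda { |_agent, request|
    # Add an illustrative header to every outgoing request.
    request["X-Trace"] = "release-notes-example"
  }
  agent.post_connect_hooks << lambda { |_agent, response|
    # Record the status code of every response.
    log << response.code
  }
  log
end

# Hypothetical helper: the last argument to #put is now a headers Hash
# rather than an options Hash.
def put_json(agent, url, body)
  agent.put(url, body, "Content-Type" => "application/json")
end

# Usage with the real library (needs the mechanize gem and network access):
#   require "mechanize"
#   agent = Mechanize.new
#   log = install_trace_hooks(agent)
#   agent.get("http://example.com/")
```

Passing the agent in, rather than creating it inside the helpers, keeps the sketch independent of how the agent is constructed.
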
  • Deprecations

    • Mechanize#get with an options hash is deprecated and will be removed
      after October, 2011.
    • Mechanize::Util::to_native_charset is deprecated as it is no longer
      used by Mechanize.
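
For code that still calls Mechanize#get with an options hash, a migration might look something like the sketch below. It assumes the positional form get(uri, parameters, referer, headers) and the :url / :referer option keys, neither of which is spelled out in these notes, so check both against the Mechanize#get documentation. The helper name get_with_referer is hypothetical.

```ruby
# Deprecated style (options hash), shown only for comparison:
#   agent.get(:url => url, :referer => referer)

# Hypothetical helper using positional arguments instead.
# Assumes the signature get(uri, parameters, referer, headers).
def get_with_referer(agent, url, referer, headers = {})
  # The empty array means "no query parameters".
  agent.get(url, [], referer, headers)
end

# Usage with the real library (needs the mechanize gem and network access):
#   require "mechanize"
#   agent = Mechanize.new
#   page = get_with_referer(agent, "http://example.com/a", "http://example.com/")
```
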
  • New Features

    • Add header reference methods to Mechanize::File so that a response
      object is compatible with Net::HTTPResponse.
    • Mechanize#click accepts a regexp or string to click a button/link in
      the current page. It works as expected when not passed a string or
      regexp.
    • Provide a way to only follow permanent redirects (301)
      automatically: agent.redirect_ok = :permanent GH #73
    • Mechanize now supports HTML5 meta charset. GH #113
    • Documented various Mechanize accessors. GH #66
    • Mechanize now uses net-http-digest_auth. GH #31
    • Mechanize now implements session cookies. GH #78
    • Mechanize now implements deflate decoding. GH #40
    • Mechanize now allows a certificate and key to be passed directly.
      GH #71
    • Mechanize::Form::MultiSelectList now implements #option_with and
      #options_with. GH #42
    • Add Mechanize::Page::Link#rel and #rel?(kind) to read and test the
      rel attribute.
    • Add Mechanize::Page#canonical_uri to read a <link rel="canonical">
      tag.
    • Add support for Robots Exclusion Protocol (i.e. robots.txt) and
      nofollow/noindex in meta tags and the rel attribute. Automatic
      exclusion can be turned on by setting:
      agent.robots = true
    • Manual robots.txt test can be performed with
      Mechanize#robots_allowed? and #robots_disallowed?.
    • Mechanize::Form now supports the accept-charset attribute. GH #96
    • Mechanize::ResponseReadError is raised if there is an exception
      while reading the response body. This allows recovery from broken
      HTTP servers (or connections). GH #90
    • Mechanize#follow_meta_refresh set to :anywhere will follow meta
      refresh found outside of a document’s head. GH #99
    • Add support for HTML5’s rel=“noreferrer” attribute which indicates
      no “Referer” information should be sent when following the link.
    • A frame will now load its content when #content is called. GH #111
    • Added Mechanize#default_encoding to provide a default for pages with
      no encoding specified. GH #104
    • Added Mechanize#force_default_encoding which only uses
      Mechanize#default_encoding for parsing HTML. GH #104
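
Several of the new settings above can be combined when configuring an agent. The following is a rough sketch rather than a recommended configuration: the helper names configure_crawler and allowed_urls are hypothetical, the "UTF-8" default is an arbitrary choice, and the helpers only assume an agent that exposes the accessors named in these notes.

```ruby
# Hypothetical helper: applies several 2.0 settings to a Mechanize-like agent.
def configure_crawler(agent)
  agent.robots = true                    # honor robots.txt and nofollow/noindex
  agent.redirect_ok = :permanent         # only follow permanent (301) redirects
  agent.follow_meta_refresh = :anywhere  # also follow meta refresh outside <head>
  agent.default_encoding = "UTF-8"       # used when a page specifies no encoding
  agent
end

# Hypothetical helper: manual robots.txt test for a list of URLs.
def allowed_urls(agent, urls)
  urls.select { |url| agent.robots_allowed?(url) }
end

# Usage with the real library (needs the mechanize gem and network access):
#   require "mechanize"
#   agent = configure_crawler(Mechanize.new)
#   allowed_urls(agent, ["http://example.com/", "http://example.com/private"])
```
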
  • Bug Fixes

    • Fixed a bug where Referer is not sent when accessing a relative
      URI starting with “http”.
    • Fix handling of Meta Refresh with relative paths. GH #39
    • Mechanize::CookieJar now supports RFC 2109 correctly. GH #85
    • Fixed typo in EXAMPLES.rdoc. GH #74
    • The base element is now handled correctly for images. GH #72
    • Image buttons with no name attribute are now included in the form’s
      button list. GH #56
    • Improved handling of non ASCII-7bit compatible characters in links
      (only an issue on ruby 1.8). GH #36, GH #75
    • Loading cookies.txt is faster. GH #38
    • Mechanize no longer sends cookies for a.b.example to axb.example.
      GH #41
    • Mechanize no longer sends the button name as a form field for image
      buttons. GH #45
    • Blank cookie values are now skipped. GH #80
    • Mechanize now adds a ‘.’ to cookie domains if no ‘.’ was sent. This
      is not allowed by RFC 2109 but does appear in RFC 2965. GH #86
    • file URIs are now read in binary mode. GH #83
    • Content-Encoding: x-gzip is now treated like gzip per RFC 2616.
    • Mechanize now unescapes URIs for meta refresh. GH #68
    • Mechanize now has more robust HTML charset detection. GH #43
    • Mechanize::Form::Textarea is now created from a textarea element.
      GH #94
    • A meta content-type now overrides the HTTP content type. GH #114
    • Mechanize::Page::Link#uri now handles both escaped and unescaped
      hrefs. GH #107
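
To illustrate the cookie domain fix above (a.b.example versus axb.example, GH #41), here is one hypothetical way such a mismatch can arise and be avoided. It does not reproduce Mechanize's actual cookie code: both method names are made up, and the unescaped-dot regexp is only an assumed example of a loose check.

```ruby
# Loose check: the '.' characters in the cookie domain are interpolated
# unescaped, so each one acts as a regexp wildcard.
def naive_domain_match?(cookie_domain, host)
  !!(host =~ /#{cookie_domain}\z/i)
end

# Stricter check: the host must equal the domain or end with ".domain".
def strict_domain_match?(cookie_domain, host)
  domain = cookie_domain.downcase.sub(/\A\./, "")
  host = host.downcase
  host == domain || host.end_with?(".#{domain}")
end

puts naive_domain_match?("a.b.example", "axb.example")       # true (wrong host matched)
puts strict_domain_match?("a.b.example", "axb.example")      # false
puts strict_domain_match?("a.b.example", "www.a.b.example")  # true
```

The stricter version compares plain strings, so no character in the domain is ever treated as a pattern.
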