Hello list,
I need to parse the contents of a weird website that stores an
80,000-character session id in a hidden input tag.
I am trying to use Mechanize for the task, but because line 12 of the
page is about 80k characters long, I get the following error:
/usr/lib/ruby/gems/1.8/gems/hpricot-0.6/lib/hpricot/parse.rb:51:in `scan': ran out of buffer space on element <input>, starting on line 12. (Hpricot::ParseError)
    from /usr/lib/ruby/gems/1.8/gems/hpricot-0.6/lib/hpricot/parse.rb:51:in `make'
    from /usr/lib/ruby/gems/1.8/gems/hpricot-0.6/lib/hpricot/parse.rb:15:in `parse'
    from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize/page.rb:37:in `initialize'
    from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize.rb:551:in `new'
    from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize.rb:551:in `fetch_page'
    from /usr/lib/ruby/1.8/net/http.rb:1050:in `request'
    from /usr/lib/ruby/1.8/net/http.rb:2133:in `reading_body'
    from /usr/lib/ruby/1.8/net/http.rb:1049:in `request'
    from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize.rb:514:in `fetch_page'
    from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize.rb:185:in `get'
Probably the line buffer in Hpricot has a fixed size and can’t
hold a line this long.
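To check that guess independently of Hpricot, the longest line of a page body can be measured with plain Ruby. This is only a minimal sketch: `longest_line` is a helper name made up for illustration, the input name `session` is hypothetical, and the sample string stands in for the real page. It assumes a Ruby where `each_line.with_index(1)` is available (1.9 or later).

```ruby
# Report the 1-based number and the length (in characters) of the longest
# line in a String. Nothing here touches Hpricot or Mechanize.
# `longest_line` is a hypothetical helper, not part of any library.
def longest_line(html)
  best_number = 0
  best_length = 0
  html.each_line.with_index(1) do |line, number|
    length = line.chomp.length
    if length > best_length
      best_number = number
      best_length = length
    end
  end
  [best_number, best_length]
end

# Synthetic page standing in for the real one: 11 short lines, then a
# hidden input on line 12 whose value is 80,000 characters long.
# The input name "session" is an assumption made for this example.
sample = ("<p>filler</p>\n" * 11) +
         %(<input type="hidden" name="session" value="#{'x' * 80_000}" />\n) +
         "</html>\n"

number, length = longest_line(sample)
puts "longest line: #{number} (#{length} characters)"
```

For the real page, the body would have to be fetched without going through the parser (for example with Net::HTTP) and passed to the same helper; if the longest line turns out to be line 12, that matches the line named in the error message.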
The “program” is this simple test script:
require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
page = agent.get("https://replica.megsa.com.ar/Usuario/MantenimientoContratos.aspx")
puts page.body
Is there a way to configure Hpricot to use a dynamically sized
collection for the line buffer?