Mechanize and charset issues


#1

I’m not sure what is causing this error. I can log in successfully, but I
can’t submit this form without the script bombing out.

require 'rubygems'
require 'mechanize'

formatstring = "testing submission"

agent = WWW::Mechanize.new
page = agent.get 'hidden'
form = page.forms.first
# Log in first unless we are already on the submission form.
if !(form.action.eql?('submit.php'))
  p "logging in..."
  form['username'] = 'hidden'
  form['password'] = 'hidden'

  page = agent.submit form
  page = agent.click(page.link_with(:text => 'Add'))
end

page = agent.click(page.link_with(:text => '[Add Content]'))
uploadForm = page.forms[6]
uploadForm['format'] = formatstring
page = agent.submit uploadForm
#pp page

Gives me the error:

/var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `iconv': "\342\202\254\305\223a condition"... (Iconv::IllegalSequence)
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `from_native_charset'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:151:in `from_native_charset'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:143:in `proc_query'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:142:in `map'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:142:in `proc_query'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:165:in `build_query'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:164:in `each'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:164:in `build_query'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:213:in `request_data'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:392:in `post_form'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:335:in `submit'


#2

It is caused by the following HTML, â€&oelig, in one of the
hidden form entries that is being submitted. How can I keep this
from bombing and still submit the form?
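
For anyone trying to read the escaped bytes in the error: a rough sketch, in current Ruby rather than 1.8, of how they can be decoded. It assumes (this is a guess, not something confirmed above) that the hidden field originally held a left double quotation mark whose UTF-8 bytes were read as Windows-1252 somewhere along the way. The helper name misread_as_windows_1252 is made up for illustration.

```ruby
# Bytes from the Iconv error message, written with the same octal escapes.
reported = "\342\202\254\305\223".b.force_encoding("UTF-8")

# Hypothetical reconstruction (an assumption, not confirmed by the thread):
# take text, read its UTF-8 bytes as Windows-1252, and store the result as
# UTF-8 again. This is the classic way curly quotes turn into mojibake.
def misread_as_windows_1252(text)
  text.encode("UTF-8").b.force_encoding("Windows-1252").encode("UTF-8")
end

# U+201C is the left double quotation mark.
mojibake = misread_as_windows_1252("\u201C")

# Print the code points so the two strings can be compared by eye.
puts reported.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
puts mojibake.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")

# True when the bytes in the error are the tail end of the damaged quote.
puts mojibake.end_with?(reported)
```

If the last line prints true, the bytes in the error are the last two characters of a damaged curly quote, which matches the â€&oelig seen in the HTML.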


#3

John Schmitz wrote:

page = agent.click(page.link_with(:text => '[Add Content]'))
uploadForm = page.forms[6]
uploadForm['format'] = formatstring
page = agent.submit uploadForm
#pp page

Gives me the error:

/var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `iconv': "\342\202\254\305\223a condition"... (Iconv::IllegalSequence)
from

Running “ruby -KU …” will probably fix it (at least it has worked for
me whenever I had errors from \nnn inside strings).


#4

If anyone comes across this problem, this is how I fixed it. Found a
method online and made some minor changes and additions. I just pass the
problem strings through this and it gives me back strings that don’t
have issues.

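# Replaces curly quotes, dashes, ellipses and their mojibake forms with
# plain ASCII. Modifies the string in place and returns it.
def fix_quotes(c)
  c.gsub!(/\342\200(?:\234|\235)/,'"')
  c.gsub!(/\342\200(?:\230|\231)/,"'")
  c.gsub!(/\342\200\223/,"-")
  c.gsub!(/\342\200\246/,"...")
  c.gsub!(/\303\242\342\202\254\342\204\242/,"'")
  c.gsub!(/\303\242\342\202\254\302\235/,'"')
  c.gsub!(/\303\242\342\202\254\305\223/,'"')
  c.gsub!(/\303\242\342\202\254"/,'-')
  c.gsub!(/\342\202\254\313\234/,'"')
  c # gsub! returns nil when nothing matched, so return the string explicitly
end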
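
A different approach from the substitution list above, sketched for current Ruby (1.9 and later) rather than the 1.8 setup in this thread: instead of listing every damaged sequence, try to reverse the damage by turning the text back into Windows-1252 bytes and reading those bytes as UTF-8. It assumes the damage really is "UTF-8 read as Windows-1252", and repair_mojibake is a made-up name, not part of Mechanize.

```ruby
# Hypothetical helper (not part of Mechanize): undo one round of
# "UTF-8 read as Windows-1252" damage. Returns the input unchanged when the
# text does not look like that kind of mojibake.
def repair_mojibake(text)
  # Map each character back to its Windows-1252 byte, then reinterpret the
  # resulting bytes as UTF-8.
  candidate = text.encode("Windows-1252").force_encoding("UTF-8")
  candidate.valid_encoding? ? candidate : text
rescue EncodingError
  # Raised when a character has no Windows-1252 equivalent, in which case
  # the text cannot be this kind of mojibake.
  text
end

# a-circumflex, euro sign, oe ligature: a damaged left double quote.
damaged = "\u00E2\u20AC\u0153a condition"
puts repair_mojibake(damaged)
```

This only restores the original curly quote; it does not turn it into a plain ASCII quote the way fix_quotes does, and it leaves a string alone if any part of it fails to convert, so a partly damaged string may still need the substitution list.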


#5

The Higgs bozo wrote:

John Schmitz wrote:

page = agent.click(page.link_with(:text => '[Add Content]'))
uploadForm = page.forms[6]
uploadForm['format'] = formatstring
page = agent.submit uploadForm
#pp page

Gives me the error:

/var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `iconv': "\342\202\254\305\223a condition"... (Iconv::IllegalSequence)
from

Running “ruby -KU …” will probably fix it (at least it has worked for
me whenever I had errors from \nnn inside strings).

Thank you for the response, but it doesn’t seem to solve the issue. I
think it’s related to charsets and iconv, but I have no idea where to go
from there. I get a nearly identical error message with ruby -KU:

/var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `iconv': "€œa condition"... (Iconv::IllegalSequence)
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `from_native_charset'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:151:in `from_native_charset'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:143:in `proc_query'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:142:in `map'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:142:in `proc_query'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:165:in `build_query'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:164:in `each'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:164:in `build_query'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:213:in `request_data'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:392:in `post_form'
from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:335:in `submit'


#6

Have you tried setting the encoding for the page, something like this:
page.encoding = 'UTF-8'?

Jarmo
