Forum: Ruby Mechanize and charset issues

John Schmitz (jojohoho)
on 2009-05-27 00:19
I'm not sure what is causing this error. I can log in successfully, but
I can't submit this form without the script bombing out.

require 'rubygems'
require 'mechanize'

formatstring = "testing submission"

agent = WWW::Mechanize.new
page = agent.get 'hidden'
form = page.forms.first
# Log in first unless we are already on the submission form.
unless form.action == 'submit.php'
  p "logging in....."
  form['username'] = 'hidden'
  form['password'] = 'hidden'

  page = agent.submit form
  page = agent.click(page.link_with(:text => 'Add'))
end

page = agent.click(page.link_with(:text => '[Add Content]'))
uploadForm = page.forms[6]
uploadForm['format'] = formatstring
page = agent.submit uploadForm
#pp page

Gives me the error:

/var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `iconv': "\342\202\254\305\223a condition"... (Iconv::IllegalSequence)
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `from_native_charset'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:151:in `from_native_charset'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:143:in `proc_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:142:in `map'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:142:in `proc_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:165:in `build_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:164:in `each'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:164:in `build_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:213:in `request_data'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:392:in `post_form'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:335:in `submit'
John Schmitz (jojohoho)
on 2009-05-27 00:34
It is caused by the following HTML, â€&oelig, in one of the hidden form
entries that is being submitted. I'm not sure how to keep this from
bombing out and still submit the form, though.
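
A minimal sketch of why that value trips the charset conversion. The thread runs on Ruby 1.8 with Iconv; this sketch assumes Ruby 1.9 or later and uses String#encode instead. It also assumes the page declares ISO-8859-1 (the thread never states the page charset), and the helper names convertible? and repair_mojibake are hypothetical:

```ruby
# Hypothetical reconstruction of the failing field value: the characters
# "â", "€", "œ" followed by the text seen in the error message.
value = "\u00E2\u20AC\u0153a condition"

# Assumption: the target charset is ISO-8859-1. It contains "â" but has no
# euro sign, so the conversion fails at "€", which matches where the string
# in the Iconv error message starts.
def convertible?(text, charset)
  text.encode(charset)
  true
rescue Encoding::UndefinedConversionError
  false
end

# "â€œ" is what the UTF-8 bytes of a left double quote (U+201C) look like
# when they are misread as Windows-1252. Reverse that misreading by turning
# the characters back into bytes and relabelling the bytes as UTF-8.
def repair_mojibake(text)
  text.encode("Windows-1252").force_encoding("UTF-8")
end

puts convertible?(value, "ISO-8859-1")
puts repair_mojibake(value)
```

Under these assumptions the repaired value starts with a real curly quote, which ISO-8859-1 cannot represent either, so it would still have to be swapped for a plain ASCII quote before the form is submitted.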
The Higgs bozo (higgsbozo)
on 2009-05-27 00:49
John Schmitz wrote:
>
> page = agent.click(page.link_with(:text => '[Add Content]'))
> uploadForm = page.forms[6]
> uploadForm['format'] = formatstring
> page = agent.submit uploadForm
> #pp page
>
> Gives me the error:
>
> /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in
> `iconv': "\342\202\254\305\223a condition"... (Iconv::IllegalSequence)
>         from

Running "ruby -KU ..." will probably fix it (at least it has worked for
me whenever I had errors from \nnn inside strings).
John Schmitz (jojohoho)
on 2009-05-27 00:53
The Higgs bozo wrote:
> John Schmitz wrote:
>>
>> page = agent.click(page.link_with(:text => '[Add Content]'))
>> uploadForm = page.forms[6]
>> uploadForm['format'] = formatstring
>> page = agent.submit uploadForm
>> #pp page
>>
>> Gives me the error:
>>
>> /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in
>> `iconv': "\342\202\254\305\223a condition"... (Iconv::IllegalSequence)
>>         from
>
> Running "ruby -KU ..." will probably fix it (at least it has worked for
> me whenever I had errors from \nnn inside strings).

Thank you for the response, but it doesn't seem to solve the issue. I
think it's related to charsets and iconv, but I have no idea where to go
from there. I get a nearly identical error message with ruby -KU:

/var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `iconv': "€œa condition"... (Iconv::IllegalSequence)
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `from_native_charset'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:151:in `from_native_charset'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:143:in `proc_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:142:in `map'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:142:in `proc_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:165:in `build_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:164:in `each'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:164:in `build_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:213:in `request_data'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:392:in `post_form'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:335:in `submit'
John Schmitz (jojohoho)
on 2009-05-28 00:19
If anyone comes across this problem, this is how I fixed it. I found a
method online and made some minor changes and additions. I just pass the
problem strings through this and it gives me back strings that don't
have issues.

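# Replaces curly quotes, dashes and ellipses (and their mis-decoded forms)
# with plain ASCII. Modifies c in place and returns it.
def fix_quotes(c)
  c.gsub!(/\342\200(?:\234|\235)/,'"')
  c.gsub!(/\342\200(?:\230|\231)/,"'")
  c.gsub!(/\342\200\223/,"-")
  c.gsub!(/\342\200\246/,"...")
  c.gsub!(/\303\242\342\202\254\342\204\242/,"'")
  c.gsub!(/\303\242\342\202\254\302\235/,'"')
  c.gsub!(/\303\242\342\202\254\305\223/,'"')
  c.gsub!(/\303\242\342\202\254"/,'-')
  c.gsub!(/\342\202\254\313\234/,'"')
  c # gsub! returns nil when nothing matched, so return the string itself
end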
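
For anyone on a newer Ruby, the same idea can be sketched as a lookup table. This assumes Ruby 1.9 or later (the thread itself is on 1.8), and the names PUNCTUATION_MAP and normalize_punctuation are hypothetical. It covers only the plain UTF-8 characters handled by the first four gsub! lines, not the mis-decoded variants:

```ruby
# Hypothetical table: typographic characters mapped to ASCII stand-ins.
PUNCTUATION_MAP = {
  "\u201C" => '"',   # left double quote
  "\u201D" => '"',   # right double quote
  "\u2018" => "'",   # left single quote
  "\u2019" => "'",   # right single quote
  "\u2013" => "-",   # en dash
  "\u2026" => "..."  # ellipsis
}

# Returns a new string and leaves the argument untouched. gsub with a hash
# replaces each match with the value stored under the matched text.
def normalize_punctuation(text)
  text.gsub(Regexp.union(PUNCTUATION_MAP.keys), PUNCTUATION_MAP)
end

puts normalize_punctuation("\u201Ca condition\u201D")
```

Because it uses gsub rather than gsub!, the result is always a string, even when nothing needed replacing.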
Jarmo Pertman (juuser)
on 2009-05-28 10:41
Have you tried setting the encoding for the page, something like this:
page.encoding = 'UTF-8'?

Jarmo

John Schmitz wrote:
> If anyone comes across this problem, this is how I fixed it. Found a
> method online and made some minor changes and additions. I just pass the
> problem strings through this and it gives me back strings that don't
> have issues.
>
> def fix_quotes(c)
>       c.gsub!(/\342\200(?:\234|\235)/,'"')
>       c.gsub!(/\342\200(?:\230|\231)/,"'")
>       c.gsub!(/\342\200\223/,"-")
>       c.gsub!(/\342\200\246/,"...")
>       c.gsub!(/\303\242\342\202\254\342\204\242/,"'")
>       c.gsub!(/\303\242\342\202\254\302\235/,'"')
>       c.gsub!(/\303\242\342\202\254\305\223/,'"')
>       c.gsub!(/\303\242\342\202\254"/,'-')
>       c.gsub!(/\342\202\254\313\234/,'"')
> end