Forum: Ruby Mechanize and charset issues

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
John S. (Guest)
on 2009-05-27 02:19
I'm not sure what is causing this error as I can successfully login, I
just can't submit this form without the script bombing out.

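require 'rubygems'   # loads gems on Ruby 1.8
require 'mechanize'  # WWW::Mechanize (0.9.x)
require 'pp'         # only for the commented-out pp call at the end

formatstring = "testing submission"

agent = WWW::Mechanize.new
page = agent.get 'hidden'
form = page.forms.first

# If the first form isn't the submission form, treat it as the login form
unless form.action == 'submit.php'
  p "logging in....."
  form['username'] = 'hidden'
  form['password'] = 'hidden'

  page = agent.submit form
  page = agent.click(page.link_with(:text => 'Add'))
end

page = agent.click(page.link_with(:text => '[Add Content]'))
uploadForm = page.forms[6]
uploadForm['format'] = formatstring
page = agent.submit uploadForm   # this is the call that raises
#pp page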

Gives me the error:

/var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `iconv': "\342\202\254\305\223a condition"... (Iconv::IllegalSequence)
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `from_native_charset'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:151:in `from_native_charset'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:143:in `proc_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:142:in `map'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:142:in `proc_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:165:in `build_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:164:in `each'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:164:in `build_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:213:in `request_data'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:392:in `post_form'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:335:in `submit'
John S. (Guest)
on 2009-05-27 02:34
It is caused by the following HTML, â€&oelig, in one of the hidden form
fields being submitted. I'm not sure how to keep this from bombing out
and still submit the form, though.
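The octal escapes in the error can be decoded with a few lines of Ruby. `&oelig;` is the HTML entity for œ, so the field's decoded value starts with "â€œ". The sketch below (Ruby 1.9 or later, unlike the Ruby 1.8 in the trace) shows that "â€œ" is what the UTF-8 bytes of a curly opening quote, U+201C, become when read as Windows-1252, and that \342\202\254\305\223 are the UTF-8 bytes of its last two characters, € and œ. The Windows-1252 misreading is an assumption; the thread only shows the result. The helper names `mojibake` and `octal_escapes` are hypothetical.

```ruby
# Sketch (Ruby 1.9+; the thread itself runs Ruby 1.8): where the bytes in the error come from.
# ASSUMPTION: the page's UTF-8 text was mis-decoded as Windows-1252 somewhere upstream.

CURLY_QUOTE = "\u201C" # LEFT DOUBLE QUOTATION MARK

# Hypothetical helper: reinterpret a UTF-8 string's bytes as Windows-1252, then
# convert back to UTF-8. This reproduces the mojibake for characters whose bytes
# are all defined in Windows-1252 (true for U+201C, not for every character).
def mojibake(str)
  str.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# Hypothetical helper: show a string's bytes as octal escapes, the way the
# Ruby 1.8 error message prints them.
def octal_escapes(str)
  str.bytes.map { |byte| format("\\%o", byte) }.join
end

garbled = mojibake(CURLY_QUOTE)
puts garbled.codepoints.map { |cp| format("U+%04X", cp) }.join(" ") # U+00E2 U+20AC U+0153
puts octal_escapes(garbled[1..-1])                                 # \342\202\254\305\223
```

The second line matches the start of the failing string in the trace exactly, which suggests the conversion got past the "â" and stopped at the "€".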
The H. (Guest)
on 2009-05-27 02:49
John Schmitz wrote:
>
> page = agent.click(page.link_with(:text => '[Add Content]'))
> uploadForm = page.forms[6]
> uploadForm['format'] = formatstring
> page = agent.submit uploadForm
> #pp page
>
> Gives me the error:
>
> /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in
> `iconv': "\342\202\254\305\223a condition"... (Iconv::IllegalSequence)
>         from

Running "ruby -KU ..." will probably fix it (at least it has worked for
me whenever I had errors from \nnn inside strings).
John S. (Guest)
on 2009-05-27 02:53
The Higgs bozo wrote:
> Running "ruby -KU ..." will probably fix it (at least it has worked for
> me whenever I had errors from \nnn inside strings).

Thank you for the response, but it doesn't seem to solve the issue. I
think it's related to charsets and iconv, but I have no idea where to go
from there. I get a nearly identical error message with ruby -KU:

/var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `iconv': "â¬Åa condition"... (Iconv::IllegalSequence)
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/util.rb:40:in `from_native_charset'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:151:in `from_native_charset'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:143:in `proc_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:142:in `map'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:142:in `proc_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:165:in `build_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:164:in `each'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:164:in `build_query'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/form.rb:213:in `request_data'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:392:in `post_form'
        from /var/lib/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:335:in `submit'
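Judging from the frame names, Mechanize converts each form value into another charset with Iconv before posting; that is an assumption, since the thread never shows Mechanize's source. The sketch below does the same check with `String#encode` in Ruby 1.9+: it lists the characters a target charset cannot represent, and shows a lossy conversion that substitutes "?" instead of raising. ISO-8859-1 as the page's charset is also an assumption. It would explain why the error text starts at € rather than â, since ISO-8859-1 has â but neither € nor œ. The value is rebuilt from the error message, and the helper names are hypothetical.

```ruby
# Sketch (Ruby 1.9+): String#encode standing in for the Iconv call in the trace.
# ASSUMPTIONS: the page declares ISO-8859-1, and the hidden value is the mojibake
# (a-circumflex, euro sign, oe ligature) followed by "a condition", rebuilt from the error.
VALUE = "\u00E2\u20AC\u0153a condition"

# Hypothetical helper: true if the charset has a code for this single character.
def representable?(char, charset)
  char.encode(charset)
  true
rescue Encoding::UndefinedConversionError
  false
end

# Hypothetical helper: distinct characters of str that make a strict conversion raise.
def unrepresentable(str, charset)
  str.each_char.reject { |char| representable?(char, charset) }.uniq
end

# Hypothetical helper: lossy conversion where missing characters become "?" instead of raising.
def lossy_encode(str, charset)
  str.encode(charset, invalid: :replace, undef: :replace, replace: "?")
end

# Prints:
#   ISO-8859-1: U+20AC U+0153
#   Windows-1252: all representable
#   UTF-8: all representable
%w[ISO-8859-1 Windows-1252 UTF-8].each do |charset|
  missing = unrepresentable(VALUE, charset).map { |char| format("U+%04X", char.ord) }
  puts "#{charset}: #{missing.empty? ? 'all representable' : missing.join(' ')}"
end

# Lossy fallback, converted back to UTF-8 for display.
puts lossy_encode(VALUE, "ISO-8859-1").encode("UTF-8")
```

Replacing characters with "?" throws information away, so a lossy conversion only makes sense for fields whose exact content the server does not care about.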
John S. (Guest)
on 2009-05-28 02:19
If anyone comes across this problem, this is how I fixed it. I found a
method online and made some minor changes and additions. I pass the
problem strings through it, and it gives back strings that no longer
trigger the error.

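# Replace "smart" punctuation, and the mojibake it turns into when UTF-8
# is misread as Windows-1252, with plain ASCII. Modifies c in place.
def fix_quotes(c)
  c.gsub!(/\342\200(?:\234|\235)/, '"')            # U+201C/U+201D curly double quotes
  c.gsub!(/\342\200(?:\230|\231)/, "'")            # U+2018/U+2019 curly single quotes
  c.gsub!(/\342\200\223/, "-")                     # U+2013 en dash
  c.gsub!(/\342\200\246/, "...")                   # U+2026 ellipsis
  c.gsub!(/\303\242\342\202\254\342\204\242/, "'") # mojibake of U+2019
  c.gsub!(/\303\242\342\202\254\302\235/, '"')     # mojibake of U+201D
  c.gsub!(/\303\242\342\202\254\305\223/, '"')     # mojibake of U+201C (the one in this thread)
  c.gsub!(/\303\242\342\202\254"/, '-')            # mojibake prefix plus a plain quote
  c.gsub!(/\342\202\254\313\234/, '"')             # tail of the mojibake of U+2018
  c  # gsub! returns nil when nothing matched, so return the string itself
end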
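The gsub! chain above works on raw bytes. On Ruby 1.9 or later, one way to express the same clean-up is a character-keyed table applied with a single `gsub`; this is a sketch whose mapping mirrors the posted patterns, not the poster's code. Because the original chain turns curly quotes into ASCII quotes before its dash rule runs, it also maps "â€“" and "â€”" (dash mojibake ending in a curly quote) to "-", so the table lists those two sequences explicitly to keep that behavior in one pass. `REPLACEMENTS` and `PATTERN` are hypothetical names.

```ruby
# Sketch (Ruby 1.9+): a table-driven fix_quotes keyed on characters instead of bytes.
# The mapping mirrors the patterns in the post; it is not the poster's code.
REPLACEMENTS = {
  "\u201C" => '"', "\u201D" => '"',   # curly double quotes
  "\u2018" => "'", "\u2019" => "'",   # curly single quotes
  "\u2013" => "-",                    # en dash
  "\u2026" => "...",                  # horizontal ellipsis
  "\u00E2\u20AC\u2122" => "'",        # mojibake of U+2019
  "\u00E2\u20AC\u009D" => '"',        # mojibake of U+201D (its 0x9D byte kept as U+009D)
  "\u00E2\u20AC\u0153" => '"',        # mojibake of U+201C
  "\u00E2\u20AC\"" => "-",            # as posted: mojibake prefix plus an ASCII quote
  "\u00E2\u20AC\u201C" => "-",        # mojibake of U+2013 (original chain reached "-" in two passes)
  "\u00E2\u20AC\u201D" => "-",        # mojibake of U+2014 (likewise)
  "\u20AC\u02DC" => '"'               # tail of the mojibake of U+2018, as posted
}.freeze

# One alternation of all keys (Regexp.union escapes them), applied in a single pass.
PATTERN = Regexp.union(REPLACEMENTS.keys)

# Returns a new string; unlike a trailing gsub!, it never returns nil.
def fix_quotes(text)
  text.gsub(PATTERN, REPLACEMENTS)
end

puts fix_quotes("\u00E2\u20AC\u0153a condition")
```

Being non-destructive, this version can be used directly in an assignment such as `form['field'] = fix_quotes(value)`, where `'field'` stands for whichever hidden field holds the bad text.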
Jarmo P. (Guest)
on 2009-05-28 12:41
Have you tried setting the page's encoding, something like this:
page.encoding = 'UTF-8'?

Jarmo
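If Mechanize converts form values into the page's encoding (again, only what the stack trace suggests), declaring the page as UTF-8 should avoid the failure, because UTF-8 can represent every character, and the field would then be sent as its UTF-8 bytes. A sketch of what that looks like on the wire, using the standard library's `URI.encode_www_form_component`, assuming an application/x-www-form-urlencoded POST and the value rebuilt from the error message:

```ruby
require "uri"

# Sketch: how the hidden field would be percent-encoded if submitted as UTF-8.
# ASSUMPTION: the value is the mojibake (a-circumflex, euro sign, oe ligature)
# followed by "a condition", rebuilt from the error message.
VALUE = "\u00E2\u20AC\u0153a condition"

# x-www-form-urlencoded: UTF-8 bytes become %XX, spaces become "+".
encoded = URI.encode_www_form_component(VALUE)
puts encoded # %C3%A2%E2%82%AC%C5%93a+condition

# Decoding with the default UTF-8 encoding gives the original value back.
puts URI.decode_www_form_component(encoded) == VALUE # true
```

The %E2%82%AC%C5%93 in the middle are the same bytes as the \342\202\254\305\223 in the Iconv error, just written in hex.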
