Error message invalid byte sequence in UTF-8

Using Ruby 2.2.1

When I run this Craigslist scraper script (see excerpt below) with ARGV
input strings such as “Casio” it works perfectly.

When I run this script with any ARGV that has duplicated letters,
for example “Hammond”, it fails with the error listed below.

(eval):14:in `===': invalid byte sequence in UTF-8 (ArgumentError)
from (eval):14:in `block (2 levels) in links_with'
from (eval):13:in `each'
from (eval):13:in `all?'
from (eval):13:in `block in links_with'
from (eval):12:in `each'
from (eval):12:in `find_all'
from (eval):12:in `links_with'
from Craig_Search.rb:47:in `block (2 levels) in <main>'
from Craig_Search.rb:40:in `each'
from Craig_Search.rb:40:in `block in <main>'
from Craig_Search.rb:29:in `each'
from Craig_Search.rb:29:in `<main>'

Can anyone tell me what’s wrong or how to avoid it?

The code has been working for some time without error, but it seems
some recent gem update “broke” it. Wish I could say which one.

Excerpt from script Craig_Search.rb

require 'rubygems'
require 'mechanize'
require 'date'
require 'net/ftp'

…some more code…

# craigslist cities to search along with search terms
cities = ["daytona", "ocala", "orlando", "cfl"]
search_words = ARGV.dup
agent = Mechanize.new
link_file = "/home/raymon/Documents/CraigsList_all_links.htm"
remote_file2 = "CraigsList_all_links.htm"

line 29----->>> search_words.each { |word|

# replace spaces in search word with underscore for the links
mod_word = word.gsub(/[ ]/, "_")

# file to be created
web_file = "/home/raymon/Documents/CraigsList_" + mod_word + "_Links.htm"

# open the local file for writing and begin creating webpage for later
# upload to server
# we do want to update the complete page each time, no appending
my_file = File.open(web_file, 'w')

line 40 ------>>>>> cities.each { |city|

# write the headings to the file

my_file.puts("<h3 class=\"blue\">"+city.capitalize+"</h3>")

url="http://"+city+".craigslist.org/search/?areaID=238&subAreaID=&query="+word+"&catAbb=sss"
page = agent.get(url)

line 47 ------>>>>>> found_link = page.links_with(:text =>
/#{word}/i)

found_link.each { |link|...............rest of code

Strangely, it works without error if I search on “Kimball” (two L’s)
but still fails on “Hammond” (two M’s). What gives?

Anyone with ideas?
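
One way to narrow this down: the trace points at `===` inside `links_with`, which is where the regex is compared with each link's text. That suggests the error may depend on the bytes in the page text returned for a given search rather than on the letters in the search word itself. The sketch below uses made-up sample strings (not taken from Craigslist) and a hypothetical helper, `invalid_texts`, to show how `String#valid_encoding?` can flag text that is not valid UTF-8.

```ruby
# Hypothetical sample link texts. The second one contains a stray 0xE9 byte,
# which is not a valid UTF-8 sequence on its own.
texts = [
  "Hammond organ",
  "Hammond B3 caf\xE9 sale".dup.force_encoding(Encoding::UTF_8),
]

# Return the strings whose bytes are not valid in their tagged encoding.
def invalid_texts(strings)
  strings.reject(&:valid_encoding?)
end

bad = invalid_texts(texts)
bad.each { |t| puts "invalid UTF-8: #{t.inspect}" }
```

In the script, the same check could be run over the text of each link on the fetched page to see which listing carries the bad bytes.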

Seems like a problem of Mechanize.

Maybe you want to check out:

https://github.com/sparklemotion/mechanize/issues/393

https://github.com/sparklemotion/mechanize/issues/333

Best regards.

Damián M. González wrote in post #1177980:

Seems like a problem of Mechanize.

Maybe you want to check out:

https://github.com/sparklemotion/mechanize/issues/393

https://github.com/sparklemotion/mechanize/issues/333

Best regards.

Thanks, but that is no help; the problem still remains.

So if you want to solve this problem, start reading the Mechanize code,
understand it, and try to figure out what is wrong. Simple as that.

Damián M. González wrote in post #1178271:

So if you want to solve this problem, start reading the Mechanize code,
understand it, and try to figure out what is wrong. Simple as that.

If this is a Mechanize problem, shouldn’t they fix it?

Of course, but remember that it is open source.

Damián M. González wrote in post #1178271:

So if you want to solve this problem, start reading the Mechanize code,
understand it, and try to figure out what is wrong. Simple as that.

Sorry …that is beyond my capabilities.
Sure wish someone had something specific to suggest.
oh well…
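
Something specific to try, offered as a sketch rather than a confirmed fix: since the error is raised when the regex is compared with each link's text, the matching can be done by hand on a cleaned copy of that text. `String#scrub` (Ruby 2.1 and later) replaces invalid bytes. In the snippet below, `Link` is a hypothetical stand-in for Mechanize's link objects so that it runs without a network connection, `links_matching` is a made-up helper name, and the sample texts are invented. Note that it also escapes the search word with `Regexp.escape`, which the original script does not do.

```ruby
# Hypothetical stand-in for a Mechanize link; only #text and #href are used.
Link = Struct.new(:text, :href)

# Return the links whose text contains the word, ignoring case.
# Invalid bytes are replaced with "?" first so the regex match
# does not raise on them.
def links_matching(links, word)
  pattern = /#{Regexp.escape(word)}/i
  links.select do |link|
    text = link.text.to_s
    text = text.scrub("?") unless text.valid_encoding?
    text =~ pattern
  end
end

# Invented sample data; the first text contains an invalid byte (0xE9).
links = [
  Link.new("Hammond organ caf\xE9".dup.force_encoding(Encoding::UTF_8), "/a.html"),
  Link.new("Casio keyboard", "/b.html"),
  Link.new("hammond B3", "/c.html"),
]

found = links_matching(links, "Hammond")
found.each { |link| puts link.href }
```

In the script, this would mean replacing the `page.links_with(:text => /#{word}/i)` call with `links_matching(page.links, word)`. This has not been tested against live Craigslist pages, and it works around the symptom rather than addressing whatever changed in the gems.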
