I am using the EventMachine and em-http gems to open some URLs.
It works fine, but when I try to open a site with UTF-8 encoding (French
or Arabic), I get Unicode escape sequences instead of the real characters (\u202B…).
For some sites I get an empty answer, although I verified that the page has
non-empty content.
Is there a way to deal with this problem?
Regards
I would like to see the code you wrote, but I can only make suggestions for
Ruby 1.9.3.
Thanks for your answer.
The problem is not in the URL encoding; it is in the encoding of the
response body. When there are special characters (é, è…) or Arabic
characters, I get Unicode escapes, and sometimes I get nothing at all.
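A minimal sketch of decoding the body yourself, assuming the page declares its charset in the `Content-Type` response header. The helpers `charset_from_content_type` and `body_to_utf8` are hypothetical and use only core Ruby (`Encoding.find`, `String#force_encoding`, `String#encode`); they are not part of em-http. Falling back to UTF-8 when no charset is declared is also an assumption that suits the pages described here.

```ruby
# encoding: utf-8
# Sketch with hypothetical helpers (not part of em-http): turn the raw bytes
# of an HTTP body into a UTF-8 String using the charset from Content-Type.

# Returns the charset parameter of a Content-Type value such as
# "text/html; charset=ISO-8859-1", or nil when none is declared.
def charset_from_content_type(content_type)
  return nil if content_type.nil?
  match = content_type.match(/charset\s*=\s*["']?([\w.:-]+)/i)
  match && match[1]
end

# Labels the bytes with the declared charset (UTF-8 when none is declared or
# Ruby does not know the name). A UTF-8 body is returned without transcoding;
# any other charset is transcoded to UTF-8, with unconvertible bytes as "?".
def body_to_utf8(raw_body, content_type)
  name = charset_from_content_type(content_type) || 'UTF-8'
  encoding = begin
    Encoding.find(name)
  rescue ArgumentError
    Encoding::UTF_8 # unknown charset name: assume UTF-8
  end
  body = raw_body.dup.force_encoding(encoding)
  return body if encoding == Encoding::UTF_8
  body.encode(Encoding::UTF_8, :invalid => :replace, :undef => :replace, :replace => '?')
end
```

Inside the `FiberIterator` loop this could be called as `body_to_utf8(resp.response, resp.response_header['CONTENT_TYPE'])`, assuming your em-http version stores header names upper-cased with dashes turned into underscores (worth checking against the version you use). Pages that declare their charset only in a `<meta>` tag are not handled by this sketch.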
This is my code:
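require 'em-synchrony'
require 'em-synchrony/em-http'        # makes HttpRequest#get fiber-aware
require 'em-synchrony/fiber_iterator'
require 'nokogiri'

EM.synchrony do
  concurrency = 2
  urls = ['http://url1', 'http://url2', 'http://url3']
  results = []

  # Fetch the URLs, at most `concurrency` requests at a time
  EM::Synchrony::FiberIterator.new(urls, concurrency).each do |url|
    resp = EventMachine::HttpRequest.new(url).get
    results.push resp.response   # raw response body
  end

  # Parse each body and print its page title
  results.each do |result|
    doc = Nokogiri::HTML(result, nil, 'utf-8')
    title = doc.xpath('.//title').text
    p title
  end

  EventMachine.stop
end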
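One plausible reading of the `\u202B` output, assuming the console's encoding is not UTF-8 (the thread does not say): `p title` prints `title.inspect`, and `String#inspect` escapes non-ASCII characters when the output encoding differs from the string's own encoding. `puts title` writes the string's characters directly, so whether they display correctly then depends only on the terminal. Note that `\u202B` is itself a real character (RIGHT-TO-LEFT EMBEDDING, an invisible bidi control), so seeing it escaped does not by itself mean the text was decoded wrongly. The helper `inspect_under` below is hypothetical and only simulates a non-UTF-8 console.

```ruby
# encoding: utf-8
# Sketch: `p obj` prints obj.inspect. String#inspect escapes non-printable
# characters, and every non-ASCII character when the output encoding
# (Encoding.default_internal, or Encoding.default_external if that is nil)
# differs from the string's own encoding.

# Hypothetical helper: String#inspect under a temporarily changed
# default_external, with default_internal cleared; both are restored after.
def inspect_under(external, str)
  saved_external = Encoding.default_external
  saved_internal = Encoding.default_internal
  begin
    Encoding.default_internal = nil
    Encoding.default_external = external
    str.inspect
  ensure
    Encoding.default_external = saved_external
    Encoding.default_internal = saved_internal
  end
end

title = "Café"
puts inspect_under(Encoding::US_ASCII, title) # what `p` shows on a non-UTF-8 console
puts inspect_under(Encoding::UTF_8, title)    # what `p` shows on a UTF-8 console
puts title                                    # `puts` writes the characters themselves
```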
Best regards