Em-http request

Hi,

I am using the EventMachine and em-http gems to open some URLs, and it works fine.
But when I open a site whose content is UTF-8 (French or Arabic), I get Unicode
escape sequences (\u202B…) instead of the real characters, and for some sites I
get an empty response, even though I verified that the page is not empty.
Is there a way to deal with this problem?
Regards

In reply to rubix Rubix (Sat Jul 28 17:31:43 2012):

I would like to see your code, but for now I can only offer a suggestion for
Ruby 1.9.3:

require 'addressable/uri'
normalized_url = Addressable::URI.parse(your_url_with_utf8_characters).normalize
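Addressable is a third-party gem. As a rough, stdlib-only illustration of what this kind of normalization does to the path and query of such a URL, here is a hypothetical `escape_non_ascii` helper. It assumes the URL string is already UTF-8 and, unlike Addressable, does not handle internationalized host names or any other part of URI normalization.

```ruby
# Hypothetical stdlib-only helper (not part of Addressable): percent-encode
# each non-ASCII character of a UTF-8 URL string as its UTF-8 bytes,
# so "é" becomes "%C3%A9". ASCII characters are left untouched.
def escape_non_ascii(url)
  url.gsub(/[^\x00-\x7F]/) do |char|
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

puts escape_non_ascii("http://example.com/caf\u00e9?q=\u0645")
# prints http://example.com/caf%C3%A9?q=%D9%85
```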

Thanks for your answer.
The problem is not the URL encoding but the encoding of the response body: when
it contains special characters (é, è…) or Arabic characters, I get Unicode
escape sequences, and sometimes I get nothing at all.

This is my code:

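require 'em-synchrony'
require 'em-synchrony/em-http'
require 'em-synchrony/fiber_iterator'
require 'nokogiri'

EM.synchrony do
  concurrency = 2
  urls = ['http://url1', 'http://url2', 'http://url3']
  results = []

  # Fetch the URLs, running at most `concurrency` requests at a time
  EM::Synchrony::FiberIterator.new(urls, concurrency).each do |url|
    resp = EventMachine::HttpRequest.new(url).get
    results.push resp.response
  end

  # Parse each body and print the text of its <title> element
  results.each do |result|
    doc = Nokogiri::HTML(result, nil, 'utf-8')
    title = doc.xpath('.//title').text
    p title
  end

  EventMachine.stop
end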
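A few possible causes, offered as guesses rather than a confirmed diagnosis. The body returned by `resp.response` is typically a binary (ASCII-8BIT) string, and passing 'utf-8' to `Nokogiri::HTML` is only right when the page really is UTF-8; converting the body using the charset from the Content-Type header is safer. `p title` prints `title.inspect`, which can show characters as `\uXXXX` escapes (for example when the console encoding is not UTF-8), while `puts title` writes the characters themselves. An empty body may be a redirect: as far as I know, em-http-request only follows redirects when `get` is given a `:redirects` option, e.g. `get(:redirects => 5)`. The sketch below covers the first two points with the standard library only. `body_to_utf8` is a hypothetical helper; you would pass it the response's Content-Type value (something like `resp.response_header['CONTENT_TYPE']`, which you should check against your em-http-request version). It assumes Ruby 2.1 or later for `String#scrub`.

```ruby
# Hypothetical helper: turn a raw HTTP body (often ASCII-8BIT) into UTF-8,
# using the charset named in the Content-Type header when there is one.
def body_to_utf8(raw_body, content_type = nil)
  charset = content_type.to_s[/charset=["']?([^\s;"']+)/i, 1] || 'UTF-8'
  source = begin
    Encoding.find(charset)
  rescue ArgumentError
    Encoding::UTF_8 # unknown charset name: assume UTF-8
  end
  body = raw_body.dup.force_encoding(source)
  # Transcode to UTF-8, then replace any bytes that are still invalid
  body.encode(Encoding::UTF_8, invalid: :replace, undef: :replace).scrub("\uFFFD")
end

# Example: a Latin-1 body labelled as such in its Content-Type header
raw = "<title>Caf\xE9</title>".b
html = body_to_utf8(raw, 'text/html; charset=ISO-8859-1')
puts html # writes the characters themselves; `p html` would write html.inspect
```

The converted string can then be handed to Nokogiri without forcing 'utf-8' on a page that was never UTF-8.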

Best regards
