Issue #7156 has been reported by t0d0r (Todor Dragnev).
----------------------------------------
Bug #7156: Invalid byte sequence in US-ASCII when using URI from std lib
https://bugs.ruby-lang.org/issues/7156
Author: t0d0r (Todor Dragnev)
Status: Open
Priority: Normal
Assignee:
Category: lib
Target version:
ruby -v: 1.9.3
Invalid byte sequence in US-ASCII on Ruby 1.9.3
I receive that error when trying to open a URL containing Bulgarian text
(UTF-8: "История"). The problem seems to be in uri/common.rb from the Ruby
standard library.
Adding str.force_encoding(Encoding::BINARY) to the following method fixes the
problem:
  class URI::Parser
    def escape(str, unsafe = @regexp[:UNSAFE])
      unless unsafe.kind_of?(Regexp)
        # perhaps unsafe is String object
        unsafe = Regexp.new("[#{Regexp.quote(unsafe)}]", false)
      end
      str.force_encoding(Encoding::BINARY) # FIX
      str.gsub(unsafe) do
        us = $&
        tmp = ''
        us.each_byte do |uc|
          tmp << sprintf('%%%02X', uc)
        end
        tmp
      end.force_encoding(Encoding::US_ASCII)
    end
  end
One more suggestion: maybe US_ASCII should be replaced with
Encoding::BINARY too?
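One caveat with the patch as written: String#force_encoding relabels the receiver in place, so the caller's string would come back tagged as binary. Below is a minimal sketch of a non-mutating variant. The names percent_escape and UNSAFE_BYTES are hypothetical and not part of the uri library, and the set of characters left unescaped is an assumption chosen for illustration.

```ruby
# Hypothetical stand-in for the patched method; not the stdlib implementation.
# Assumption: only RFC 3986 unreserved characters and "/" are left untouched.
UNSAFE_BYTES = /[^A-Za-z0-9\-._~\/]/

def percent_escape(str)
  # String#b returns a binary (ASCII-8BIT) copy, so the caller's string keeps
  # its original encoding label, unlike an in-place force_encoding.
  escaped = str.b.gsub(UNSAFE_BYTES) { |byte| format('%%%02X', byte.getbyte(0)) }
  # Only ASCII bytes remain after escaping, so the US-ASCII label is valid.
  escaped.force_encoding(Encoding::US_ASCII)
end

word = "\u0418\u0441\u0442\u043E\u0440\u0438\u044F" # "История"
puts percent_escape(word) # prints %D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F
puts word.encoding        # prints UTF-8: the argument was not relabeled
```

Working on a copy costs one extra allocation per call, which seems a reasonable price for leaving the argument untouched.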
on 2012-10-14 01:53
on 2012-10-15 21:15
Issue #7156 has been updated by meta (mathew murphy).
Which part of the URL contains the UTF-8 characters?
If it's the domain, you need to encode it as Punycode before passing it to
Ruby. If it's in the path, Ruby ought to handle it for IRI compliance, but
probably doesn't right now...
http://www.w3.org/International/articles/idn-and-iri/
----------------------------------------
Bug #7156: Invalid byte sequence in US-ASCII when using URI from std lib
https://bugs.ruby-lang.org/issues/7156#change-30788
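The question of where the non-ASCII characters sit can be checked mechanically. This is a rough sketch only: non_ascii_parts and URL_SPLIT are hypothetical names, the regular expression is a simplification and not a full URL parser, and no Punycode conversion is attempted.

```ruby
# Hypothetical helper: report which part of a URL holds non-ASCII characters.
# The pattern is a deliberate simplification, not a complete URL grammar.
URL_SPLIT = %r{\A[a-z][a-z0-9+.\-]*://(?<host>[^/?#]*)(?<rest>.*)\z}i

def non_ascii_parts(url)
  m = URL_SPLIT.match(url)
  return [] unless m

  parts = []
  parts << :host unless m[:host].ascii_only?          # would need Punycode
  parts << :path_or_query unless m[:rest].ascii_only? # would need percent-encoding
  parts
end

p non_ascii_parts("http://example.com/\u0418\u0441\u0442\u043E\u0440\u0438\u044F") # [:path_or_query]
p non_ascii_parts("http://\u043F\u0440\u0438\u043C\u0435\u0440.example/")          # [:host]
p non_ascii_parts("http://example.com/index.html")                                 # []
```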
on 2012-11-06 12:46
Issue #7156 has been updated by mame (Yusuke Endoh).
File bulgarian.rb added
Status changed from Open to Feedback
Target version set to 2.0.0
I'm not sure what you want. I cannot reproduce this issue with the
following code.
$ cat bulgarian.rb
# coding: UTF-8
require "uri"
p URI.escape("История")
$ ruby bulgarian.rb
"%D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F"
Could you please give us example code, the expected result, and the actual one?
--
Yusuke Endoh <mame@tsg.ne.jp>
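The script above succeeds because its literal is valid UTF-8. One condition that can produce the reported message is a string whose bytes are UTF-8 but whose encoding label is US-ASCII, for example data read while the default external encoding is US-ASCII. This is offered as a guess, not a confirmed diagnosis of this report. The sketch below fakes that condition with force_encoding; try_gsub is a hypothetical helper and its pattern is arbitrary.

```ruby
# Hypothetical helper: run a regexp scan and report whether it raised.
def try_gsub(str)
  str.gsub(/[^a-z]/) { |ch| ch } # any regexp scan checks the string's encoding
  :ok
rescue ArgumentError => e
  e.message
end

utf8 = "\u0418\u0441\u0442\u043E\u0440\u0438\u044F" # "История", valid UTF-8
# Same bytes, wrong label: simulates input that arrived tagged as US-ASCII.
mislabeled = utf8.dup.force_encoding(Encoding::US_ASCII)

p utf8.valid_encoding?       # true
p mislabeled.valid_encoding? # false
p try_gsub(utf8)             # :ok
p try_gsub(mislabeled)       # the ArgumentError message instead of :ok
```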
----------------------------------------
Bug #7156: Invalid byte sequence in US-ASCII when using URI from std lib
https://bugs.ruby-lang.org/issues/7156#change-32489
on 2013-02-17 06:02
Issue #7156 has been updated by ko1 (Koichi Sasada).
Target version changed from 2.0.0 to next minor
No feedback.
----------------------------------------
Bug #7156: Invalid byte sequence in US-ASCII when using URI from std lib
https://bugs.ruby-lang.org/issues/7156#change-36365
on 2013-02-18 01:14
Issue #7156 has been updated by ko1 (Koichi Sasada).
Assignee set to naruse (Yui NARUSE)
----------------------------------------
Bug #7156: Invalid byte sequence in US-ASCII when using URI from std lib
https://bugs.ruby-lang.org/issues/7156#change-36469