Forum: Ruby-core [ruby-trunk - Bug #6190][Open] String#encode return string containing invalid chars but marked as valid

Posted by pplr (Pierre PLR) (Guest)
on 2012-03-22 13:32
(Received via mailing list)
Issue #6190 has been reported by pplr (Pierre PLR).

----------------------------------------
Bug #6190: String#encode return string containing invalid chars but 
marked as valid
https://bugs.ruby-lang.org/issues/6190

Author: pplr (Pierre PLR)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: ruby 1.9.2p290 (2011-07-09 revision 32553) [i686-linux]



 >> a = " \xE9 ".encode('UTF-8', 'UTF-8', :invalid => :replace, :replace => "?")
 >> a.valid_encoding?
 => true
 >> a
 => " \xE9 "
 >> a.squeeze
 ArgumentError: invalid byte sequence in UTF-8
   from (irb):32:in `squeeze'
   from (irb):32
   from /usr/bin/irb:12:in `<main>'

The expected string is " ? ", as the documentation for the ":replace" 
option says:
If the value is :replace, encode replaces invalid byte sequences in str 
with the replacement character.
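
For readers reproducing this on a newer interpreter, the outcome the report expects can be sketched as follows. This is only an illustration and assumes Ruby 2.1 or later, where String#scrub is available; that method did not exist on the 1.9.2 build used above. The variable names are made up for the example.

```ruby
# Assumption: Ruby 2.1+ (String#scrub is not available on 1.9.2).
raw = " \xE9 ".dup.force_encoding(Encoding::UTF_8)  # a lone 0xE9 byte is not valid UTF-8

raw.valid_encoding?        # false, the byte sequence is malformed

cleaned = raw.scrub("?")   # replace each invalid byte sequence with "?"

cleaned.valid_encoding?    # true
cleaned.squeeze            # no ArgumentError, the string is now well-formed
```

Unlike the encode call in the report, scrub always inspects the bytes, so the validity flag and the content agree afterwards.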
Posted by "duerst (Martin Dürst)" <duerst@it.aoyama.ac.jp> (Guest)
on 2012-03-23 07:17
(Received via mailing list)
Issue #6190 has been updated by duerst (Martin Dürst).

Description updated

pplr (Pierre PLR) wrote:
> >> a = " \xE9 ".encode('UTF-8', 'UTF-8', :invalid => :replace, :replace => "?")
>  >> a.valid_encoding?
>  => true

Nobu fixed this so it won't return true anymore, which would be a lie.


>  >> a
>  => " \xE9 "

> The expected string is " ? ", as the documentation for the ":replace" option says:
> If the value is :replace, encode replaces invalid byte sequences in str with the replacement character.

I added documentation to say that encoding from encoding A to the same 
encoding A is a no-op. Changing this would not be impossible, but would 
involve quite some work, and would make these operations slower.
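
Since a conversion from an encoding to itself is skipped, one possible workaround is to force a real transcoding pass through a different encoding and back. The sketch below assumes an interpreter that behaves as described above; replace_invalid_utf8 is a hypothetical helper name, not part of Ruby, and UTF-16LE is just one convenient intermediate encoding.

```ruby
# Hypothetical helper, not part of Ruby's standard library.
# Going through UTF-16LE makes the transcoder actually run, so
# :invalid => :replace is applied to malformed UTF-8 bytes.
def replace_invalid_utf8(str, replacement = "?")
  # Nothing to do for strings that are already well-formed UTF-8.
  return str if str.encoding == Encoding::UTF_8 && str.valid_encoding?

  str.encode(Encoding::UTF_16LE, Encoding::UTF_8,
             :invalid => :replace, :replace => replacement)
     .encode(Encoding::UTF_8)
end

fixed = replace_invalid_utf8(" \xE9 ")
fixed.valid_encoding?  # true
```

The round trip costs two conversions, which is the kind of overhead the no-op shortcut avoids, so it is best applied only to input that is known to be untrusted.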
----------------------------------------
Bug #6190: String#encode return string containing invalid chars but 
marked as valid
https://bugs.ruby-lang.org/issues/6190#change-25066

Author: pplr (Pierre PLR)
Status: Closed
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: ruby 1.9.2p290 (2011-07-09 revision 32553) [i686-linux]


Posted by naruse (Yui NARUSE) (Guest)
on 2013-02-23 16:46
(Received via mailing list)
Issue #6190 has been updated by naruse (Yui NARUSE).

Status changed from Closed to Assigned


----------------------------------------
Backport #6190: String#encode return string containing invalid chars but 
marked as valid
https://bugs.ruby-lang.org/issues/6190#change-36833

Author: pplr (Pierre PLR)
Status: Assigned
Priority: Normal
Assignee:
Category:
Target version:


Posted by zzak (Zachary Scott) (Guest)
on 2013-02-23 21:27
(Received via mailing list)
Issue #6190 has been updated by zzak (Zachary Scott).

Assignee set to naruse (Yui NARUSE)

naruse-san, what do you want to do with this ticket?
----------------------------------------
Backport #6190: String#encode return string containing invalid chars but 
marked as valid
https://bugs.ruby-lang.org/issues/6190#change-36835

Author: pplr (Pierre PLR)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category:
Target version:

