Forum: Ruby-core unpack() ignores default encoding when generating strings, always uses ASCII-8BIT

unknown (Guest)
on 2014-08-16 00:13
(Received via mailing list)
Issue #10132 has been updated by mathew murphy.


The Ruby documentation says:

  M |  **String** | quoted printable, MIME encoding (see RFC2045)

And [RFC 2045](http://tools.ietf.org/html/rfc2045) section 6.7 says:

> The Quoted-Printable encoding is intended to represent data that **largely
consists of octets that correspond to printable characters** in the US-ASCII
character set.

So the Ruby documentation itself says that it's a string, not binary
data, and it refers to an RFC that says the encoding is intended for
textual (printable) characters.

Perhaps you were thinking of base64?  I don't think I've ever seen
quoted-printable used for binary data.
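
To make the contrast concrete, here is a small sketch that packs the same text both ways. The variable names are only illustrative, and the sample string is the one from the bug report, written with a `\u` escape so the snippet does not depend on the source file's encoding.

```ruby
# Illustrative comparison of quoted-printable ('M') and base64 ('m') packing.
text = "\u00FCnicode"          # "ünicode", escaped so the source stays ASCII

quoted = [text].pack('M')      # quoted-printable
base64 = [text].pack('m')      # base64

puts quoted   # ASCII letters stay readable; only the non-ASCII bytes are escaped
puts base64   # every byte is re-encoded, so none of the text stays readable
```

The quoted-printable form keeps `nicode` visible and escapes only the two bytes of `ü`, which fits the RFC's description of data that is mostly printable text.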

----------------------------------------
Bug #10132: unpack() ignores default encoding when generating strings,
always uses ASCII-8BIT
https://bugs.ruby-lang.org/issues/10132#change-48363

* Author: mathew murphy
* Status: Rejected
* Priority: Normal
* Assignee:
* Category:
* Target version:
* ruby -v: ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
New strings are generated in the default encoding:

    irb> __ENCODING__.name
    => "UTF-8"
    irb> "ünicode".encoding.name
    => "UTF-8"

...but not if they're generated by unpack:

    irb> "ünicode".split.pack('M*').unpack('M*').first
    => "\xC3\xBCnicode"
    irb> "ünicode".split.pack('M*').unpack('M*').first.encoding.name
    => "ASCII-8BIT"

The workaround is to force the encoding on every string that unpack generates:

    irb> "ünicode".split.pack('M*').unpack('M*').first.force_encoding(__ENCODING__.name)
    => "ünicode"
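
If the workaround is needed in more than one place, it could be wrapped in a small helper. This is a minimal sketch, not anything Ruby provides: `unpack_qp` is a hypothetical name, and defaulting to UTF-8 is an assumption about what the decoded text contains. The validity check is there because `force_encoding` only relabels the bytes and does not verify them.

```ruby
# Hypothetical helper (not part of Ruby): decode quoted-printable data and
# label the result with the encoding the caller expects.
def unpack_qp(encoded, encoding = Encoding::UTF_8)
  decoded = encoded.unpack('M*').first   # comes back tagged as binary (ASCII-8BIT)
  decoded.force_encoding(encoding)       # relabel the bytes; no transcoding happens
  unless decoded.valid_encoding?
    raise ArgumentError, "decoded bytes are not valid #{encoding}"
  end
  decoded
end

packed = ["\u00FCnicode"].pack('M*')     # "ünicode", escaped so the source stays ASCII
text = unpack_qp(packed)
puts text
puts text.encoding
```

Passing the encoding explicitly keeps the guess about the payload visible at the call site, rather than relying on whatever `__ENCODING__` happens to be in the calling file.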