Forum: Ruby-core Array#pack and String#unpack documentation

Gary Wright (Guest)
on 2007-12-12 19:11
(Received via mailing list)
In working with Array#pack and String#unpack today, I found
the documentation for the a, A, and Z formats somewhat ambiguous.
The docs for Array#pack read:

 A     |  ASCII string (space padded, count is width)
 a     |  ASCII string (null padded, count is width)
 Z     |  Same as ``a'', except that null is added with *

The phrase 'ASCII string' is the part that confused me.  It
looks like the contents of the string are irrelevant--there
is no requirement that the string only contain ASCII values.
For 'A', the padding will be an ASCII space though.  For
'a' and 'Z', there doesn't seem to be any dependency on ASCII.
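
To make the point concrete, here is a minimal Ruby sketch (not part of the patch) that packs bytes outside the ASCII range with each directive. The variable names are illustrative only; the input is built with the "C*" directive so that no source-encoding assumptions are needed.

```ruby
# Two bytes outside the ASCII range; none of the string directives
# inspects the content, they only copy bytes and pad.
data = [0xFF, 0xFE].pack("C*")

packed_a      = [data].pack("a4")  # padded to width 4 with null bytes
packed_A      = [data].pack("A4")  # padded to width 4 with 0x20 (ASCII space)
packed_Z      = [data].pack("Z4")  # with an explicit count, behaves like "a"
packed_Z_star = [data].pack("Z*")  # with "*", a single null byte is appended

p packed_a.bytes       # [255, 254, 0, 0]
p packed_A.bytes       # [255, 254, 32, 32]
p packed_Z.bytes       # [255, 254, 0, 0]
p packed_Z_star.bytes  # [255, 254, 0]
```

Only the padding byte differs between the directives; the payload bytes pass through unchanged in every case.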

Here is a suggested patch against branches/ruby_1_8:

Index: pack.c
===================================================================
--- pack.c      (revision 14122)
+++ pack.c      (working copy)
@@ -400,8 +400,8 @@
 *   Directive    Meaning
 *   ---------------------------------------------------------------
 *       @     |  Moves to absolute position
- *       A     |  ASCII string (space padded, count is width)
- *       a     |  ASCII string (null padded, count is width)
+ *       A     |  arbitrary binary string (ASCII space padded, count is width)
+ *       a     |  arbitrary binary string (null padded, count is width)
 *       B     |  Bit string (descending bit order)
 *       b     |  Bit string (ascending bit order)
 *       C     |  Unsigned char
@@ -525,8 +525,8 @@

           switch (type) {
             case 'a':         /* arbitrary binary string (null padded) */
-             case 'A':         /* ASCII string (space padded) */
-             case 'Z':         /* null terminated ASCII string  */
+             case 'A':         /* arbitrary binary string (ASCII space padded) */
+             case 'Z':         /* null terminated string  */
               if (plen >= len) {
                   rb_str_buf_cat(res, ptr, len);
                   if (p[-1] == '*' && type == 'Z')
@@ -1193,9 +1193,10 @@
 *
 *     Format | Returns | Function
 *     -------+---------+-----------------------------------------
- *       A    | String  | with trailing nulls and spaces removed
+ *       A    | String  | arbitrary binary string with trailing
+ *            |         | nulls and ASCII spaces removed
 *     -------+---------+-----------------------------------------
- *       a    | String  | string
+ *       a    | String  | arbitrary binary string
 *     -------+---------+-----------------------------------------
 *       B    | String  | extract bits from each character (msb first)
 *     -------+---------+-----------------------------------------
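
The unpack side of the table above can be illustrated with a short Ruby sketch. It is only an illustration under the assumption of a current Ruby interpreter, and the sample string is made up for the example.

```ruby
# Six bytes: "a", "b", space, null, space, null.
raw = "ab \0 \0"

with_A = raw.unpack("A6").first  # trailing nulls and ASCII spaces removed
with_a = raw.unpack("a6").first  # bytes returned exactly as stored
with_Z = raw.unpack("Z6").first  # stops at the first null byte

p with_A  # "ab"
p with_a  # "ab \u0000 \u0000"
p with_Z  # "ab "
```

Note that "A" strips any mix of trailing spaces and nulls, while "Z" keeps the space that precedes the first null.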
Yukihiro Matsumoto (Guest)
on 2007-12-12 19:21
(Received via mailing list)
Hi,

In message "Re: Array#pack and String#unpack documentation"
    on Tue, 11 Dec 2007 11:13:37 +0900, "Gary Wright"
<radar2002@gmail.com> writes:

|The phrase 'ASCII string' is the part that confused me.  It
|looks like the contents of the string are irrelevant--there
|is no requirement that the string only contain ASCII values.
|For 'A', the padding will be an ASCII space though.  For
|'a' and 'Z', there doesn't seem to be any dependency on ASCII.

OK, I will merge your patch to the trunk.  Thank you.

              matz.
Gary Wright (Guest)
on 2007-12-12 19:53
(Received via mailing list)
On Dec 11, 2007, at 9:10 AM, Yukihiro Matsumoto wrote:
> OK, I will merge your patch to the trunk.  Thank you.
>
You're welcome.

I didn't generate the diff against trunk because I wasn't
entirely sure how the M17N work interacted with String#unpack and
Array#pack.

For example, should Array#pack always use an ASCII space for
the 'A' format or should it be using the space defined for the
current encoding?  What if the current encoding is multibyte
(e.g. UTF-16) and the space doesn't fit in the allotted width?

Anyway, that is why I stayed away from trunk, even though the
patch works just fine there.

Gary Wright
Yukihiro Matsumoto (Guest)
on 2007-12-12 22:10
(Received via mailing list)
Hi,

In message "Re: Array#pack and String#unpack documentation"
    on Wed, 12 Dec 2007 05:00:51 +0900, Gary Wright <gwtmp01@mac.com>
writes:

|I didn't generate the diff against trunk because I wasn't
|entirely sure how the M17N work interacted with String#unpack and
|Array#pack.

pack/unpack does not handle M17N at all.

|For example, should Array#pack always use an ASCII space for
|the 'A' format or should it be using the space defined for the
|current encoding?  What if the current encoding is multibyte
|(e.g. UTF-16) and the space doesn't fit in the allotted width?

Again, pack/unpack does not handle encoding, so UTF-16 is treated as
a sequence of bytes.  Padding it with ASCII spaces would result in
garbage, but I consider that the user's responsibility.

              matz.
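
The "garbage" case can be sketched in Ruby as follows. This assumes an interpreter with encoding-aware strings (Ruby 1.9 or later, which postdates this thread); the sample text and variable names are made up for the illustration.

```ruby
# "hi" in UTF-16LE occupies four bytes: 68 00 69 00.
utf16 = "hi".encode("UTF-16LE")

# "A6" pads with two single-byte ASCII spaces (0x20 0x20),
# regardless of the encoding of the input string.
packed = [utf16].pack("A6")
p packed.bytes  # [104, 0, 105, 0, 32, 32]

# Read back as UTF-16LE, the two padding bytes form one code unit,
# U+2020 (DAGGER), rather than a space character.
reinterpreted = packed.dup.force_encoding("UTF-16LE")
p reinterpreted.encode("UTF-8")  # "hi†"
```

With an odd pad width the result would not even be a whole number of UTF-16 code units, so a caller who needs space-padded UTF-16 has to pad before packing.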
Gary Wright (Guest)
on 2007-12-12 23:06
(Received via mailing list)
On Dec 11, 2007, at 6:02 PM, Yukihiro Matsumoto wrote:
> In message "Re: Array#pack and String#unpack documentation"
>     on Wed, 12 Dec 2007 05:00:51 +0900, Gary Wright
> <gwtmp01@mac.com> writes:
>
> |I didn't generate the diff against trunk because I wasn't
> |entirely sure how the M17N work interacted with String#unpack and
> |Array#pack.
>
> pack/unpack does not handle M17N at all.

Good to know.

> Again, pack/unpack does not handle encoding, so UTF-16 is treated as
> a sequence of bytes.  Padding it with ASCII spaces would result in
> garbage, but I consider that the user's responsibility.


Yet the format string itself is assumed to be in ASCII, right?  To
be more specific, the encoding of the format string has to be
compatible with the encoding of the C source used to compile
ruby (because of the switch() on literal characters in pack.c).

Is it safe to assume that the interpreter itself is compiled in an
ASCII compatible encoding?

Sorry if these are basic questions, I just haven't thought through
these interactions before.

Gary Wright
Yukihiro Matsumoto (Guest)
on 2007-12-12 23:31
(Received via mailing list)
Hi,

In message "Re: Array#pack and String#unpack documentation"
    on Wed, 12 Dec 2007 08:53:22 +0900, Gary Wright <gwtmp01@mac.com>
writes:

|Yet the format string itself is assumed to be in ASCII, right?  To
|be more specific, the encoding of the format string has to be
|compatible with the encoding of the C source used to compile
|ruby (because of the switch() on literal characters in pack.c).

Yes.

|Is it safe to assume that the interpreter itself is compiled in an
|ASCII compatible encoding?

Yes.  Ruby will not handle ASCII-incompatible encodings for scripts and
literals, unless someone comes up with a drastic innovation I cannot
imagine right now.

              matz.
Wolfgang Nádasi-Donner (wonado)
on 2007-12-13 00:12
(Received via mailing list)
Yukihiro Matsumoto wrote:
> pack/unpack does not handle M17N at all.

Really? - What about the directive "U" for "UTF-8"?

Wolfgang Nádasi-Donner
Gary Wright (Guest)
on 2007-12-13 03:05
(Received via mailing list)
On Dec 11, 2007, at 9:14 PM, Yukihiro Matsumoto wrote:
> |Is it safe to assume that the interpreter itself is compiled in an
> |ASCII compatible encoding?
>
> Yes.  Ruby will not handle ASCII-incompatible encodings for scripts and
> literals, unless someone comes up with a drastic innovation I cannot
> imagine right now.

Thank you for all these clarifications.  Other than the source, are
there any resources to help learn about all the ramifications of M17N?

Gary Wright
Yukihiro Matsumoto (Guest)
on 2007-12-13 03:41
(Received via mailing list)
Hi,

In message "Re: Array#pack and String#unpack documentation"
    on Wed, 12 Dec 2007 09:41:27 +0900, Wolfgang Nádasi-Donner
<ed.odanow@wonado.de> writes:

|Yukihiro Matsumoto wrote:
|> pack/unpack does not handle M17N at all.
|
|Really? - What about the directive "U" for "UTF-8"?

It works on UTF-8 byte sequences, but it does not change the encoding
of the resulting string.

              matz.
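
A small Ruby sketch of the "U" directive working at the byte level. The code points are arbitrary sample values, and the sketch deliberately checks only bytes: how the resulting string is tagged with an encoding may differ between Ruby versions and is not asserted here.

```ruby
# Code points for "R", "ü" (U+00FC), "b", "y".
codepoints = [0x52, 0xFC, 0x62, 0x79]

# "U*" writes each code point as its UTF-8 byte sequence.
packed = codepoints.pack("U*")
p packed.bytes  # [82, 195, 188, 98, 121]

# "U*" on unpack decodes UTF-8 byte sequences back into code points.
decoded = packed.unpack("U*")
p decoded == codepoints  # true
```

The only multi-byte character here is U+00FC, which becomes the two bytes 0xC3 0xBC.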