[ruby-trunk - Bug #5635][Open] Behavior of String#unpack("M") on invalid data

dubstep · November 15, 2011, 7:52am

Issue #5635 has been reported by Masahiro T…

Bug #5635: Behavior of String#unpack("M") on invalid data

Author: Masahiro T.
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: ruby 1.9.3p0 (2011-11-08 revision 33661) [i686-linux]

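When String#unpack("M") encounters invalid data such as "=hoge", it currently
stops processing at that point. Throwing away all of the data that follows
seems a waste, so I think it would be better to advance the pointer by one
and continue processing. What do you think?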
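
To make the current behavior concrete, here is a minimal pure-Ruby sketch that mirrors the decoding loop the patch replaces. It is an illustrative model only: `qp_decode_truncating` is a hypothetical name, and the code does not call String#unpack, so it shows the logic described in this report rather than the behavior of any particular Ruby release.

```ruby
# Hypothetical model of the loop before the patch: decoding simply stops
# at the first "=" that does not start a valid escape.
def qp_decode_truncating(src)
  s = src.b              # operate on raw bytes
  out = "".b
  i = 0
  while i < s.size
    if s[i] == "="
      i += 1
      break if i == s.size                        # lone "=" at end of input
      i += 1 if s[i] == "\r" && s[i + 1] == "\n"  # "=\r\n": step onto the LF
      if s[i] != "\n"
        break unless s[i].match?(/\A\h\z/)        # invalid: give up here
        hi = s[i]
        i += 1
        break if i == s.size
        break unless s[i].match?(/\A\h\z/)
        out << (hi + s[i]).hex.chr                # two hex digits -> one byte
      end
    else
      out << s[i]
    end
    i += 1
  end
  out
end

p qp_decode_truncating("pre=31=32=33after")  # valid input decodes fully
p qp_decode_truncating("pre=hoge after")     # everything from "=" on is lost
```

In this model the "=" itself and all data after it are dropped, which is the loss the report objects to.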

RFC 2045 says the following:

(2)   An "=" followed by a character that is neither a
      hexadecimal digit (including "abcdef") nor the CR
      character of a CRLF pair is illegal.  This case can be
      the result of US-ASCII text having been included in a
      quoted-printable part of a message without itself
      having been subjected to quoted-printable encoding.  A
      reasonable approach by a robust implementation might be
      to include the "=" character and the following
      character in the decoded data without any
      transformation and, if possible, indicate to the user
      that proper decoding was not possible at this point in
      the data.

```
Index: pack.c
===================================================================
--- pack.c	(revision 33758)
+++ pack.c	(working copy)
@@ -2008,20 +2008,23 @@
 
             while (s < send) {
                 if (*s == '=') {
-                    if (++s == send) break;
-                    if (s+1 < send && *s == '\r' && *(s+1) == '\n')
-                        s++;
-                    if (*s != '\n') {
-                        if ((c1 = hex2num(*s)) == -1) break;
-                        if (++s == send) break;
-                        if ((c2 = hex2num(*s)) == -1) break;
-                        *ptr++ = c1 << 4 | c2;
+                    if (s+1 < send && *(s+1) == '\n') {
+                        s += 2;
+                        continue;
+                    }
+                    if (s+2 < send) {
+                        if (*(s+1) == '\r' && *(s+2) == '\n') {
+                            s += 3;
+                            continue;
+                        }
+                        if ((c1 = hex2num(*(s+1))) > -1 && (c2 = hex2num(*(s+2))) > -1) {
+                            *ptr++ = c1 << 4 | c2;
+                            s += 3;
+                            continue;
+                        }
                     }
                 }
-                else {
-                    *ptr++ = *s;
-                }
-                s++;
+                *ptr++ = *s++;
             }
             rb_str_set_len(buf, ptr - RSTRING_PTR(buf));
             ENCODING_CODERANGE_SET(buf, rb_ascii8bit_encindex(), ENC_CODERANGE_VALID);
Index: test/ruby/test_pack.rb
===================================================================
--- test/ruby/test_pack.rb	(revision 33758)
+++ test/ruby/test_pack.rb	(working copy)
@@ -612,6 +612,17 @@
     assert_equal([0x100000000], "\220\200\200\200\000".unpack("w"), [0x100000000])
   end
 
+  def test_pack_unpack_M
+    assert_equal(["pre123after"], "pre=31=32=33after".unpack("M"))
+    assert_equal(["preafter"], "pre=\nafter".unpack("M"))
+    assert_equal(["preafter"], "pre=\r\nafter".unpack("M"))
+    assert_equal(["pre="], "pre=".unpack("M"))
+    assert_equal(["pre=\r"], "pre=\r".unpack("M"))
+    assert_equal(["pre=hoge"], "pre=hoge".unpack("M"))
+    assert_equal(["pre=1after"], "pre==31after".unpack("M"))
+    assert_equal(["pre==1after"], "pre===31after".unpack("M"))
+  end
+
   def test_modify_under_safe4
     s = "foo"
     assert_raise(SecurityError) do
```
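
For readers who prefer Ruby to C, the proposed loop can be sketched as follows. This is a hedged pure-Ruby model of the patch, not the patch itself: `qp_decode_skipping` is a hypothetical name, and the function is independent of String#unpack. An "=" that does not start a soft line break or a two-digit hex escape is copied through as a literal byte, and decoding resumes at the next byte.

```ruby
# Hypothetical model of the proposed loop: copy an invalid "=" through
# and keep decoding from the following byte.
def qp_decode_skipping(src)
  s = src.b              # operate on raw bytes
  out = "".b
  i = 0
  while i < s.size
    if s[i] == "="
      if s[i + 1] == "\n"                    # "=\n" soft line break
        i += 2
        next
      end
      if i + 2 < s.size
        if s[i + 1, 2] == "\r\n"             # "=\r\n" soft line break
          i += 3
          next
        end
        if s[i + 1, 2].match?(/\A\h\h\z/)    # "=XX" hex escape
          out << s[i + 1, 2].hex.chr
          i += 3
          next
        end
      end
    end
    out << s[i]   # ordinary byte, or an "=" that starts no valid escape
    i += 1
  end
  out
end

p qp_decode_skipping("pre=hoge")      # the invalid "=" is kept
p qp_decode_skipping("pre==31after")  # first "=" kept, "=31" still decoded
```

The second call shows the consequence of advancing by only one byte: data after the invalid "=" is still transformed, which matches the expectations in the test hunk above.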

masa16 · November 15, 2011, 2:08pm

Issue #5635 has been updated by Yui NARUSE.

Hmm, if the RFC says to append the data as-is without transforming it, I think we should do exactly that.

masa16 · November 15, 2011, 3:13pm

Issue #5635 has been updated by Masahiro T…

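Huh? My intention is that the invalid data is left as-is.
At the very least, I think this is better than the current behavior, which drops everything after the invalid data...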

masa16 · December 9, 2011, 1:45am

Issue #5635 has been updated by Masahiro T…

Ah, I see: once invalid data is encountered, none of the data after it should be transformed.
That does seem like the better approach.
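
A minimal sketch of that alternative, under the same caveats as any illustration here: `qp_decode_keep_rest` is a hypothetical pure-Ruby function, not code from pack.c. It decodes normally until the first invalid "=", then appends the "=" and the whole remainder of the input without any transformation.

```ruby
# Hypothetical model of the alternative: after the first invalid "=",
# stop decoding and keep the rest of the input verbatim.
def qp_decode_keep_rest(src)
  s = src.b              # operate on raw bytes
  out = "".b
  i = 0
  while i < s.size
    if s[i] == "="
      if s[i + 1] == "\n"                     # "=\n" soft line break
        i += 2
      elsif s[i + 1, 2] == "\r\n"             # "=\r\n" soft line break
        i += 3
      elsif s[i + 1, 2].match?(/\A\h\h\z/)    # "=XX" hex escape
        out << s[i + 1, 2].hex.chr
        i += 3
      else
        # Invalid escape: append the untouched remainder and stop.
        return out << s[i..-1]
      end
    else
      out << s[i]
      i += 1
    end
  end
  out
end

p qp_decode_keep_rest("a=41=zz=42")    # "=41" decoded, "=zz=42" kept as-is
p qp_decode_keep_rest("pre==31after")  # nothing after the bad "=" is decoded
```

Compared with advancing the pointer by one, the difference shows up only when valid escapes follow an invalid one: here the later "=42" and "=31" stay encoded.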

masa16 · November 29, 2011, 9:10pm

Issue #5635 has been updated by Yui NARUSE.

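I read it as saying that processing should not continue, but what do you think?
Then again, for this kind of communication-related code it may be more correct to go along with the majority than with the RFC, so examples of how other implementations behave would be welcome.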

masa16 · March 11, 2012, 8:50am

Issue #5635 has been updated by Koichi Sasada.

Assignee set to Yui NARUSE
