Forum: Ruby-dev [ruby-trunk - Bug #7892][Open] MIME encoding bug of NKF.nkf

Posted by mrkn (Kenta Murata) (Guest)
on 2013-02-20 08:02
(Received via mailing list)
Issue #7892 has been reported by mrkn (Kenta Murata).

----------------------------------------
Bug #7892: MIME encoding bug of NKF.nkf
https://bugs.ruby-lang.org/issues/7892

Author: mrkn (Kenta Murata)
Status: Open
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: ext
Target version: 2.0.0
ruby -v: ruby 2.0.0dev (2013-02-08 trunk 39161) [x86_64-darwin11.4.2]


The result of NKF's MIME encoding differs between 1.8 and 1.9/2.0.

# With 1.8
$ /usr/bin/ruby -rnkf -ve "puts NKF.nkf('-jW -M --cp932', '「あああああああああああ by ああああああああああ」のレシピ')"
ruby 1.8.7 (2012-02-08 patchlevel 358) [universal-darwin11.0]
=?ISO-2022-JP?B?GyRCIVYkIiQiJCIkIiQiJCIkIiQiJCIkIiQiGyhC?= by
 =?ISO-2022-JP?B?GyRCJCIkIiQiJCIkIiQiJCIkIiQiJCIhVyROJWwlNyVUGyhC?=

# With 1.9.3-p385
$ ruby -rnkf -ve "puts NKF.nkf('-jW -M --cp932', '「あああああああああああ by ああああああああああ」のレシピ')"
ruby 1.9.3p385 (2013-02-06 revision 39114) [x86_64-darwin11.4.2]
=?ISO-2022-JP?B?GyRCIVYkIiQiJCIkIiQiJCIkIiQiJCIkIiQiGyhC?= by
 =?US-ASCII?Q??=
 =?ISO-2022-JP?B?GyRCJCIkIiQiJCIkIiQiJCIkIiQiJCIhVyROJWwlNyVUGyhC?=

# With 2.0.0-rc2
$ RBENV_VERSION=2.0.0-rc2 rbenv exec ruby -rnkf -ve "puts NKF.nkf('-jW -M --cp932', '「あああああああああああ by ああああああああああ」のレシピ')"
ruby 2.0.0dev (2013-02-08 trunk 39161) [x86_64-darwin11.4.2]
=?ISO-2022-JP?B?GyRCIVYkIiQiJCIkIiQiJCIkIiQiJCIkIiQiGyhC?= by
 =?US-ASCII?Q??=
 =?ISO-2022-JP?B?GyRCJCIkIiQiJCIkIiQiJCIkIiQiJCIhVyROJWwlNyVUGyhC?=
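
To make the difference easier to see, here is a rough sketch that pulls the encoded-words out of the 1.9.3/2.0.0 output above and decodes each one on its own. It does not use NKF at all, only core String methods. The helper name `inspect_encoded_words` and the constants are made up for this illustration, and the regexp is a simplified reading of the RFC 2047 encoded-word shape, not a full parser.

```ruby
# Output reported above for 1.9.3-p385 and 2.0.0-rc2, one element per line.
REPORTED = [
  "=?ISO-2022-JP?B?GyRCIVYkIiQiJCIkIiQiJCIkIiQiJCIkIiQiGyhC?= by",
  " =?US-ASCII?Q??=",
  " =?ISO-2022-JP?B?GyRCJCIkIiQiJCIkIiQiJCIkIiQiJCIhVyROJWwlNyVUGyhC?=",
].freeze

# Simplified encoded-word shape: =?charset?encoding?payload?=
ENCODED_WORD = /=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=/

# Hypothetical helper: returns [charset, decoded UTF-8 text] per encoded-word.
def inspect_encoded_words(lines)
  lines.join("\n").scan(ENCODED_WORD).map do |charset, enc, payload|
    raw = if enc.upcase == "B"
            payload.unpack1("m")               # Base64
          else
            payload.tr("_", " ").unpack1("M")  # Q encoding, close to quoted-printable
          end
    [charset, raw.force_encoding(charset).encode("UTF-8")]
  end
end

inspect_encoded_words(REPORTED).each do |charset, text|
  puts "#{charset}: #{text.empty? ? '(empty payload)' : text}"
end
```

The middle `=?US-ASCII?Q??=` word carries no payload, which is the extra piece that the 1.8 output does not have.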
Posted by mame (Yusuke Endoh) (Guest)
on 2013-02-20 08:26
(Received via mailing list)
Issue #7892 has been updated by mame (Yusuke Endoh).

Target version changed from 2.0.0 to next minor


Posted by naruse (Yui NARUSE) (Guest)
on 2013-02-20 10:30
(Received via mailing list)
Issue #7892 has been updated by naruse (Yui NARUSE).


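Hmm...

First of all, neither output decodes back to the original string, so this is a bug.
The root cause is that, when encoding a string of the form "long Japanese text alphabets Japanese text",
the desired output is "=?ISO-2022-JP?B?blahblah?= alphabets =?ISO-2022-JP?B?blah?=",
but the code never anticipated the case where it instead becomes
"=?ISO-2022-JP?B?blahblah?= alphabets<newline>=?ISO-2022-JP?B?blah?=".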
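
For reference, the desired single-line shape can be sketched in plain Ruby. This is only an illustration under simplifying assumptions, not nkf's algorithm: the helper names `b_encode` and `naive_mime_encode` are hypothetical, words are split on spaces, and line folding (the part this bug is about) is left out entirely, so long runs are never split.

```ruby
# Hypothetical helper, not part of NKF: wrap one run of text as a
# B-encoded ISO-2022-JP encoded-word.
def b_encode(text)
  "=?ISO-2022-JP?B?#{[text.encode('ISO-2022-JP')].pack('m0')}?="
end

# Keep pure-ASCII words as they are and encode everything else.
# No folding and no length limit handling: this only shows the target shape
# "=?...?= alphabets =?...?=" on a single line.
def naive_mime_encode(text)
  text.split(" ").map { |word| word.ascii_only? ? word : b_encode(word) }.join(" ")
end

puts naive_mime_encode("「あああああああああああ by ああああああああああ」のレシピ")
```

A real encoder also has to decide where to break lines, which is exactly where the case described above slips in.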

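I have a feeling the patch below fixes it, but the line-breaking strategy fundamentally has not been thought through,
so I would like to work out the algorithm properly and implement it in Ruby first. Honestly, though, will MIME encode still be used from here on?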

diff --git a/nkf.c b/nkf.c
index 705fb55..d3fde19 100644
--- a/nkf.c
+++ b/nkf.c
@@ -5421,28 +5421,6 @@ mime_putc(nkf_char c)
                mimeout_state.buf[mimeout_state.count++] = (char)c;
                return;
            }
-           if (nkf_isspace(c)) {
-               for (i=0;i<mimeout_state.count;i++) {
-                   if (SP<mimeout_state.buf[i] && mimeout_state.buf[i]<DEL) {
-                       eof_mime();
-                       for (i=0;i<mimeout_state.count;i++) {
-                           (*o_mputc)(mimeout_state.buf[i]);
-                           base64_count++;
-                       }
-                       mimeout_state.count = 0;
-                   }
-               }
-               mimeout_state.buf[mimeout_state.count++] = (char)c;
-               if (mimeout_state.count>MIMEOUT_BUF_LENGTH) {
-                   eof_mime();
-                   for (i=0;i<mimeout_state.count;i++) {
-                       (*o_mputc)(mimeout_state.buf[i]);
-                       base64_count++;
-                   }
-                   mimeout_state.count = 0;
-               }
-               return;
-           }
            if (mimeout_state.count>0 && SP<c && c!='=') {
                mimeout_state.buf[mimeout_state.count++] = (char)c;
                if (mimeout_state.count>MIMEOUT_BUF_LENGTH) {
