Forum: Ruby-dev [ruby-trunk - Bug #7646][Open] invalid byte sequence in String#each_line

Posted by yoshidam (Yoshida Masato) (Guest)
on 2013-01-02 17:44
(Received via mailing list)
Issue #7646 has been reported by yoshidam (Yoshida Masato).

----------------------------------------
Bug #7646: invalid byte sequence in String#each_line
https://bugs.ruby-lang.org/issues/7646

Author: yoshidam (Yoshida Masato)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: ruby 2.0.0dev (2013-01-02 trunk 38676) [i686-linux]


=begin
When a separator is passed explicitly to String#each_line, an "invalid byte sequence" error is raised at non-ASCII characters.

 $ ruby -ve '"\n\u0100".each_line("\n") {|l| p l }'
 ruby 2.0.0dev (2013-01-02 trunk 38676) [i686-linux]
 "\n"
 -e:1:in `each_line': invalid byte sequence in UTF-8 (ArgumentError)
 from -e:1:in `<main>'
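
A regression check for this report can be written against the public String#each_line API. This is a sketch of what such a test might look like, not the test that was actually committed: on a Ruby build that includes a fix, the one-liner above should yield two lines instead of raising.

```ruby
# Regression check for Bug #7646 (sketch): with an explicit "\n"
# separator, each_line must split before U+0100 instead of raising
# "invalid byte sequence in UTF-8".
lines = "\n\u0100".each_line("\n").to_a
unless lines == ["\n", "\u0100"]
  raise "Bug #7646 regressed: got #{lines.inspect}"
end
puts "ok"
```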

This looks like a bug introduced by the changes around r38616.

 --- string.c.org        2012-12-27 21:57:07.000000000 +0900
 +++ string.c    2013-01-02 23:36:47.000000000 +0900
 @@ -6199,14 +6199,14 @@
         if (c == newline &&
             (rslen <= 1 ||
              (pend - p >= rslen && memcmp(RSTRING_PTR(rs), p, rslen) == 0))) {
 -           p += (rslen ? rslen : n);
 -           line = rb_str_subseq(str, s - ptr, p - s);
 +           const char *pp = p + (rslen ? rslen : n);
 +           line = rb_str_subseq(str, s - ptr, pp - s);
             if (wantarray)
                 rb_ary_push(ary, line);
             else
                 rb_yield(line);
             str_mod_check(str, ptr, len);
 -           s = p;
 +           s = pp;
         }
         p += n;
      }
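
To make the pointer bookkeeping in the patch concrete, here is a small pure-Ruby model of the scanning loop. It is a sketch under stated assumptions, not the actual string.c code: it walks the raw UTF-8 bytes, uses a hypothetical helper utf8_char_len in place of rb_enc_codepoint_len, handles only a non-empty separator (paragraph mode is not modeled), and reuses the C names p, pp and s. The buggy: flag (also hypothetical) reproduces the pre-patch logic, where p is moved past the separator and then advanced again by n, so for "\n\u0100" (bytes 0A C4 80) it lands on the continuation byte 0x80.

```ruby
# Pure-Ruby model of the each_line scanning loop from the patch.
# Assumptions: UTF-8 input, non-empty separator; utf8_char_len, to_utf8
# and the buggy: flag are hypothetical helpers, not part of Ruby's API.

# Byte width of the UTF-8 character starting at index i. Raises when the
# byte cannot start a character (e.g. a stray continuation byte).
def utf8_char_len(bytes, i)
  b = bytes[i]
  if b < 0x80 then 1
  elsif (b & 0xE0) == 0xC0 then 2
  elsif (b & 0xF0) == 0xE0 then 3
  elsif (b & 0xF8) == 0xF0 then 4
  else raise ArgumentError, "invalid byte sequence in UTF-8"
  end
end

def to_utf8(byte_array)
  byte_array.pack("C*").force_encoding(Encoding::UTF_8)
end

def split_lines(str, rs, buggy: false)
  raise ArgumentError, "paragraph mode is not modeled" if rs.empty?
  bytes = str.bytes
  sep = rs.bytes
  lines = []
  s = 0 # start of the current line
  p = 0 # scan position; must always sit on a character boundary
  while p < bytes.size
    n = utf8_char_len(bytes, p) # width of the character at p
    if bytes[p, sep.size] == sep
      if buggy
        # Pre-patch: p itself is moved past the separator.
        p += sep.size
        lines << to_utf8(bytes[s...p])
        s = p
      else
        # Patched: a separate pp marks the line end; p is left alone.
        pp = p + sep.size
        lines << to_utf8(bytes[s...pp])
        s = pp
      end
    end
    p += n # advance by one character (on top of the jump in buggy mode)
  end
  lines << to_utf8(bytes[s..-1]) if s < bytes.size # trailing partial line
  lines
end

puts split_lines("\n\u0100", "\n") == ["\n", "\u0100"] # → true

begin
  split_lines("\n\u0100", "\n", buggy: true)
rescue ArgumentError => e
  puts e.message # → invalid byte sequence in UTF-8
end
```

Keeping p strictly on character boundaries and using pp only for the line end is what the patch does in C; in the model, that is the difference between returning ["\n", "\u0100"] and raising at byte 0x80.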

=end
Posted by kosaki (Motohiro KOSAKI) (Guest)
on 2013-01-02 18:07
(Received via mailing list)
Issue #7646 has been updated by kosaki (Motohiro KOSAKI).

Category set to core
Status changed from Open to Assigned
Assignee set to nobu (Nobuyoshi Nakada)
Priority changed from Normal to High
Target version set to 2.0.0

This certainly looks like a regression to me.
I'm tagging it for 2.0.0.
----------------------------------------
Bug #7646: invalid byte sequence in String#each_line
https://bugs.ruby-lang.org/issues/7646#change-35181

Author: yoshidam (Yoshida Masato)
Status: Assigned
Priority: High
Assignee: nobu (Nobuyoshi Nakada)
Category: core
Target version: 2.0.0
ruby -v: ruby 2.0.0dev (2013-01-02 trunk 38676) [i686-linux]
