Forum: Ruby-core [ruby-trunk - Bug #7200][Open] Setting external encoding with BOM|

Posted by Brian Ford (brixen)
on 2012-10-21 04:55
(Received via mailing list)
Issue #7200 has been reported by brixen (Brian Ford).

----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200

Author: brixen (Brian Ford)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: ruby 1.9.3p286 (2012-10-12 revision 37165) [x86_64-darwin10.8.0]


File.open will accept, for example, :encoding => "bom|utf-16be:euc-jp"
or :encoding => "bom|utf-16be". However, :external_encoding =>
"bom|utf-16be" raises an ArgumentError. Likewise, IO#set_encoding will
accept the single argument "bom|utf-16be:euc-jp" but raises an
ArgumentError if passed the two arguments "bom|utf-16be", "euc-jp".

It is inconsistent to accept "bom|utf-*" in some cases and not others.

See the following IRB transcript.

$ irb
1.9.3p286 :001 > f = File.open "foo.txt", "r", :encoding => "bom|utf-16be:euc-jp"
 => #<File:foo.txt>
1.9.3p286 :002 > f.internal_encoding
 => #<Encoding:EUC-JP>
1.9.3p286 :003 > f.external_encoding
 => #<Encoding:UTF-16BE>
1.9.3p286 :004 > f.close
 => nil
1.9.3p286 :005 > f = File.open "foo.txt", "r"
 => #<File:foo.txt>
1.9.3p286 :006 > f.set_encoding "bom|utf-16be:euc-jp"
 => #<File:foo.txt>
1.9.3p286 :007 > f.internal_encoding
 => #<Encoding:EUC-JP>
1.9.3p286 :008 > f.external_encoding
 => #<Encoding:UTF-16BE>
1.9.3p286 :009 > f.close
 => nil
1.9.3p286 :010 > f = File.open "foo.txt", "r"
 => #<File:foo.txt>
1.9.3p286 :011 > f.set_encoding "bom|utf-16be", "euc-jp"
ArgumentError: unknown encoding name - bom|utf-16be
  from (irb):11:in `set_encoding'
  from (irb):11
  from /Users/brian/.rvm/rubies/ruby-1.9.3-p286/bin/irb:16:in `<main>'
1.9.3p286 :012 > f = File.open "foo.txt", "w", :external_encoding => "bom|utf-16be"
ArgumentError: unknown encoding name - bom|utf-16be
  from (irb):12:in `initialize'
  from (irb):12:in `open'
  from (irb):12
  from /Users/brian/.rvm/rubies/ruby-1.9.3-p286/bin/irb:16:in `<main>'
1.9.3p286 :013 > f = File.open "foo.txt", "rb", :encoding => "bom|utf-16be"
 => #<File:foo.txt>
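
For anyone who wants to re-run the transcript non-interactively, here is a
minimal sketch. It assumes a scratch file from Tempfile instead of foo.txt,
and the method names attempt and bom_acceptance are made up for this
example. It only records whether each call is accepted, and the outcome may
differ between Ruby versions.

```ruby
require "tempfile"

# Hypothetical helper: runs the block and reports :accepted, or the
# ArgumentError message if the encoding argument was rejected.
def attempt
  yield
  :accepted
rescue ArgumentError => e
  e.message
end

# Tries each call form from the report against an empty scratch file.
def bom_acceptance
  Tempfile.create("bom-demo") do |tmp|
    tmp.close
    path = tmp.path
    {
      "open :encoding ext:int" =>
        attempt { File.open(path, "r", :encoding => "bom|utf-16be:euc-jp") { } },
      "open :encoding ext (binary)" =>
        attempt { File.open(path, "rb", :encoding => "bom|utf-16be") { } },
      "open :external_encoding ext" =>
        attempt { File.open(path, "w", :external_encoding => "bom|utf-16be") { } },
      "set_encoding(ext:int)" =>
        attempt { File.open(path, "r") { |f| f.set_encoding("bom|utf-16be:euc-jp") } },
      "set_encoding(ext, int)" =>
        attempt { File.open(path, "r") { |f| f.set_encoding("bom|utf-16be", "euc-jp") } },
    }
  end
end

bom_acceptance.each { |label, outcome| puts "#{label}: #{outcome}" }
```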

Thanks,
Brian
Posted by mame (Yusuke Endoh) (Guest)
on 2012-11-05 13:58
(Received via mailing list)
Issue #7200 has been updated by mame (Yusuke Endoh).

Status changed from Open to Assigned
Assignee set to naruse (Yui NARUSE)
Target version set to 2.0.0

Naruse-san, could you handle this?

--
Yusuke Endoh <mame@tsg.ne.jp>
----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-32410

Posted by naruse (Yui NARUSE) (Guest)
on 2012-11-07 09:05
(Received via mailing list)
Issue #7200 has been updated by naruse (Yui NARUSE).

Status changed from Assigned to Rejected

The BOM| specifier is available only in a mode_enc.
The :encoding option of open and set_encoding(mode_enc) take a mode_enc,
but the :external_encoding option of open and set_encoding(ext, int) take
encodings.
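
As a rough illustration of where a mode_enc string can appear: the same
"BOM|UTF-8" text can follow the colon in a mode string or be passed as the
:encoding option. The sketch below assumes a scratch file that starts with
a UTF-8 BOM; the method name read_via_mode_enc is invented for the example.

```ruby
require "tempfile"

# Hypothetical demo: reads one BOM-prefixed file through both places that
# take a mode_enc string and returns the two results.
def read_via_mode_enc
  Tempfile.create("bom-mode-enc") do |tmp|
    tmp.binmode
    tmp.write("\xEF\xBB\xBFhello".b)   # UTF-8 BOM followed by ASCII text
    tmp.close

    via_mode   = File.open(tmp.path, "r:BOM|UTF-8") { |f| f.read }
    via_option = File.open(tmp.path, "r", :encoding => "BOM|UTF-8") { |f| f.read }
    [via_mode, via_option]
  end
end

via_mode, via_option = read_via_mode_enc
puts via_mode == via_option   # both readers see the same text
puts via_mode.encoding
```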
----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-32546

Posted by shyouhei (Shyouhei Urabe) (Guest)
on 2012-11-07 18:55
(Received via mailing list)
Issue #7200 has been updated by shyouhei (Shyouhei Urabe).

Status changed from Rejected to Assigned

Yui, that's _how_ it works, not _why_ it should be rejected.

If you want to reject this, write why.
----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-32575

Posted by naruse (Yui NARUSE) (Guest)
on 2012-11-07 19:41
(Received via mailing list)
Issue #7200 has been updated by naruse (Yui NARUSE).

Status changed from Assigned to Rejected

I meant that this is the reason why.
A mode_enc and an encoding are different things in syntax,
implementation, and meaning.

BOM|UTF-* is not the name of an encoding; it is part of a mode
specifier.
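
One way to see the point about names is to ask Encoding.find directly. This
is only a sketch; lookup_encoding is a made-up wrapper that turns the
ArgumentError into nil.

```ruby
# Returns the Encoding registered under +name+, or nil if Ruby does not
# know any encoding by that name.
def lookup_encoding(name)
  Encoding.find(name)
rescue ArgumentError
  nil
end

p lookup_encoding("utf-16be")       # an Encoding object
p lookup_encoding("bom|utf-16be")   # nil: no encoding has this name
```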
----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-32582

Posted by shyouhei (Shyouhei Urabe) (Guest)
on 2012-11-07 21:02
(Received via mailing list)
Issue #7200 has been updated by shyouhei (Shyouhei Urabe).

Status changed from Rejected to Assigned

naruse (Yui NARUSE) wrote:
> I meant that this is the reason why.
> A mode_enc and an encoding are different things in syntax,
> implementation, and meaning.
>
> BOM|UTF-* is not the name of an encoding; it is part of a mode specifier.

That's OK.

But #set_encoding is confusing, or at least inconsistent, because it
sets either an encoding or a mode depending on its arguments. Should we
split that method into two, like #set_mode and #set_encoding?
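
To make the mode-versus-encoding distinction concrete, here is a sketch of
how a mode_enc string decomposes into a BOM flag plus encoding names. It is
illustrative only: split_mode_enc is a hypothetical helper, not MRI's
parser, and it ignores the edge cases the real one handles.

```ruby
# Illustrative only, NOT MRI's parser. Splits a mode_enc string such as
# "bom|utf-16be:euc-jp" into a BOM flag and external/internal names.
def split_mode_enc(mode_enc)
  ext, int = mode_enc.split(":", 2)
  bom = ext.downcase.start_with?("bom|")
  ext = ext[4..-1] if bom            # drop the four-character "bom|" prefix
  { bom: bom, external: ext, internal: int }
end

p split_mode_enc("bom|utf-16be:euc-jp")
p split_mode_enc("utf-8")
```

Only the external and internal parts name encodings; the flag is extra
information that a plain encoding argument has no place to carry.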
----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-32589

Posted by shyouhei (Shyouhei Urabe) (Guest)
on 2012-11-07 21:12
(Received via mailing list)
Issue #7200 has been updated by shyouhei (Shyouhei Urabe).


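[Translated from Japanese] This doesn't seem to be getting across, so I'll
write it in Japanese: you have no intention of solving the reporter's
problem, do you?

Could you reread what the reporter's problem was? And then could you
explain why this is not a problem?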
----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-32590

Posted by naruse (Yui NARUSE) (Guest)
on 2012-11-08 03:17
(Received via mailing list)
Issue #7200 has been updated by naruse (Yui NARUSE).


shyouhei (Shyouhei Urabe) wrote:
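> [Translated from Japanese] This doesn't seem to be getting across, so I'll
> write it in Japanese: you have no intention of solving the reporter's
> problem, do you?
>
> Could you reread what the reporter's problem was? And then could you
> explain why this is not a problem?

[Translated from Japanese] Because it is Brian, I understand this is not a
real-world problem.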
----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-32605

Posted by shyouhei (Shyouhei Urabe) (Guest)
on 2012-11-08 04:22
(Received via mailing list)
Issue #7200 has been updated by shyouhei (Shyouhei Urabe).


naruse (Yui NARUSE) wrote:
> shyouhei (Shyouhei Urabe) wrote:
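> > [Translated from Japanese] This doesn't seem to be getting across, so I'll
> > write it in Japanese: you have no intention of solving the reporter's
> > problem, do you?
> >
> > Could you reread what the reporter's problem was? And then could you
> > explain why this is not a problem?
>
> [Translated from Japanese] Because it is Brian, I understand this is not a
> real-world problem.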

How dare you.
----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-32607

Posted by shyouhei (Shyouhei Urabe) (Guest)
on 2012-11-08 04:31
(Received via mailing list)
Issue #7200 has been updated by shyouhei (Shyouhei Urabe).


So yui says this issue is illustrative because it was reported by Brian. 
What a ...

I feel very sorry, Brian.  I can do nothing anymore.
----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-32608

Posted by "duerst (Martin Dürst)" <duerst@it.aoyama.ac.jp> (Guest)
on 2012-11-08 04:52
(Received via mailing list)
Issue #7200 has been updated by duerst (Martin Dürst).


Brian (or others),

[written in part to help Shouhei a bit]

Do you have an actual use case where you need something like
   f.set_encoding "bom|utf-16be", "euc-jp"
? If yes, can you explain?

The current behavior is in part influenced by implementation. But there
is also a conceptual issue, because "bom|" only applies at the start of
the file, and may have different implications for input (check for a
BOM) and output (add a BOM). So we have to think carefully about the
best way to make this easy for programmers to use the right way.
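
The input side of that distinction can be sketched as follows. The example
assumes the program writes U+FEFF itself on output, which is a choice made
for this sketch and not something the thread prescribes; bom_round_trip is
a made-up name.

```ruby
require "tempfile"

# Writes a BOM by hand, then reads the file back with and without the
# "BOM|" prefix on the read mode.
def bom_round_trip
  Tempfile.create("bom-io") do |tmp|
    tmp.close
    # Output side: the program emits U+FEFF explicitly.
    File.open(tmp.path, "w:UTF-8") { |f| f.write("\uFEFFhello") }

    plain    = File.open(tmp.path, "r:UTF-8")     { |f| f.read }
    stripped = File.open(tmp.path, "r:BOM|UTF-8") { |f| f.read }
    { plain: plain, stripped: stripped }
  end
end

result = bom_round_trip
p result[:plain].start_with?("\uFEFF")   # BOM is still part of the text
p result[:stripped]                      # BOM was consumed on input
```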

Regards,    Martin.
----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-32609

Posted by naruse (Yui NARUSE) (Guest)
on 2012-11-08 06:36
(Received via mailing list)
Issue #7200 has been updated by naruse (Yui NARUSE).


shyouhei (Shyouhei Urabe) wrote:
> So yui says this issue is illustrative because it was reported by Brian.
> What a ...
>
> I feel very sorry, Brian.  I can do nothing anymore.

Don't spread FUD.

Brian said they are inconsistent, even though a mode_enc merely looks
like an encoding.
I showed the reason why that is: they are different, and they take
different types of arguments.

If Brian is not satisfied with that reason and has a better idea, he
should present it with an actual use case.
I thought Brian created this ticket out of Rubinius/RubySpec interest,
and that should be a reasonable assumption because there is no use case.
I object to imagining a fictional desire on his part and blaming me for it.

duerst (Martin Dürst) wrote:
> The current behavior is in part influenced by implementation. But there is
> also a conceptual issue, because "bom|" only applies at the start of the
> file, and may have different implications for input (check for a BOM) and
> output (add a BOM). So we have to think carefully about the best way to
> make this easy for programmers to use the right way.

Mainly it is conceptual.
This BOM|UTF-* specifier has two main functions:
* skip U+FEFF at the beginning of the file
* set the external encoding according to the BOM that is found
Such behavior is considered a derivative of the mode, and it is not an
encoding.
Because it is not an encoding, it can't be used in the context of
encodings.
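
The second function can be sketched like this: the file carries a UTF-16LE
BOM, while the mode string only names UTF-8 after the "BOM|" prefix. The
sketch assumes binary read mode (needed for UTF-16 data) and an unset
Encoding.default_internal; detect_by_bom is a hypothetical name.

```ruby
require "tempfile"

# Opens a UTF-16LE file that starts with a BOM while naming UTF-8 in the
# mode string, and reports which external encoding the IO ended up with.
def detect_by_bom
  Tempfile.create("bom-detect") do |tmp|
    tmp.binmode
    tmp.write("\xFF\xFE".b + "hi".encode("UTF-16LE").b)   # BOM + payload
    tmp.close

    File.open(tmp.path, "rb:BOM|UTF-8") do |f|
      { external: f.external_encoding, text: f.read.encode("UTF-8") }
    end
  end
end

p detect_by_bom
```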

See also http://bugs.ruby-lang.org/issues/1951 and related tickets.
----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-32616

Posted by naruse (Yui NARUSE) (Guest)
on 2013-02-17 11:51
(Received via mailing list)
Issue #7200 has been updated by naruse (Yui NARUSE).

Status changed from Assigned to Rejected


----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-36425

Posted by Brian Ford (brixen)
on 2013-02-20 19:17
(Received via mailing list)
Issue #7200 has been updated by brixen (Brian Ford).


#set_encoding accepts ("bom|utf-16be:euc-jp") but rejects 
("bom|utf-16be", "euc-jp"). This is inconsistent, confusing, and has 
nothing to do with the artificial mode vs encoding justification above. 
This inconsistency requires additional code that is subject to bugs.
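
As an example of the kind of additional code involved, a caller holding
separate external and internal names has to fold them back into one string
before calling #set_encoding. The wrapper below is hypothetical (it is not
part of Ruby) and only sketches that workaround.

```ruby
require "tempfile"

# Hypothetical wrapper, not part of Ruby: joins (ext, int) into the single
# "ext:int" string form that IO#set_encoding accepts with a "bom|" prefix.
def set_encoding_with_bom(io, ext, int = nil)
  io.set_encoding(int ? "#{ext}:#{int}" : ext.to_s)
end

# Applies the wrapper to a scratch file and returns the resulting encodings.
def demo_set_encoding_with_bom
  Tempfile.create("bom-set") do |tmp|
    tmp.close
    File.open(tmp.path, "r") do |f|
      set_encoding_with_bom(f, "bom|utf-16be", "euc-jp")
      [f.external_encoding, f.internal_encoding]
    end
  end
end

p demo_set_encoding_with_bom
```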

The fact that there were no tests for this until I wrote the RubySpecs 
illustrates the inconsistency, confusion, and susceptibility to ad hoc 
implementation-defined semantics. I still don't see a single test for 
#set_encoding with "bom|" arguments in the MRI tests. Or am I missing 
something?

Cheers,
Brian
----------------------------------------
Bug #7200: Setting external encoding with BOM|
https://bugs.ruby-lang.org/issues/7200#change-36676
