Forum: Ruby File.new and encoding

achim.domma (Guest)
on 2005-11-29 16:09
(Received via mailing list)
Hi,

I'm still quite new to Ruby, but I have written a simple code generator.
The generator opens some files and combines them into a new one. The
resulting file is encoded as ISO-8859-1, but it looks like Ruby writes
a UTF-8 marker to the beginning of the file. Is that possible?

How can I tell Ruby which encoding to use when I write to text files?

Any pointers to documentation are welcome; I didn't find anything
useful using Google.

regards,
Achim
bob.news (Guest)
on 2005-11-29 16:21
(Received via mailing list)
Achim Domma (SyynX Solutions GmbH) wrote:
> Hi,
>
> I'm still quite new to Ruby, but I have written a simple code generator.
> The generator opens some files and combines them into a new one. The
> resulting file is encoded as ISO-8859-1, but it looks like Ruby writes
> a UTF-8 marker to the beginning of the file. Is that possible?

What's a UTF-8 marker?  I only know of the two-byte UTF-16 marker, but AFAIK
there is no marker for UTF-8.  Did I miss something?

> How can I tell Ruby which encoding to use when I write to text files?
>
> Any pointers to documentation are welcome; I didn't find anything
> useful using Google.

Encoding is not an easy issue with Ruby. I guess by default it uses the
default encoding of your environment.  But you can specify certain
(Japanese) encodings with the command line option -K.  HTH

Kind regards

    robert
nobu (Guest)
on 2005-11-29 16:37
(Received via mailing list)
Hi,

At Wed, 30 Nov 2005 00:17:29 +0900,
Robert Klemme wrote in [ruby-talk:167988]:
> > I'm still quite new to Ruby, but I have written a simple code generator.
> > The generator opens some files and combines them into a new one. The
> > resulting file is encoded as ISO-8859-1, but it looks like Ruby writes
> > a UTF-8 marker to the beginning of the file. Is that possible?
>
> What's a UTF-8 marker?  I only know of the two-byte UTF-16 marker, but AFAIK
> there is no marker for UTF-8.  Did I miss something?

It would be the UTF-8 encoded BOM, but Ruby itself never writes it
automatically.

> > How can I tell Ruby which encoding to use when I write to text files?

Can't you show the code?
achim.domma (Guest)
on 2005-11-29 19:56
(Received via mailing list)
nobu@ruby-lang.org wrote:

> It would be the UTF-8 encoded BOM, but Ruby itself never writes it
> automatically.
[...]
> Can't you show the code?

Trying to reproduce the problem in a smaller example, I figured out
that I'm reading the BOM from one of my source files. Sorry for the
confusion. I'm doing something like:

File.open("target", "w") do |target|
  File.open("source", "r") do |source|
    source.each_line do |line|
      # ... some processing ...
      target.write(line)
    end
  end
end


source seems to contain the BOM, and it is written to target. Any hint on
how to strip the BOM?

regards,
Achim
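
One way to strip the BOM by hand, as a minimal sketch: it assumes the BOM is the three bytes EF BB BF at the very start of the source file, and it works on raw bytes so it makes no assumption about the encoding of the rest of the file. The names UTF8_BOM, strip_bom and copy_without_bom are hypothetical and chosen only for illustration.

```ruby
# The UTF-8 encoded BOM is the byte sequence EF BB BF.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: returns the bytes of line without a leading UTF-8 BOM.
def strip_bom(line)
  bytes = line.b
  bytes.start_with?(UTF8_BOM) ? bytes.byteslice(3..-1) : bytes
end

# Hypothetical helper: copies source to target, dropping a BOM on the first line.
def copy_without_bom(source_path, target_path)
  File.open(target_path, "wb") do |target|
    File.open(source_path, "rb") do |source|
      source.each_line.with_index do |line, index|
        # A BOM can only appear at the very start of the file.
        line = strip_bom(line) if index.zero?
        # ... some processing ...
        target.write(line)
      end
    end
  end
end
```

On Ruby 1.9 and later, opening the source with the mode string "r:BOM|UTF-8" should have the same effect for UTF-8 input, without the manual byte check.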
alex (Guest)
on 2005-11-29 20:36
(Received via mailing list)
> I'm doing something like:
>
> File.open("target", "w") do |target|
>   File.open("source", "r") do |source|
>     source.each_line do |line|
>       # ... some processing ...
>       target.write(line)
>     end
>   end
> end

Have you looked at 'iconv' in the standard library?

http://www.ruby-doc.org/stdlib/libdoc/iconv/rdoc/c...

Assuming all your input files were ISO-8859-1, and you wanted your
output file in UTF-8, your example might look something like (untested):

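require 'iconv'

File.open("target", "w") do |target|
  # Convert from ISO-8859-1 (second argument) to UTF-8 (first argument).
  Iconv.open('UTF-8', 'ISO-8859-1') do |converter|
    File.open("source", "r") do |source|
      source.each_line do |line|
        # ... some processing ...
        target.write(converter.iconv(line))
      end
    end
    # Passing nil flushes any pending state from the converter.
    target << converter.iconv(nil)
  end
end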

Iconv should deal with BOMs, stripping them out or adding them in where
necessary. I'm not sure if it will complain if it finds a BOM mid-stream
(as you open your second and subsequent input files) - if so, you could
just instantiate a new Iconv to deal with each input.

HTH
alex
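
For readers on a current Ruby: iconv was removed from the standard library in Ruby 2.0, so the example above needs the separate iconv gem there. A rough equivalent using the built-in String#encode, as a sketch only: it assumes every input file really is ISO-8859-1 and the output should be UTF-8, and combine_as_utf8 is a hypothetical name, not an existing API.

```ruby
# Hypothetical helper: appends each ISO-8859-1 source file to one UTF-8 target.
def combine_as_utf8(source_paths, target_path)
  File.open(target_path, "w:UTF-8") do |target|
    source_paths.each do |path|
      # Tag the bytes read from this file as ISO-8859-1.
      File.open(path, "r:ISO-8859-1") do |source|
        source.each_line do |line|
          # ... some processing ...
          # Transcode each line to UTF-8 before writing it.
          target.write(line.encode("UTF-8"))
        end
      end
    end
  end
end
```

String#encode keeps no state between calls, so each input file is handled independently and no converter has to be re-created per file.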