I’m still quite new to ruby, but have written a simple code generator.
The generator opens some files and combines them into a new one. The
resulting file is encoded as iso-8859-1, but it looks like ruby writes
a UTF-8 marker to the beginning of the file. Is that possible?
How can I tell ruby which encoding to use when I write to text files?
Any pointers to documentation are welcome, but I didn’t find anything
useful using Google.
Achim D. (SyynX Solutions GmbH) wrote:
I’m still quite new to ruby, but have written a simple code generator.
The generator opens some files and combines them into a new one. The
resulting file is encoded as iso-8859-1, but it looks like ruby writes
a UTF-8 marker to the beginning of the file. Is that possible?
What’s a UTF-8 marker? I know only the two-byte UTF-16 marker, but AFAIK
there is no marker for UTF-8. Did I miss something?
How can I tell ruby which encoding to use when I write to text files?
Any pointers to documentation are welcome, but I didn’t find
anything useful using Google.
Encoding is not an easy issue with ruby - I guess by default it uses the
default encoding of your environment. But you can specify certain
(Japanese) encodings with the command line option -K. HTH
Kind regards
At Wed, 30 Nov 2005 00:17:29 +0900,
Robert K. wrote in [ruby-talk:167988]:
I’m still quite new to ruby, but have written a simple code generator.
The generator opens some files and combines them into a new one. The
resulting file is encoded as iso-8859-1, but it looks like ruby writes
a UTF-8 marker to the beginning of the file. Is that possible?
What’s a UTF-8 marker? I know only the two-byte UTF-16 marker, but AFAIK
there is no marker for UTF-8. Did I miss something?
It would be a UTF-8 encoded BOM, but ruby itself never writes it.
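For reference, a UTF-8 encoded BOM is the three bytes EF BB BF at the very start of a file. A minimal sketch for checking whether a given file begins with it; the helper name `utf8_bom?` is made up for illustration, and `String#b` assumes Ruby 2.0 or later:

```ruby
# The UTF-8 encoding of U+FEFF, as raw bytes.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true if the file at +path+ starts with a UTF-8 BOM.
# The file is read in binary mode so ruby applies no encoding of its own.
def utf8_bom?(path)
  File.open(path, "rb") { |f| f.read(3) } == UTF8_BOM
end
```

Running this over each input file should show quickly which one carries the marker.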
How can I tell ruby which encoding to use when I write to text files?
Can’t you show the code?
[email protected] wrote:
It would be a UTF-8 encoded BOM, but ruby itself never writes it.
Can’t you show the code?
Trying to reproduce the problem in a smaller example, I figured out
that I’m reading the BOM from one of my source files. Sorry for the
confusion. I’m doing something like:
  File.open("target", "w") do |target|
    File.open("source", "r") do |source|
      source.each_line do |line|
        # ... some processing ...
      end
    end
  end
source seems to contain the BOM, and it is written to target. Any hint on
how to strip the BOM?
I’m doing something like:
  File.open("target", "w") do |target|
    File.open("source", "r") do |source|
      source.each_line do |line|
        # ... some processing ...
      end
    end
  end
Have you looked at ‘iconv’ in the standard library?
Assuming all your input files were ISO-8859-1, and you wanted your
output file in UTF-8, your example might look something like (untested):
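  require 'iconv'

  File.open("target", "w") do |target|
    # Iconv.open takes the target encoding first, then the source encoding.
    Iconv.open('UTF-8', 'ISO-8859-1') do |converter|
      File.open("source", "r") do |source|
        source.each_line do |line|
          # ... some processing ...
          target.write(converter.iconv(line))
        end
      end
      target << converter.iconv(nil)  # flush any pending conversion state
    end
  end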
Iconv should deal with BOMs, stripping them out or adding them in where
necessary. I’m not sure if it will complain if it finds a BOM mid-stream
(as you open your second and subsequent input files) - if so, you could
just instantiate a new Iconv to deal with each input.
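If Iconv turns out not to drop the BOM, it could also be removed by hand, since it can only sit in the first three bytes of a file. A rough sketch of that idea, not a tested solution for your generator: `copy_without_bom` and `BOM_BYTES` are names made up here, the files are opened in binary mode so that ruby does no encoding handling of its own, and `String#b` assumes Ruby 2.0 or later.

```ruby
# The UTF-8 encoded BOM (U+FEFF) as raw bytes.
BOM_BYTES = "\xEF\xBB\xBF".b

# Hypothetical helper: copies the file at +source_path+ line by line into
# the already opened IO +target+, dropping a leading UTF-8 BOM if present.
def copy_without_bom(source_path, target)
  File.open(source_path, "rb") do |source|
    source.each_line.with_index do |line, index|
      # Only the first line of a file can start with the BOM.
      line = line.byteslice(3..-1) if index.zero? && line.start_with?(BOM_BYTES)
      # ... some processing ...
      target.write(line)
    end
  end
end

# Possible usage, with placeholder file names:
#   File.open("target", "wb") do |target|
#     ["source"].each { |path| copy_without_bom(path, target) }
#   end
```

Because the check runs once per input file, a BOM at the start of the second or any later source file gets dropped as well.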