Question on reading text files in Windows


#1

I am running Ruby 1.8.6 on Windows, and am having trouble reading in some
text files. For some text files, if I do something simple like:

myfile = File.open("logfile.log")
contents = myfile.read()
puts contents

I get each character separated by a space, such as:

”= = = V e r b o s e l o g g i n g s t a r t e d : 1 / 2 8 / 2
0 0 9
1 3 : 4 5 : 0 6 B u i l d t y p e : S H I P U N I C O D E

If I bring up the file in even a bare-bones editor (such as VIM), I
get the file as it normally is (without any extraneous spaces). Does
anyone know why this would be, or how I can work around it? It’s
causing issues as I am trying to write a script to search for a
particular string of text, and obviously it isn’t found, even though
it should be.

Thanks,

Jim


#2

2009/2/2 Jim K. removed_email_address@domain.invalid:


If I bring up the file in even a bare-bones editor (such as VIM), I
get the file as it normally is (without any extraneous spaces). Does
anyone know why this would be, or how I can work around it? It’s
causing issues as I am trying to write a script to search for a
particular string of text, and obviously it isn’t found, even though
it should be.

The file is probably UTF-16 encoded and starts with a BOM.
Try to convert the string to UTF-8, or switch to Ruby 1.9.
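
One way to check this guess is to look at the first few bytes of the file. The sketch below is only an illustration: `sniff_bom` is a made-up helper name, the sample bytes stand in for the start of the real log file, and it assumes Ruby 2.0 or later (for `String#b`). It does not try to distinguish UTF-32.

```ruby
# Hypothetical sample data standing in for the first bytes of the log file:
# a UTF-16LE byte order mark (FF FE) followed by "===" encoded as UTF-16LE.
sample = "\xFF\xFE".b + "===".encode("UTF-16LE").b

# Guess the encoding from a leading byte order mark; returns nil if none found.
# sniff_bom is an illustrative helper, not part of Ruby's standard library.
def sniff_bom(bytes)
  head = bytes.byteslice(0, 3).b
  if head.start_with?("\xEF\xBB\xBF".b)
    "UTF-8"
  elsif head.start_with?("\xFF\xFE".b)
    "UTF-16LE"
  elsif head.start_with?("\xFE\xFF".b)
    "UTF-16BE"
  end
end

puts sniff_bom(sample)
# Every ASCII character is followed by a zero byte, which is what shows up
# as the "space" between characters when the raw bytes are printed.
p sample.bytes
```

On the real file you would pass in something like the first bytes read in binary mode instead of `sample`.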

Stefan


#3

2009/2/2 Stefan L. removed_email_address@domain.invalid:

The file is probably UTF-16 encoded and starts with a BOM.
Try to convert the string to UTF-8, or switch to Ruby 1.9.

Sorry, I meant to say “Try to convert the string to UTF-8 WITH Iconv.”

Stefan


#4

Thanks…so if I upgraded to Ruby 1.9, would it convert it
automatically?


#5

Thanks for the pointer! I actually ended up using the iconv module,
and it worked like a charm. Incidentally, in case anyone else is
curious about this, Windows .REG files get saved as UTF-16 by default.


#6

2009/2/2 Jim K. removed_email_address@domain.invalid:

Thanks…so if I upgraded to Ruby 1.9, would it convert it
automatically?

You’d have to tell it that you want to work with UTF-8
internally by putting this at the top of your application:

Encoding.default_internal = Encoding::UTF_8

and then tell the read or open function that the file
is UTF-16 encoded, e.g.:

content = File.read("logfile.log", encoding: "utf-16")

Though I don’t know how many gems already work for
Ruby 1.9.1 on Windows.
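
As an alternative sketch that names the byte order explicitly, you can read the raw bytes and convert them with String#encode. This assumes the file is little-endian UTF-16 (typical for files written on Windows) and a Ruby with File.binread (1.9.3 or later); `read_utf16le` is a made-up helper name, and the temporary file only stands in for logfile.log.

```ruby
require "tmpdir"

# Reads a UTF-16LE file (with or without a BOM) and returns UTF-8 text.
# read_utf16le is a hypothetical helper name, not part of Ruby itself.
def read_utf16le(path)
  raw = File.binread(path)                               # raw bytes, untouched
  text = raw.force_encoding("UTF-16LE").encode("UTF-8")  # transcode to UTF-8
  text.sub(/\A\uFEFF/, "")                               # drop the BOM if present
end

# Demo with a throwaway file standing in for logfile.log.
Dir.mktmpdir do |dir|
  path = File.join(dir, "logfile.log")
  File.binwrite(path, "\uFEFFBuild type: SHIP UNICODE\r\n".encode("UTF-16LE"))

  contents = read_utf16le(path)
  # A plain substring search now works, because the text is ordinary UTF-8.
  puts contents.include?("SHIP UNICODE")
end
```

Because the file is read in binary mode, Windows line endings stay as "\r\n" in the result; strip or normalize them if your search depends on line boundaries.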

Stefan