Forum: Ruby Question on reading text files in Windows

Jim K. (Guest)
on 2009-02-02 19:40
(Received via mailing list)
I am running Ruby 1.8.6 on Windows and having trouble reading some
text files.  For some of them, if I do something simple like:

    myfile = File.open("logfile.log")
    contents = myfile.read
    myfile.close   # release the file handle once the contents are read
    puts contents

I get each character separated by a space, like this:

”= = =   V e r b o s e   l o g g i n g   s t a r t e d :   1 / 2 8 / 2
0 0 9
 1 3 : 4 5 : 0 6     B u i l d   t y p e :   S H I P   U N I C O D E

If I open the file in even a bare-bones editor (such as Vim), it
displays normally, without any extraneous spaces.  Does anyone know
why this happens, or how I can work around it?  It's causing problems
because I'm trying to write a script that searches for a particular
string of text, and the string isn't found even though it is in the
file.

Thanks,

Jim
Stefan L. (Guest)
on 2009-02-02 21:09
(Received via mailing list)
2009/2/2 Jim K. <removed_email_address@domain.invalid>:
> 0 0 9
>  1 3 : 4 5 : 0 6     B u i l d   t y p e :   S H I P   U N I C O D E
>
> If I bring up the file in even a bare-bones editor (such as VIM), I
> get the file as it normally is (without any extraneous spaces).  Does
> anyone know why this would be, or how I can work around it?  It's
> causing issues as I am trying to write a script to search for a
> particular string of text, and obviously it isn't found, even though
> it should be.

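The file is probably UTF-16 encoded (little-endian, as is usual on
Windows) and starts with a byte-order mark (BOM).  In UTF-16LE every
ASCII character is stored as two bytes, the character followed by a
zero byte, and those zero bytes are what show up as the extra "spaces".
Try to convert the string to UTF-8, or switch to Ruby 1.9.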
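A quick way to confirm this is to look at the raw bytes. This is a minimal sketch assuming Ruby 2.0 or later (it uses `String#b`); `detect_bom` is a hypothetical helper name, not a library method. A UTF-16LE file starts with the bytes FF FE, and each ASCII character after it is followed by a 0 byte:

```ruby
# Hypothetical helper: report which Unicode byte-order mark (if any)
# a string of raw file bytes starts with.
def detect_bom(raw)
  bytes = raw.bytes.first(3)
  if bytes[0, 3] == [0xEF, 0xBB, 0xBF]
    :utf8
  elsif bytes[0, 2] == [0xFF, 0xFE]
    :utf16le
  elsif bytes[0, 2] == [0xFE, 0xFF]
    :utf16be
  else
    :none
  end
end

# Simulated start of a UTF-16LE log file as it sits on disk:
# BOM (FF FE), then "=== V" with a 0 byte after each character.
sample = "\xFF\xFE".b + "=== V".encode("UTF-16LE").b
p detect_bom(sample)        # => :utf16le
p sample.bytes.first(6)     # => [255, 254, 61, 0, 61, 0]

# On a real file, read it in binary mode first:
#   detect_bom(File.binread("logfile.log"))
```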

Stefan
Stefan L. (Guest)
on 2009-02-02 21:26
(Received via mailing list)
2009/2/2 Stefan L. <removed_email_address@domain.invalid>:
>> ”= = =   V e r b o s e   l o g g i n g   s t a r t e d :   1 / 2 8 / 2
> The file is probably UTF-16 encoded and starts with a BOM.
> Try to convert the string to UTF-8, or switch to Ruby 1.9.

Sorry, I meant to say "Try to convert the string to UTF-8 WITH Iconv"
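In Ruby 1.8 the Iconv route looks roughly like the sketch below. It is hedged: `to_utf8` is a hypothetical helper name, the Iconv branch assumes the system iconv's "UTF-16" converter uses the BOM to pick the byte order, and because Iconv left the standard library in Ruby 2.0 (it survives as a separate gem), the sketch falls back to `String#encode` on newer Rubies when `require 'iconv'` fails:

```ruby
# Hypothetical helper: convert the raw bytes of a UTF-16 file (with BOM)
# into a UTF-8 string that include? and regexps can search.
def to_utf8(raw)
  begin
    require 'iconv'  # stdlib in Ruby 1.8/1.9; only a separate gem since 2.0
    # Iconv.conv(to, from, string); "UTF-16" lets iconv read the BOM.
    return Iconv.conv('UTF-8', 'UTF-16', raw)
  rescue LoadError
    # No iconv available: fall through to String#encode below.
  end

  bytes = raw.dup.force_encoding(Encoding::BINARY)
  if bytes.start_with?("\xFF\xFE".b)
    # Little-endian BOM: drop it, then transcode the rest.
    bytes[2..-1].force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  elsif bytes.start_with?("\xFE\xFF".b)
    # Big-endian BOM.
    bytes[2..-1].force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
  else
    raise ArgumentError, "no UTF-16 byte-order mark found"
  end
end
```

For example, `text = to_utf8(File.open("logfile.log", "rb") { |f| f.read })` followed by `text.include?("Build type")` should find the string that the plain read missed. Reading in binary mode ("rb") keeps Windows from altering any bytes before the conversion.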

Stefan
Jim K. (Guest)
on 2009-02-02 23:12
(Received via mailing list)
Thanks.  So if I upgraded to Ruby 1.9, would it convert it
automatically?
Jim K. (Guest)
on 2009-02-02 23:45
(Received via mailing list)
Thanks for the pointer!  I actually ended up using the iconv module,
and it worked like a charm.  Incidentally, in case anyone else is
curious about this, Windows .REG files get saved as UTF-16 by default.
Stefan L. (Guest)
on 2009-02-02 23:54
(Received via mailing list)
2009/2/2 Jim K. <removed_email_address@domain.invalid>:
> Thanks...so if I upgraded to Ruby 1.9, would it convert it
> automatically?

You'd have to tell it that you want to work with UTF-8
internally by putting this at the top of your application:

    Encoding.default_internal = Encoding::UTF_8

and then tell the read or open function that the file
is UTF-16 encoded, e.g.:

    content = File.read("logfile.log", encoding: "utf-16")

Though I don't know how many gems already work for
Ruby 1.9.1 on Windows.
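With Ruby 2.x or later the conversion can also happen while the file is read. A minimal sketch: per the `IO` documentation, the `BOM|` prefix strips a leading byte-order mark and uses the encoding it indicates (falling back to the named encoding when there is none), UTF-16 with `BOM|` requires binary mode, and the trailing `:UTF-8` transcodes as the data is read. `read_utf16_log` and the temporary file are illustrative only:

```ruby
require 'tmpdir'

# Hypothetical helper: read a UTF-16 file (BOM optional) as UTF-8.
#   "rb"            binary mode, which BOM| requires for UTF-16
#   "BOM|UTF-16LE"  strip a BOM if present, otherwise assume UTF-16LE
#   ":UTF-8"        transcode to UTF-8 while reading
def read_utf16_log(path)
  File.read(path, mode: "rb:BOM|UTF-16LE:UTF-8")
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "logfile.log")
  # Simulate a Windows-style log: UTF-16LE BOM followed by UTF-16LE text.
  File.binwrite(path, "\xFF\xFE".b + "=== Verbose logging started".encode("UTF-16LE").b)
  text = read_utf16_log(path)
  puts text.encoding              # UTF-8
  puts text.include?("Verbose")   # true
end
```

Because the file is opened in binary mode, Windows line endings stay as "\r\n" in the result; strip them with `chomp` when processing line by line.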

Stefan