Windows 2008 Server: Reading Text File with Ruby

luislavena · April 6, 2011, 9:30pm

Hello,
New Ruby user here.
I was wondering if you could point me in the right direction.

I save the output of a WMIC query to a temp text file on Windows 2008
using Ruby:

system("wmic MEMORYCHIP get CAPACITY /VALUE > tmp")

Next, I try to read this file back in order to extract a value from
the output:

contents = File.open('tmp', 'r:') { |f| f.read }

However, when I run this in irb, the contents come back with extra
characters: a leading \xFF\xFE, and \x00 after every character. If I
open the tmp file itself in Windows, it displays the correct text.

Is there a way to read just the text? Does this have something to do
with encoding?

Thank you for your help.

Here is the entire session in irb:

irb(main):002:0> system("wmic MEMORYCHIP get CAPACITY /VALUE > tmp")
=> true
irb(main):003:0> contents = File.open('tmp', 'r:') { |f| f.read }
=> "\xFF\xFE\n\x00\n\x00N\x00o\x00d\x00e\x00,\x00C\x00a\x00p\x00a\x00c\x00i\x00t\x00y\x00\n\x00\n\x00P\x00S\x00,\x008\x005\x008\x009\x009\x003\x004\x005\x009\x002\x00"
irb(main):004:0>

russiangeek · April 6, 2011, 9:34pm

On Thu, 7 Apr 2011 04:30:13 +0900, Angelo NN wrote:

–
Posted via http://www.ruby-forum.com/.

As it starts with a UTF-16LE byte order mark (BOM), \xFF\xFE, that’s a
pretty good clue as to the encoding.
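To make the BOM clue concrete, here is a small sketch that compares the first bytes of a buffer with the common Unicode byte order marks. The BOMS table and the sniff_bom helper are hypothetical names for illustration, not part of Ruby's API; the sample bytes are the start of the irb output in the question.

```ruby
# Common Unicode byte order marks and the Ruby encoding each implies.
# Longer marks come first: the UTF-32LE mark starts with the UTF-16LE one.
BOMS = [
  ["\x00\x00\xFE\xFF".b, "UTF-32BE"],
  ["\xFF\xFE\x00\x00".b, "UTF-32LE"],
  ["\xEF\xBB\xBF".b,     "UTF-8"],
  ["\xFE\xFF".b,         "UTF-16BE"],
  ["\xFF\xFE".b,         "UTF-16LE"],
].freeze

# Return the encoding name implied by a leading BOM, or nil if there is none.
def sniff_bom(bytes)
  raw = bytes.b
  BOMS.each { |mark, name| return name if raw.start_with?(mark) }
  nil
end

# First bytes of the irb output in the question.
sample = "\xFF\xFE\n\x00\n\x00N\x00o\x00".b
puts sniff_bom(sample)  # prints UTF-16LE
```

A BOM only hints at the encoding; a file without one needs the encoding to be known in advance.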

-jh

russiangeek · April 6, 2011, 9:35pm

Jonathan H. wrote in post #991289:

Thank you.

Can you suggest where I can read up on how to change the encoding of
the imported text? (e.g. contents.encode("UTF-16LE")?)
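String#encode converts characters from the encoding a string is already labelled with, so it cannot repair bytes that Ruby has labelled wrongly; calling encode("UTF-8") on binary-labelled bytes such as these raises Encoding::UndefinedConversionError. A minimal sketch, assuming a current Ruby (2.x or later) rather than the 1.9 used in this thread: first relabel the raw bytes as UTF-16LE with String#force_encoding, then encode them to UTF-8. The byte string is copied from the first irb session.

```ruby
# Bytes from the first irb session: a UTF-16LE BOM followed by
# "\n\nNode,Capacity\n\nPS,8589934592" encoded as UTF-16LE.
raw = "\xFF\xFE\n\x00\n\x00N\x00o\x00d\x00e\x00,\x00C\x00a\x00p\x00a\x00" \
      "c\x00i\x00t\x00y\x00\n\x00\n\x00P\x00S\x00,\x008\x005\x008\x009\x00" \
      "9\x003\x004\x005\x009\x002\x00"
raw = raw.b  # raw bytes, labelled ASCII-8BIT as File.binread would return them

# force_encoding only changes the label; the bytes are not touched.
utf16 = raw.dup.force_encoding(Encoding::UTF_16LE)

# encode converts the characters to UTF-8. The BOM comes through as
# U+FEFF, so it is stripped afterwards.
text = utf16.encode(Encoding::UTF_8).sub(/\A\uFEFF/, "")

p text  # => "\n\nNode,Capacity\n\nPS,8589934592"
```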

russiangeek · April 6, 2011, 9:58pm

Jonathan H. wrote in post #991291:

Thanks - I tried utf-16. Unfortunately it gives an “Unsupported encoding
utf-16 ignored” warning. Maybe it’s time for me to switch to another
operating system.

contents = File.open('tmp', 'r:utf-16') { |f| f.read }
(irb):15: warning: Unsupported encoding utf-16 ignored
=> "\xFF\xFE\n\x00\n\x00\n\x00\n\x00C\x00a\x00p\x00a\x00c\x00i\x00t\x00y\x00=\x008\x005\x008\x009\x009\x003\x004\x005\x009\x002\x00\n\x00\n\x00\n\x00\n\x00\n\x00\n\x00"
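Part of the explanation for that warning is likely that Ruby treats the bare name UTF-16 (no byte order) differently from UTF-16LE and UTF-16BE. In current Ruby versions it is registered only as a "dummy" encoding: a name Ruby recognises but cannot process as characters. How 1.9-era releases handled the name may differ, so treat this as a hint rather than the exact cause. A quick check:

```ruby
# The bare name "UTF-16" (no byte order) is a dummy encoding in current
# Ruby: the name is recognised, but strings in it cannot be handled as text.
p Encoding::UTF_16.dummy?    # => true

# The byte-order-specific names are ordinary, usable encodings.
p Encoding::UTF_16LE.dummy?  # => false
p Encoding.find("utf-16le")  # => #<Encoding:UTF-16LE>
```

Naming the byte order explicitly (UTF-16LE for this wmic output) avoids the dummy encoding.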

russiangeek · April 6, 2011, 10:13pm

On Thu, 7 Apr 2011 04:58:54 +0900, Angelo NN wrote:

I think the old ways still work:

require 'iconv'
content = File.binread('tmp')

# Iconv::conv(TO, FROM, string): set TO to your native encoding
text = Iconv::conv("utf-8", "utf-16", content)
puts text

-jh
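Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in Ruby 2.0 (it survives as a separate gem), so the snippet above needs the gem on current Rubies. A rough core-only substitute is sketched below. It picks the byte order from the BOM, transcodes with String#encode and strips the BOM; when there is no BOM it assumes little-endian, which is an assumption of this sketch (iconv implementations may default differently). The utf16_to_utf8 helper and the temporary file name are hypothetical.

```ruby
require "tmpdir"

# Rough core-Ruby stand-in for Iconv::conv("utf-8", "utf-16", content).
# Byte order comes from the BOM; without one, little-endian is assumed.
def utf16_to_utf8(content)
  raw = content.b  # copy as raw bytes so the caller's string is untouched
  order = raw.start_with?("\xFE\xFF".b) ? Encoding::UTF_16BE : Encoding::UTF_16LE
  utf16 = raw.force_encoding(order)
  utf8 = utf16.encode(Encoding::UTF_8)  # transcode the characters
  utf8.sub(/\A\uFEFF/, "")              # the BOM survives as U+FEFF; drop it
end

# Usage in the thread's style: read raw bytes, then convert.
path = File.join(Dir.tmpdir, "wmic_iconv_demo.txt")  # hypothetical file
File.binwrite(path, "\uFEFFCapacity=8589934592\n".encode(Encoding::UTF_16LE))
text = utf16_to_utf8(File.binread(path))
File.delete(path)
puts text
```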

russiangeek · April 6, 2011, 9:48pm

On Thu, 7 Apr 2011 04:35:44 +0900, Angelo NN wrote:

I’m not at all familiar with dealing with encodings on Windows, but
assuming you’re using Ruby 1.9.x, try

contents = File.open('tmp', 'r:utf-16') { |f| f.read }

or perhaps

contents = File.open('tmp', 'r:utf-16le') { |f| f.read }

Given the BOM, I’d hope that the former might work.

-jh
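On current Ruby versions, the mode string can name both an external and an internal encoding, and a BOM| prefix asks Ruby to detect and skip a leading byte order mark. Ruby's IO documentation states that binary mode is required for the UTF-16 BOM variants, hence "rb". A sketch assuming a modern Ruby rather than the 1.9 discussed here; the sample file it writes is made up to mirror wmic's /VALUE output, and its name is hypothetical.

```ruby
require "tmpdir"

# Build a small sample file: a UTF-16LE BOM followed by one /VALUE-style
# line. The file name is hypothetical; wmic itself is not run here.
path = File.join(Dir.tmpdir, "wmic_sample.txt")
File.binwrite(path, "\uFEFFCapacity=8589934592\n".encode(Encoding::UTF_16LE))

# "rb"           binary mode, required for the UTF-16 BOM modes
# "BOM|UTF-16LE" external encoding; a leading BOM is detected and skipped
# ":UTF-8"       internal encoding, so read returns transcoded UTF-8 text
contents = File.open(path, "rb:BOM|UTF-16LE:UTF-8") { |f| f.read }
File.delete(path)

p contents           # => "Capacity=8589934592\n"
p contents.encoding  # => #<Encoding:UTF-8>
```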

russiangeek · April 6, 2011, 10:45pm

On April 6, 2011

russiangeek · April 6, 2011, 10:15pm

Jonathan H. wrote in post #991297:

Wow - Awesome.
That worked.

Thanks Jonathan!
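The thread stops once the text decodes, but the original goal was to extract a value from the wmic output. Assuming the text has already been decoded to a UTF-8 string and wmic was run with /VALUE (so each memory module yields a Capacity=<bytes> line, as in the second irb session), a regular-expression scan is enough. The capacities helper and the sample string are illustrative; the sample's "\r" is an assumption about CRLF line endings, which the pattern tolerates.

```ruby
# Collect the numbers from "Capacity=<bytes>" lines in decoded
# `wmic MEMORYCHIP get CAPACITY /VALUE` output (one line per module).
def capacities(text)
  # \s* also swallows a trailing "\r" left over from CRLF line endings.
  text.scan(/^Capacity=(\d+)\s*$/).map { |(digits)| Integer(digits, 10) }
end

# Illustrative sample only, shaped like the second irb session's output.
sample = "\n\n\n\nCapacity=8589934592\r\n\n\n"
caps = capacities(sample)
gib = caps.sum / 1024**3  # total installed memory in GiB

p caps  # => [8589934592]
p gib   # => 8
```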