I have some big files with lots of "unsigned int" (4-byte) numbers, and I
want to read from and write to these files.
Currently, I found this to write:
myfile << [mynum].pack("i")
and to read:
mynum = myfile.read(4).unpack("i").first
I wonder if there isn't something faster/simpler to do that without the
need to convert the number into an array, then into a string, to finally
serialize it.
Hi,
----- Original Message -----
From: “Vianney L.” [email protected]
Newsgroups: comp.lang.ruby
To: “ruby-talk ML” [email protected]
Sent: Thursday, October 25, 2007 11:36 PM
Subject: read write integer in binary into a file
I included Marshal.dump for completeness, but agree that it doesn’t
appear to be meant for this sort of thing. Here’s the source to run
the benchmark:
require 'benchmark'

number = 2_000_000
n = 1_000_000

Benchmark.bm(12) do |x|
  x.report('[].pack(i):')   { n.times { [number].pack('i') } }
  x.report('pack_int32:')   { n.times { pack_int32(number) } }
  x.report('Marshal.dump:') { n.times { Marshal.dump(number) } }
end
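The pack_int32 helper called above was defined earlier in the thread and isn't shown in this excerpt. A plausible pure-Ruby version (my reconstruction, not the poster's actual code) would assemble the four little-endian bytes of a 32-bit integer with shifts and masks:

```ruby
# Hypothetical reconstruction of pack_int32: emit the four little-endian
# bytes of a 32-bit integer by shifting and masking each byte.
def pack_int32(n)
  (0..3).map { |i| ((n >> (i * 8)) & 0xFF).chr }.join
end
```

On a little-endian machine this produces the same 4-byte string as `[n].pack('i')`.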
Using only the number 2_000_000 seems to skew the results. I see your
results with your test, but if I change it slightly to use a variety
of integers, I get more balanced results:
require 'benchmark'

MAX = 2**30
n = 1_000_000
nums = (0...n).map { (rand * MAX).to_i }
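The quoted snippet ends after building nums; a completed version of the varied-input benchmark (my reconstruction, not the poster's exact code; pack_int32 is omitted since its definition isn't in this excerpt) could look like:

```ruby
require 'benchmark'

MAX = 2**30
n = 1_000_000
# Benchmark over a spread of random integers rather than one fixed value,
# so no single bit pattern skews the timings.
nums = (0...n).map { (rand * MAX).to_i }

Benchmark.bm(12) do |x|
  x.report('[].pack(i):')   { nums.each { |num| [num].pack('i') } }
  x.report('Marshal.dump:') { nums.each { |num| Marshal.dump(num) } }
end
```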
Do you have to deal with each number individually? Maybe you could
build up an array of numbers and then pack them all at once:
arr = []
while work_to_do
  mynum = generate_next_number
  arr << mynum
end
myfile.write arr.pack('i*')
That way you aren’t creating a new array for each number.
Similarly, for reading the file:
data = file.read
num_array = data.unpack('i*')
The '*' in (un)pack means to process the rest of the data in the same
way.
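A minimal round-trip sketch of this batch approach (the filename is illustrative; I use 'l<', the explicit 32-bit little-endian directive, instead of the native-endian 'i' so the file layout is identical on every platform):

```ruby
nums = [65535, 720850, 2_000_000]

# Write all numbers with a single pack call; 'wb' opens in binary mode.
File.open('nums.bin', 'wb') { |f| f.write nums.pack('l<*') }

# Read the whole file back and unpack every 4-byte integer at once.
read_back = File.binread('nums.bin').unpack('l<*')
```

After the round trip, read_back == nums, and the file is exactly 3 * 4 = 12 bytes.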
irb(main):001:0> f = open('test', 'w')
=> #<File:test>
irb(main):002:0> f << [65535].pack('i')
=> #<File:test>
irb(main):003:0> f.tell
=> 4
irb(main):004:0> f << [720850].pack('i')
=> #<File:test>
irb(main):005:0> f.tell
=> 9
The integer 720850 takes 5 bytes in my file, but it should take only 4
bytes! How can I fix this? Thanks!
Inspect [720850].pack('i') and you'll see the integer is packed into 4
bytes as requested. So why does it occupy 5 bytes in the file? Look at
position 2: the 3rd byte is 0x0A, a newline character, and on Windows, in
text files, Ruby turns newlines into CRLF. 2 bytes! Since you've got
binary data in your file you don't want to write a text file, so you must
open the file with the "b" flag in addition to "w":
f = open('test', 'wb')
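A quick check of the fix (filename illustrative): in binary mode the embedded 0x0A byte of 720850 is written as-is, so two 4-byte integers occupy exactly 8 bytes:

```ruby
# 720850 is 0x000AFFD2; packed little-endian its 3rd byte is 0x0A (newline).
bytes = [720850].pack('l<')

f = File.open('test', 'wb')  # 'b' disables CRLF translation on Windows
f << [65535].pack('i')
f << [720850].pack('i')
f.close
```

With 'wb', File.size('test') is now 8, not 9.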