Out of memory (Java heap space) on zip creation (JRuby)

I am using rubyzip and am trying to put a huge CSV file with 1.4
million rows into a zip file.
Under JRuby I get an out-of-heap-space error.

I believe the error happens in the block below:

Zip::ZipOutputStream.open(zip_path) do |zos|
  zos.put_next_entry(File.basename(csv_path))
  zos.print IO.read(csv_path)
end

On Wednesday, May 9, 2012 1:52:27 PM UTC-3, Jedrin wrote:


You’re reading the entire file contents into memory and then saving it.

See if there is a way for you to stream chunks (16 kilobytes, for
example) into the zip stream.
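
Something along these lines might work; this is a minimal sketch, untested
here, with the 16 KB chunk size and the binary-mode open as illustrative
assumptions:

require 'zip/zip'  # rubyzip, 0.9-era API used in this thread

Zip::ZipOutputStream.open(zip_path) do |zos|
  zos.put_next_entry(File.basename(csv_path))
  # Read the CSV in fixed-size chunks instead of all at once;
  # File#read returns nil at EOF, which ends the loop.
  File.open(csv_path, 'rb') do |f|
    while (chunk = f.read(16 * 1024))
      zos << chunk
    end
  end
end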


Luis L.

The error happens on the line:

zos.print IO.read(csv_path)

I see that p zos.class shows Zip::ZipOutputStream, and that the print
method is inherited from IOExtras::AbstractOutputStream
(http://rubyzip.sourceforge.net/classes/IOExtras/AbstractOutputStream.html),
where the doc shows print defined as:

File lib/zip/ioextras.rb, line 130

def print(*params)
  self << params.to_s << $\.to_s
end

I am not sure offhand how to stream the data, but I gathered that the
problem was from reading the whole file into memory.

On Wed, May 9, 2012 at 2:07 PM, Jedrin [email protected] wrote:

I am not sure offhand how to stream the data, but I gathered that the
problem was from reading the whole file into memory.

The default heap size for the JVM is pretty small. I believe you can
pass args to the JVM when you start JRuby.

If you do something like -Xmx1024m (not sure that syntax is exactly
correct, but it’s close) you might get enough. Of course, that depends
on the size of the file.
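
For what it’s worth, the flag is -Xmx (capital X), and JRuby forwards
JVM options with a -J prefix, e.g. (the script name here is just a
placeholder):

jruby -J-Xmx1024m your_script.rb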


Greg A.
http://twitter.com/akinsgre

On May 9, 2:42pm, Greg A. [email protected] wrote:

The default heap size for the JVM is pretty small. I believe you can
pass args to the JVM when you start JRuby.

If you do something like -Xmx1024m (not sure that syntax is exactly
correct, but it’s close) you might get enough. Of course, that depends
on the size of the file.


Greg A.
http://twitter.com/akinsgre

Well, the CSV file has something like 1.4 million rows and maybe 20
columns or something like that. When I get a chance, maybe I’ll look
into that if that seems like the thing to try …

Jedrin wrote in post #1060204:


Well, the CSV file has something like 1.4 million rows and maybe 20
columns or something like that. When I get a chance, maybe I’ll look
into that if that seems like the thing to try …

“When I get a chance, maybe…”???

Greg gave you the answer. A default JVM instance’s heap space is limited
to 64 megabytes. If the file you’re loading, plus the memory consumed by
your application, goes over that limit, the JVM will report “out of
memory” and begin exhibiting unpredictable behavior.

It makes no difference how much physical RAM your machine might contain.
The JVM will NOT use more heap space than the maximum defined by the
-Xmx argument (-Xmx64m being the default when not specified).

On Wed, May 9, 2012 at 4:42 PM, Jedrin [email protected] wrote:

When I tried to download the CSV file (which the server puts into the
zip file and then crashes), I got the same heap space error, but it
seemed like it did run longer before it crashed. If I try to increase
that number much higher than 1024m, I get:

The heap contains all the objects created for the application… In this
case, it looks like your file is still too big.

Error occurred during initialization of VM
Could not reserve enough space for object heap
JVM creation failed

This means that you tried to allocate more than is available on the
machine

Are you doing this for a single load, or will it be an application that
will commonly receive large files?

If it’s the latter, I’d probably try to redesign the code you’re using
to load the files. Sounds like this is part of a third-party gem? If
that’s the case, maybe they have some mechanism for handling larger
files?


Greg A.
http://twitter.com/akinsgre

Are you doing this for a single load, or will it be an application that
will commonly receive large files?

If it’s the latter, I’d probably try to redesign the code you’re using
to load the files. Sounds like this is part of a third-party gem? If
that’s the case, maybe they have some mechanism for handling larger
files?


Greg A.
http://twitter.com/akinsgre

What I do is create a CSV file from the database. I had some memory
problems there, but using ActiveRecord’s find_in_batches() seemed to
solve that, as sketched below.
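
For reference, a rough sketch of that batching approach (the Record
model and its columns are hypothetical; this assumes ActiveRecord’s
find_in_batches and Ruby’s standard csv library):

require 'csv'

CSV.open(csv_path, 'w') do |csv|
  csv << ['id', 'name']  # hypothetical header row
  # find_in_batches loads 1000 records at a time instead of the whole table
  Record.find_in_batches(batch_size: 1000) do |batch|
    batch.each { |r| csv << [r.id, r.name] }
  end
end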

The CSV file has 1.4 million rows. It gets created successfully. I
then use the rubyzip gem to create a zip file that just contains that CSV
file. I used examples I found from Google searches on how to create the
zip file, which are shown earlier in the thread. I looked at the class
info on the web for rubyzip and didn’t see an obvious way to stream
data into the zip file. Tomorrow I can look at perhaps some other way
to create a zip file using a different gem or some such …



So I launched my Sinatra app like this; from my Google searches, the
-J arg looks like what I want.

jruby -J-Xmx1024m -S recordset.rb

When I tried to download the CSV file (which the server puts into the
zip file and then crashes), I got the same heap space error, but it
seemed like it did run longer before it crashed. If I try to increase
that number much higher than 1024m, I get:

Error occurred during initialization of VM
Could not reserve enough space for object heap
JVM creation failed

On Wednesday, May 9, 2012 6:21:39 PM UTC-3, Jedrin wrote:



I looked at the class info on the web for rubyzip and didn’t see an
obvious way to stream data into the zip file. Tomorrow I can look at
perhaps some other way to create a zip file using a different gem or
some such …

As I mentioned in my previous reply, and similar to the problem you had
when creating the file: you’re trying to load the whole thing.

There are two options for this:

A) You stream the contents of your CSV file, reading by chunks into a
ZipStream,

or

B) You zip the file from outside Ruby (shelling out to gzip, for
example); see the sketch below.
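
A minimal sketch of option B, assuming the Info-ZIP zip command is
installed on the server (-j stores the file without its directory path):

system('zip', '-j', zip_path, csv_path) or raise 'zip failed'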


Luis L.

I changed the putc to a write in the above post, followed by
zos.print "" at the very end; it appears print() adds $\ to the file.
The byte size of the zip file inside the zip was short by two bytes,
and I still got corrupted zip file errors on that.

As I mentioned in my previous reply, and similar to the problem you had
when creating the file: you’re trying to load the whole thing.

There are two options for this:

A) You stream the contents of your CSV file, reading by chunks into a
ZipStream

That’s exactly what I would like to do; I wasn’t sure offhand if the
zip method would read it that way or how to pass it. I was hoping for
an idea on how to do that.

The code where it all happens is here and the second line is where it
crashes:

zos.put_next_entry(File.basename(fpath))
zos.print IO.read(fpath)

zos is an instance of Zip::ZipOutputStream.
The print method is inherited from IOExtras::AbstractOutputStream

According to the docs, print() looks like this:

def print(*params)
  self << params.to_s << $\.to_s
end

Since it does params.to_s, I’m guessing that is going to put it all
into memory.
The other methods may have similar problems.

However, the putc method looked interesting.

There is a putc() defined like this according to the docs:

def putc(anObject)
  self << case anObject
          when Fixnum then anObject.chr
          when String then anObject
          else raise TypeError, "putc: Only Fixnum and String supported"
          end
  anObject
end

So I tried that; here is my code, and the output follows. The file I
was trying to zip was another zip file. It appeared to be a bit bigger
than it should have been, and when I tried to open it, I got an error
saying it was corrupted.

This isn’t quite the same CSV problem, but I am putting a zip file into
a zip file here.

def zput(zos, fpath)
  p fpath
  zos.put_next_entry(File.basename(fpath))
  f = File.new(fpath)
  chunk_sz = 10000000
  while !f.eof?
    data = f.read(chunk_sz)
    zos.putc data
    puts 'read ' + data.size.to_s + ' bytes'
  end
end

"web.war"
read 10000000 bytes
read 10000000 bytes
read 8573823 bytes
"data.war"
read 10000000 bytes
read 8655347 bytes
"big.zip"
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 3431079 bytes

It’s late Friday and I am done for the day, but I just tried something
else. It may be that I need to open the file in binary mode, and I
didn’t. Initial tests seem to indicate that may be the case. Thanks
for everyone’s help.
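
For anyone reading later: a minimal sketch of that fix, combining the
binary-mode open with the << writer instead of putc (this is an
assumption, not the poster’s final code):

def zput(zos, fpath)
  zos.put_next_entry(File.basename(fpath))
  # 'rb' avoids the newline translation that corrupts binary data
  File.open(fpath, 'rb') do |f|
    while (chunk = f.read(10_000_000))
      zos << chunk
    end
  end
end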