Forum: Ruby "Segmentation fault" on import script.

(Guest)
on 2006-02-02 00:52
(Received via mailing list)
Hi gang. Sorry I haven't been able to respond to my last post about
Stop Word dictionaries. Been all too busy, but the posted info was very
useful.

Anyways, I have a bit of a situation. I have a collection of files,
around 10,000, that I need to parse and then suck that data into a
database, along with their linked images. I've had a script that has
been working pretty much for over 99.5% of the articles. Both the
article data, and the images were getting imported fine.

The images also have to go through a few processing steps before being
put into the database. They are resized to meet a certain constraint
of a new document format, and are then resized a few more times
to create 2 sizes of thumbnails. I've been using the RMagick library to
do the image resizing.

I then had to make a change, because I realized that I wasn't
accommodating animated GIFs, and the resulting images that were saved
only contained the first frame. So I added a routine around
the resizing to iterate through an ImageList and do all the resizing
for each frame.

Now with this change I get a "[BUG] Segmentation fault" error. I
thought maybe I was starving the memory resources and the GC couldn't
keep up (I'm uncomfortably unfamiliar with how the Ruby GC works as
opposed to the Java or .NET GC), so I explicitly call garbage_collect after
about 100 imports. I still get the same error.

This happens in both Ruby 1.8.2 and 1.8.4, on Mac OS X as well as Linux.
Also, if it possibly means anything, this script is run in a Rails
environment using the runner script. It will run successfully for about
4000 articles before it bombs out.

Here is an excerpt of the possibly offending code:
# image_object is an ActiveRecord object created
# a little before this code.
# get the ImageList from the image file.
image_file_list = Magick::ImageList.new(@old_site_path + image_path)
# create ImageLists for the thumbnails copied from the original ImageList
smaller_list = image_file_list.copy
smallest_list = image_file_list.copy
tiniest_list = image_file_list.copy

# loop through the images in the list
for image_index in 0...image_file_list.length
  image_file = image_file_list[image_index]
  # resize the loaded image to the main constraints
  image_file.change_geometry!('150x150') do |cols, rows, img|
    img.resize!(cols, rows)
    image_object.original_x = cols
    image_object.original_y = rows
  end

  smaller_list[image_index] = image_file.change_geometry('110x110') do |cols, rows, img|
    image_object.thumb_x = cols
    image_object.thumb_y = rows
    img.resize(cols, rows)
  end
  smallest_list[image_index] = image_file.change_geometry('91x91') do |cols, rows, img|
    image_object.small_thumb_x = cols
    image_object.small_thumb_y = rows
    img.resize(cols, rows)
  end
  tiniest_list[image_index] = image_file.change_geometry('50x50') do |cols, rows, img|
    image_object.tiny_thumb_x = cols
    image_object.tiny_thumb_y = rows
    img.resize(cols, rows)
  end
end

image_object.original_filename = File.basename(image_path)
image_object.title = "#{ review_data[:artist] }: #{ review_data[:title] }"

image_object.image_data = image_file_list.to_blob
image_object.tiny_thumb = tiniest_list.to_blob    # <-- segmentation fault usually happens here.
image_object.big_thumb = smaller_list.to_blob
image_object.small_thumb = smallest_list.to_blob


Thanks in advance....
Timothy H. (Guest)
on 2006-02-02 00:58
(Received via mailing list)
Anything that runs for 4000 iterations and then bombs almost has to be
an out-of-memory condition. Depending on the size of your images, 100
images could represent a very sizable chunk of memory. For example, 100
4-megapixel images from a digital camera require >1600MB of memory _if_
you've configured ImageMagick to use 8 bits per pixel component. (By default
ImageMagick is configured to use 16 bits per pixel component.)
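
The arithmetic behind that figure can be sketched roughly as follows. This is only an estimate: it reads the 8 and 16 bits as the per-channel quantum depth and assumes each decoded pixel is stored as 4 channels (red, green, blue, opacity). The method name image_memory_mb is made up for illustration.

```ruby
# Rough estimate of the memory needed to hold decoded images, in MB
# (1 MB taken as 1,000,000 bytes).
# Assumption: every pixel is stored as `channels` values of
# `quantum_depth` bits each, regardless of the file format on disk.
def image_memory_mb(pixels_per_image, image_count, quantum_depth, channels: 4)
  bytes_per_pixel = channels * quantum_depth / 8
  pixels_per_image * image_count * bytes_per_pixel / 1_000_000
end

# 100 four-megapixel images with an 8-bit and a 16-bit quantum
puts image_memory_mb(4_000_000, 100, 8)   # prints 1600
puts image_memory_mb(4_000_000, 100, 16)  # prints 3200
```

Note that the size of the file on disk does not enter into it. A small compressed GIF still decodes to the full width x height x channels in memory.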

Why don't you try putting a GC.start after the .to_blob calls to free up
those image lists?
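
A minimal sketch of that suggestion, kept free of RMagick so it runs on its own. The helper name import_each is hypothetical; the block stands in for the per-image work (building the ImageLists and calling to_blob).

```ruby
# Hypothetical helper: runs the block once per item and forces a
# garbage collection after each one. Objects created inside the block
# are no longer referenced once it returns, so GC.start can reclaim them.
def import_each(items)
  imported = 0
  items.each do |item|
    yield item   # e.g. build the ImageLists, call to_blob, save the record
    imported += 1
    GC.start     # request a full collection before the next item
  end
  imported
end

# Usage with placeholder data:
count = import_each(%w[a.gif b.gif c.gif]) { |path| path.upcase }
puts count  # prints 3
```

This only helps if nothing outside the block still holds a reference to the image lists, for example an instance variable or a long-lived array.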

Here's a description of memory problems that can arise when you're using
RMagick:
http://rubyforge.org/forum/forum.php?thread_id=137...
(Guest)
on 2006-02-02 14:26
(Received via mailing list)
Hmm... Strange... I did try to run a GC.start after each pass, and I
now get the following errors:
ruby(13904) malloc: *** vm_allocate(size=1048576) failed (error code=3)
ruby(13904) malloc: *** error: can't allocate region
ruby(13904) malloc: *** set a breakpoint in szone_error to debug
../db/importer.rb:158: [BUG] Bus Error

I monitored the memory size, and physical memory would fluctuate between
15MB and 90MB. I see it grow and shrink mostly in the 60-70MB range, but
towards the end it grew to 90MB. The virtual memory, on the other hand,
starts off at about 50MB, quickly grows to 200MB, and then slowly grows
to 3.51GB before the program crashes. So I can see that there is some
sort of out-of-memory issue. Do you believe that there might be some
sort of memory leak for certain operations in RMagick, particularly
ImageLists created with the copy method?
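
For anyone wanting to log these numbers from inside the script rather than watching a system monitor, here is a rough sketch. It assumes a Linux /proc filesystem or, failing that, a ps command that accepts the POSIX -o rss= and -o vsz= options. The method name memory_usage_kb is made up for illustration.

```ruby
# Returns the resident (physical) and virtual memory size of a process
# in kilobytes, as reported by the operating system.
def memory_usage_kb(pid = Process.pid)
  status_path = "/proc/#{pid}/status"
  if File.readable?(status_path)
    # Linux: parse the VmRSS and VmSize lines of /proc/<pid>/status
    status = File.read(status_path)
    rss = status[/^VmRSS:\s+(\d+)/, 1].to_i
    vsz = status[/^VmSize:\s+(\d+)/, 1].to_i
  else
    # Elsewhere (e.g. Mac OS X): ask ps, with empty column headers
    rss, vsz = `ps -o rss= -o vsz= -p #{pid}`.split.map(&:to_i)
  end
  { rss_kb: rss, vsz_kb: vsz }
end

usage = memory_usage_kb
puts "physical: #{usage[:rss_kb]} KB, virtual: #{usage[:vsz_kb]} KB"
```

Printing this every few hundred imports would show whether the virtual size climbs steadily or jumps at particular images.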

I have built my copy of ImageMagick to use an 8-bit quantum. Most of the
pictures that are imported are GIFs, and about 15-20% are animated with
probably only 2 or 3 frames. Each GIF is roughly 8-30KB in
size. And we are talking about importing nearly 10,000 GIFs. With the
animated frames, that's roughly 13,000 frames. Those also then
each get converted to 3 smaller images, one at 110x110, one at 91x91,
and one at 50x50.

I also modified things a little, and had the change_geometry method
work on the frame inside each copied ImageList (since each list is a deep
copy), rather than resizing each frame off of the original and then placing
the new frames over the existing ones. This seemed to have the same
results.

I'm wondering if there is a way to copy the ImageList object
properties, such as animation settings and such, without copying over
the actual Images? I looked in the docs, but all the copy methods
seemed to be deep copies.

I was thinking of possibly changing my application so that it would
create the smaller images from the main image the first time each one is
requested, and then cache them on the filesystem. But seeing the
issues with RMagick running out of memory even when calling GC.start, I
don't think I can reliably use this method for long-running web
applications. Plus I'd really like to offload all the processing at the
beginning, so the webserver will do less work when serving users.
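
For reference, the create-on-first-request idea could look something like the sketch below. The helper name cached_thumbnail and the cache layout are hypothetical, and the block stands in for the actual resizing (for example, resize followed by to_blob).

```ruby
require "fileutils"

# Hypothetical helper: returns the cached bytes stored as `name` under
# `cache_dir`. The block is called to generate the bytes only when no
# cached file exists yet, and its result is written to the cache.
def cached_thumbnail(cache_dir, name)
  path = File.join(cache_dir, name)
  return File.binread(path) if File.exist?(path)

  data = yield                     # generate the thumbnail bytes
  FileUtils.mkdir_p(cache_dir)
  File.binwrite(path, data)
  data
end
```

A call might look like cached_thumbnail("cache/thumbs", "42_50x50.gif") { make_thumbnail(42) }, where make_thumbnail is a placeholder for whatever does the resizing. The resize cost is then paid once per image, at first request, rather than at import time.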

Thanks for your help,

Sean
(Guest)
on 2006-02-02 16:24
(Received via mailing list)
I have a temporary fix for now. I just made a bash script that
runs the import script on groups of subdirectories, rather than running
it on the whole tree. So the Ruby script will exit before the memory
ever gets too large. It's a hack, but it'll work.
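
The same workaround can be sketched in Ruby instead of bash. This is not the script described above, just an illustration of the idea: one child process per subdirectory, so whatever memory a run accumulates goes back to the OS when that process exits. The method name run_per_subdirectory is made up, and Dir.children needs Ruby 2.5 or later.

```ruby
# Runs `command` (an array: program followed by its arguments) once per
# immediate subdirectory of `root`, passing the subdirectory path as the
# last argument. Returns the subdirectories whose run did not succeed.
def run_per_subdirectory(root, command)
  subdirs = Dir.children(root)
               .map { |name| File.join(root, name) }
               .select { |path| File.directory?(path) }
               .sort
  # system returns true only when the child exits with status 0
  subdirs.reject { |dir| system(*command, dir) }
end
```

A call such as run_per_subdirectory("articles", ["ruby", "import_one_dir.rb"]) would run a hypothetical import_one_dir.rb once per subdirectory of articles and return the ones that need a retry.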

Sean