Hi,
Let’s say I upload a PDF file. ImageMagick extracts all pages out of it
and stores them as PNG images on the hard drive. How can I easily handle
all these generated files with Paperclip?
Has anyone done that before? Thanks for your advice.
Fernando P. wrote:
Hi,
Let’s say I upload a PDF file. ImageMagick extracts all pages out of it
and stores them as PNG images on the hard drive. How can I easily handle
all these generated files with Paperclip?
Has anyone done that before? Thanks for your advice.
I’ve done precisely this just recently. It isn’t as tricky as it seems,
really. All you need are a few steps in your PDF processor that take
the extracted images and save each one as a new record. So, if you have
the following relationship:
class Document < ActiveRecord::Base
  has_many :images
  # :extract refers to a custom Paperclip::Extract processor
  has_attached_file :file, :styles => { :original => {} }, :processors => [:extract]
end

class Image < ActiveRecord::Base
  belongs_to :document
  # One Image record per extracted page
  has_attached_file :image
end
In your processor, perform the extraction into a temporary folder, and
after it is done, do something like the following:
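# @attachment is the Paperclip attachment being processed; its instance is the Document
if @attachment.respond_to?(:instance) && @attachment.instance.respond_to?(:images)
  # Drop pages saved from a previous upload of this document
  @attachment.instance.images.destroy_all
  # @temporary is the folder the pages were extracted into
  Dir.glob("#{@temporary}/*.{jpg,png}").each do |path|
    File.open(path) { |file| @attachment.instance.images.create(:image => file) }
  end
else
  raise PaperclipError, "Unable to save extracted pages. No valid attachment."
end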
Afterwards make sure to remove the temporary folder and you should be
good.
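The extraction step itself is left open in the post. Below is a minimal sketch of one way to do it, assuming ImageMagick's `convert` is on the PATH; the method names (`extraction_command`, `extracted_pages`, `each_extracted_page`), the `page-%d.png` naming and the 150 dpi density are hypothetical choices for illustration, not part of Paperclip. It also covers two details the snippet above leaves implicit: `Dir.glob` makes no ordering promise, and a plain string sort would put `page-10.png` before `page-2.png`, so pages are sorted on their number; and `Dir.mktmpdir` with a block removes the temporary folder automatically.

```ruby
require "tmpdir"

# Hypothetical helper: argument list for ImageMagick's `convert`, writing one
# PNG per page. "%d" in the output name becomes the page (scene) number.
# -density must come before the input file to affect PDF rasterization.
def extraction_command(pdf_path, out_dir, density: 150)
  ["convert", "-density", density.to_s, pdf_path, File.join(out_dir, "page-%d.png")]
end

# Returns the extracted page images in page order, sorting on the number in
# the file name rather than on the raw string.
def extracted_pages(dir)
  Dir.glob(File.join(dir, "*.{jpg,png}")).sort_by do |path|
    File.basename(path)[/\d+/].to_i
  end
end

# Extracts into a throwaway folder and yields each page as an open File.
# Dir.mktmpdir deletes the folder and its contents when the block exits,
# even if an exception is raised.
def each_extracted_page(pdf_path)
  Dir.mktmpdir("pdf-pages") do |dir|
    system(*extraction_command(pdf_path, dir)) or raise "convert failed for #{pdf_path}"
    extracted_pages(dir).each do |path|
      File.open(path, "rb") { |file| yield file }
    end
  end
end
```

Inside a processor, assuming it receives the uploaded PDF as `@file` (as `Paperclip::Processor` subclasses do), the glob loop could then become `each_extracted_page(@file.path) { |file| @attachment.instance.images.create(:image => file) }`.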
Parker S. wrote:
Interesting approach. Any particular problems you ran into in practice?
Too many files for the fs? Database blowing up? Other?
Fernando P. wrote:
Interesting approach. Any particular problems you ran into in practice?
Too many files for the fs? Database blowing up? Other?
It has worked really well in practice. The weak point was always
ImageMagick, really. We ended up using pdf2image instead, which yielded
much better output, much faster. We’ve processed 120+ page documents,
and the number of files wasn’t a problem. Given the time it takes to
process the images (assuming you are resizing / thumbnailing), you’ll
certainly want to process them with a background processor though:
Delayed Job, Resque or the like.
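The point about background processing can be illustrated without any gem. The sketch below is a stand-in, not Delayed Job or Resque: a hypothetical `PageExtractionQueue` class built on Ruby's `Queue` and a single worker thread, so the upload request only enqueues a document id and returns while the slow extraction runs elsewhere. Unlike those libraries, it keeps jobs in memory, so anything still queued is lost if the process exits.

```ruby
require "thread" # Queue; built in on modern Ruby, needed on older versions

# Illustration only: an in-process job queue with one worker thread.
class PageExtractionQueue
  STOP = Object.new # sentinel that tells the worker to exit

  def initialize(&worker)
    @jobs = Queue.new
    @thread = Thread.new do
      loop do
        document_id = @jobs.pop # blocks until a job arrives
        break if document_id.equal?(STOP)
        begin
          worker.call(document_id)
        rescue StandardError => e
          # Keep the worker alive if one document fails
          warn "page extraction failed for document #{document_id}: #{e.message}"
        end
      end
    end
  end

  # Called from the upload action; returns immediately.
  def enqueue(document_id)
    @jobs << document_id
  end

  # Waits for already queued jobs to finish, then stops the worker.
  def shutdown
    @jobs << STOP
    @thread.join
  end
end
```

With Resque the same shape would be a class that sets `@queue` and defines `self.perform(document_id)`, queued via `Resque.enqueue`; with Delayed Job, an object responding to `perform` passed to `Delayed::Job.enqueue`. In either case, enqueuing only the record id rather than the file keeps the job small, since it may run in a different process.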