Cut first x pages from a PDF file

Hi all,

I’m finishing up on a project, which has a model with an attached PDF
file. I’m using Paperclip to process the attachment.

What I’d like to do is to enable users to cut (remove) the first x
pages from the PDF-file, but I have no idea how to do that. Is there a
gem that can do this? And how would this integrate with Paperclip?

Thank you very much.

Kind regards,
Jaap H.

Take a look at Paperclip processors.
Paperclip can use custom processors to do non-standard stuff with
attachments. You specify your processor name like:

has_attached_file :file,
  :styles => {:original => {:processors => [:pdf_processor]}}

and you need to have a class in
lib/paperclip_processors/pdf_processor.rb like

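module Paperclip
  class PdfProcessor < Paperclip::Processor
    # Paperclip calls #make and stores the File it returns.
    def make
      @file # placeholder: return the processed file here instead
    end
  end
end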

There are some examples in the documentation. The thing is, you get the
uploaded @file in your processor, and in the end you need to return your
changed @new_file back. You can do anything with it in between.

In your case, you can create a tmp_dir, process @file.path with
something like pdftk into tmp_dir/new_file_name.pdf, and return
@new_file = File.open("tmp_dir/new_file_name.pdf") to Paperclip, which
stores it in the path you specified in the model. Then you just delete
tmp_dir with an after_create callback or a cron job.
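A minimal sketch of that tmp_dir step, using only Ruby's standard library. The helper name process_into_tmp_dir and the default file name are made up for illustration; the block you pass in decides which external command to run, so nothing here is specific to pdftk or to Paperclip.

```ruby
require "tmpdir"

# Hypothetical helper (not part of Paperclip): runs an external command that
# reads input_path and writes a new file into a fresh temporary directory.
# The block receives (input_path, output_path) and must return the command
# as an array of strings, e.g. ["pdftk", src, "cat", "3-end", "output", dst].
# Returns the opened result file and the tmp_dir, so the caller can delete
# the directory once the file has been stored.
def process_into_tmp_dir(input_path, new_file_name = "processed.pdf")
  tmp_dir = Dir.mktmpdir("pdf_processor")
  output_path = File.join(tmp_dir, new_file_name)

  command = yield(input_path, output_path)

  # system returns false on a non-zero exit and nil if the command
  # could not be started; both count as failure here.
  unless system(*command)
    raise "command failed: #{command.join(' ')}"
  end

  [File.open(output_path, "rb"), tmp_dir]
end
```

Inside a processor's make you would call it with @file.path and return the File. FileUtils.remove_entry(tmp_dir) is one way to clean up afterwards.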

Hi Vladimir,

Thanks. The bit about paperclip processors is very helpful!

I’ve looked at pdftk though and couldn’t find how to cut pages from
PDF files. Am I overlooking something? The idea is that we will
receive many different kinds of PDF-files including (for example)
author information that the author doesn’t want to share. If this
information is on the first two pages, we’d like the author to be able
to say “cut the first two pages out and save it”.

Thanks again.

I don’t think that you can explicitly do that with pdftk, but you can
use ‘burst’ to break the PDF out into lots of single-page documents, and
then ‘cat’ to combine the pages that you want in the final document.

Simon

On Sun, 25 Apr 2010 21:46:23 +0800, jhaagmans wrote:

Well, you can use ranges with pdftk. Like

$ pdftk in.pdf cat 3-end output out.pdf

This concatenates the pages from the third through the last. Here are
good examples of pdftk usage
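To turn "cut the first x pages" into that range from Ruby, something like the sketch below could work. The method name pdftk_cut_args is made up for illustration; it only builds the argument list for the command shown above and does not run pdftk or look at the PDF.

```ruby
# Hypothetical helper: builds the pdftk argument list that drops the first
# pages_to_cut pages and keeps everything after them.
def pdftk_cut_args(input_path, output_path, pages_to_cut)
  unless pages_to_cut.is_a?(Integer) && pages_to_cut >= 0
    raise ArgumentError, "pages_to_cut must be a non-negative integer"
  end

  # Cutting 2 pages means the first page to keep is page 3.
  first_kept = pages_to_cut + 1
  ["pdftk", input_path, "cat", "#{first_kept}-end", "output", output_path]
end
```

You would run it with system(*pdftk_cut_args("in.pdf", "out.pdf", 2)). Passing the arguments as a list avoids shell quoting problems with user-supplied file names. The helper does not check the document's page count, so asking to cut more pages than exist is left for pdftk to report.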