Modifying existing PDF files


#1

I’ve been using pdf-writer to generate pdf documents from scratch - it’s
excellent, but I now need to read an existing business card pdf, modify
the content and save the file as a pdf (and if possible, a jpeg to
render to the browser.)

Are there any Ruby extensions that provide this type of functionality?


#2

Not that I know of (and I have been looking for the same). You need to
use a Java or C/C++ PDF reading library. The rumors are that PDFLib will
soon have ruby extensions and in the later part of 2006, ruby pdf-reader
will be ready for prime time.


#3

I did some digging on this too and couldn’t find anything. I was
actually searching for just read capability and couldn’t really find
anything. I’m assuming you need to be able to read the PDF before you
modify it, so we have a little ways to go.

But as Roberto said, things are in the works.

-Nick


#4

Unfortunately my environment is all Windows based, no lovely command
line utilities for me :(

And I think the ‘read’ part was more about extracting the PDF text, at
least it was for me, to index and search it. I’ve gotten around it by
using a simple java program that will update the index, where ferret
and rails is just used to read it.


#5

Hi,

You can read PDF documents with Ghostscript.

Use the following command to convert a pdf to png, with antialiasing:

gs -q -dBATCH -dNOPAUSE -dFirstPage=1 -dLastPage=1 -r100x100 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -sDEVICE=png16m -sOutputFile=output.png input.pdf

If you want to edit this file in a raster format (like png, jpeg), do
it with ImageMagick or the gd libraries for Ruby.

For vector based editing, do a
pdf2ps input.pdf output.ps
and edit the postscript file directly.

regards
Helmut
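
A rough Ruby sketch of driving the Ghostscript command above from a script. The helper names (gs_png_args, pdf_to_png) are made up for illustration, and it assumes a gs executable is on the PATH (on Windows the console binary is usually named gswin32c, so the first element would need adjusting). Passing the arguments to system as a list rather than one string sidesteps shell quoting problems with unusual file names.

```ruby
# Build the Ghostscript argument list for rendering one page of a PDF
# to an antialiased PNG. Hypothetical helper, mirroring the command above.
def gs_png_args(input_pdf, output_png, page: 1, dpi: 100)
  [
    "gs", "-q", "-dBATCH", "-dNOPAUSE",
    "-dFirstPage=#{page}", "-dLastPage=#{page}",
    "-r#{dpi}x#{dpi}",
    "-dTextAlphaBits=4", "-dGraphicsAlphaBits=4",
    "-sDEVICE=png16m",
    "-sOutputFile=#{output_png}",
    input_pdf
  ]
end

# Run Ghostscript. Returns true on success, false on a non-zero exit,
# and nil if the program could not be started at all.
def pdf_to_png(input_pdf, output_png, **opts)
  system(*gs_png_args(input_pdf, output_png, **opts))
end
```

A call such as pdf_to_png("input.pdf", "output.png", dpi: 150) would then produce the image. For a JPEG rather than a PNG, swapping in Ghostscript's jpeg device (-sDEVICE=jpeg) should work the same way.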


#6

Would someone be kind enough to give a hint as to how I run a
command-line tool from a web-based RoR app? That’s something which is
new to me.

cmd = "" # I found it helpful to put full path info for everything,
         # since it wasn't clear what the current working directory is…
response = %x(#{cmd})
render :inline => "

#{response}
", :layout => false

-Greg
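
A slightly more defensive variation on the snippet above, as a sketch only. It assumes a Ruby recent enough to ship Open3.capture2e in the standard library, and run_command is a hypothetical helper name, not part of Rails. The idea is to capture error output and the exit status as well, and to set the working directory explicitly instead of guessing at it.

```ruby
require "open3"

# Run an external program and return [combined stdout/stderr, success flag].
# argv is an array such as ["/full/path/to/tool", "arg1", "arg2"]; when it has
# more than one element Ruby starts the program directly, without a shell.
# dir is the working directory for the child process (defaults to the current one).
def run_command(argv, dir: Dir.pwd)
  output, status = Open3.capture2e(*argv, chdir: dir)
  [output, status.success?]
end
```

In a controller the returned text could then be rendered as in the snippet above. Escaping it first (for example with ERB::Util.html_escape) is advisable if the command output might contain markup.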


#7

Can everyone write an email to support@pdflib, encouraging them to
support ruby bindings? I emailed, and they said at first that there
was insufficient demand. Later, they told me that it was “on [their]
todo list for the next major release.” We all know how todo lists go
when it gets close to deadline :). A little more encouragement would
surely be helpful!

Kyle


#8

Thanks to everyone for their thoughts so far - it’s good to see I’m not
the only one looking for this functionality!

helmut wrote:

you can read PDF documents with Ghostscript.

For vector based editing, do a
pdf2ps input.pdf output.ps
and edit the postscript file directly.

This sounds really promising - in theory I could take a source ps file,
search for my placeholder text in the business card (“name”
“qualifications” etc) and replace it, then pdf the ps file using ps2pdf.
Would someone be kind enough to give a hint as to how I run a
command-line tool from a web-based RoR app? That’s something which is
new to me.
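
The placeholder idea above could be sketched in Ruby roughly as follows. This rests on a big assumption: that each placeholder appears in the PostScript source as one literal string such as (name). Output from pdf2ps often re-encodes or splits text, so a hand-made template .ps file is more likely to cooperate. The helper names are invented for illustration.

```ruby
# Escape the characters that are special inside a PostScript (...) string.
def ps_escape(text)
  text.gsub(/[\\()]/) { |ch| "\\#{ch}" }
end

# Replace every literal "(placeholder)" string with "(value)".
# values is a hash like { "name" => "J. Smith" }.
def fill_placeholders(ps_source, values)
  values.reduce(ps_source) do |ps, (placeholder, value)|
    # Block form of gsub so backslashes in the value are inserted verbatim.
    ps.gsub("(#{ps_escape(placeholder)})") { "(#{ps_escape(value)})" }
  end
end

# Fill a template and convert the result with ps2pdf (assumed to be on the PATH).
# Returns whatever system returns: true, false, or nil.
def render_card(template_ps, filled_ps, output_pdf, values)
  File.write(filled_ps, fill_placeholders(File.read(template_ps), values))
  system("ps2pdf", filled_ps, output_pdf)
end
```

For example, render_card("card.ps", "card_filled.ps", "card.pdf", "name" => "J. Smith") would write the filled-in PostScript next to the template and then convert it.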

rsaccon wrote:

The rumors are that PDFLib will soon
have ruby extensions and in later part of 2006, ruby pdf-reader will be
ready for prime time.

The ruby pdf-reader sounds really promising. I’d be interested if
anyone has more news about this, and would love to try a beta.

I’ve used PDFlib in conjunction with PHP, but now need a solution that
works with Ruby for both commercial and not-for-profit organisations -
for me justifying the cost of PDFlib to a social enterprise is pretty
tough! PDF-reader will be welcome when it arrives, assuming no license
fees get attached at that stage!


#9

Kyle,

I brought up this very topic on the list a month or so ago. I even
suggested emailing PDFLib!

Anyways, their response was the same, that they were working on
including ruby support in the next release.


#10

gedwards1 wrote:

cmd = "" # I found it helpful to put full path info for everything,
         # since it wasn't clear what the current working directory is…
response = %x(#{cmd})
render :inline => "

#{response}
", :layout => false

Thanks Greg - I’ll try that out.

kyle wrote:

Can everyone write an email to support@pdflib, encouraging them to
support ruby bindings?

Will do Kyle - I think there’s an unofficial ruby extension here:
http://www-ps.kek.jp/thitoshi/ruby/pdflib/ but it doesn’t include access
to the full featureset and is lagging behind a bit now. About time
pdflib supported Ruby.


#11

Nick Ce wrote:

you can read PDF documents with Ghostscript.

For vector based editing, do a
pdf2ps input.pdf output.ps
and edit the postscript file directly.

Or maybe someone would want to start porting
http://search.cpan.org/~areibens/PDF-API2-0.51/
from perl to ruby…
Would be great.


Jean-Christophe M.


#12

The main thing slowing down PDF support
at this point is that I am very busy between work (doing very
interesting things) and home (doing very interesting things) that
leaves me little time to work on my OSS projects (doing very
interesting things), much less the other opportunities that I have
been presented with (doing very interesting things).

-austin

Austin Z.

It’s great to hear from you Austin, and many thanks for PDF-writer!
Sounds like you’re wonderfully busy - can you give any indication of how
far along the other pdf tools are, what your plans for the package are
and what we can do to help?


#13

On 10/12/05, Jean-Christophe M. wrote:

Nick Ce wrote:

you can read PDF documents with Ghostscript.

For vector based editing, do a
pdf2ps input.pdf output.ps
and edit the postscript file directly.

Or maybe someone would want to start porting
http://search.cpan.org/~areibens/PDF-API2-0.51/
from perl to ruby…
Would be great.

Probably not. I’ve always looked at it as one of my possible sources
of inspiration. The API itself is just shy of disastrous.
Understanding what the library actually does is very difficult, even
for someone who really does understand PDF (which I think I can safely
say that I do at this point). That said, I will continue to look at
what a wide variety of other libraries do to make sure that the
Ruby-PDF tools are complete. The main thing slowing down PDF support
at this point is that I am very busy between work (doing very
interesting things) and home (doing very interesting things) that
leaves me little time to work on my OSS projects (doing very
interesting things), much less the other opportunities that I have
been presented with (doing very interesting things).

-austin

Austin Z.


#14

Austin Z. wrote:

The original plan was to have 1.1.4 out for Hallowe’en and 1.2.0 out
for Christmas. 1.1.4 will be out for Christmas or New Year’s, and
1.2.0 I hope will be out shortly after that. I will be working on
PDF::Core in parallel, which is the core piece that is necessary for
any other part. The hope will be to be able to have a working version
of PDF::Writer running on top of PDF::Core around May while I have
several other versions of PDF::Writer released with increasing
features (SVG import and presentations support) and then convert it
all to use the PDF::Core support around June or July. Once PDF::Core
is done, I can also start working on PDF::Reader, so I might be able
to have an early access version of that around the same time.

Thanks for the roadmap Austin - it certainly sounds very promising
indeed, and I look forward to seeing how it evolves. I hope you can find
some willing help through this list to take some of the pressure off.

kyle wrote:

Can everyone write an email to support@pdflib, encouraging them to
support ruby bindings? I emailed, and they said at first that there
was insufficient demand. Later, they told me that it was “on [their]
todo list for the next major release.” We all know how todo lists go
when it gets close to deadline :). A little more encouragement would
surely be helpful!

I’ve just heard from PDFlib - they inform me that Ruby support will be
included in the next maintenance release of PDFlib during the first
quarter of 2006. Sounds promising…


#15

On 12/12/05, Nick Ce wrote:

The main thing slowing down PDF support
at this point is that I am very busy between work (doing very
interesting things) and home (doing very interesting things) that
leaves me little time to work on my OSS projects (doing very
interesting things), much less the other opportunities that I have
been presented with (doing very interesting things).

It’s great to hear from you Austin, and many thanks for PDF-writer!
Sounds like you’re wonderfully busy - can you give any indication of how
far along the other pdf tools are, what your plans for the package are
and what we can do to help?

Well, as of right at this moment, I’ve had about 12 hours or so to
work on any of the PDF related stuff since RubyConf.

So not much has changed since October. I have applied a number of
patches and bugfixes and have mostly applied the Japanese language
patch provided in early November, but the author of that patch will be
reapplying portions that I have messed up as soon as I can get the
package ready for him (I have changed some of the way that the bits
are laid out because it’s very much a special case situation that I’m
not particularly happy with, but want to have because what it provides
is so very valuable).

The original plan was to have 1.1.4 out for Hallowe’en and 1.2.0 out
for Christmas. 1.1.4 will be out for Christmas or New Year’s, and
1.2.0 I hope will be out shortly after that. I will be working on
PDF::Core in parallel, which is the core piece that is necessary for
any other part. The hope will be to be able to have a working version
of PDF::Writer running on top of PDF::Core around May while I have
several other versions of PDF::Writer released with increasing
features (SVG import and presentations support) and then convert it
all to use the PDF::Core support around June or July. Once PDF::Core
is done, I can also start working on PDF::Reader, so I might be able
to have an early access version of that around the same time.

The main impediment that will prevent others from being able to
provide assistance is PDF::Core. I have to implement this right and
I also need to make sure that it’s not significantly less efficient
than what I’m doing in PDF::Writer. (PDF::Writer’s implementation of
PDF objects is heavily optimized toward output and that’s what makes
it unsuitable for PDF::Reader.) After that, I can easily accept help
on the core implementation, but it requires an understanding of a
pretty hefty specification.

Where I can accept help now is in beefing up what PDF::Writer can do.
There are a number of things that can easily be built on top of
PDF::Writer – graphs, math support, more work on the SVG support
(which isn’t in CVS right now, and I need to look at what my
collaborators have provided to date, but …), etc. Heck; the SVG
support needs a CSS engine to properly handle CSS in SVG. I took a
stab at porting the CSS engine from TurboGears back in August, but
that was a slightly early version and there were issues. I think I
still have a partial copy somewhere on one of my computers; if some
enterprising soul wants to pick that up and run with it (it will have
to meet my ideas of API quality because I am the first guaranteed user
of it ;), I’ll share that. Or if someone wants to know what I need on
that, contact me at this email address (not via the list, please);
that’ll help.

I don’t expect to be able to accept core code help until around March.
Or later, if my schedule is as bad as it has been for the last two
months.

-austin

Austin Z.