Generating a PDF using popen and wkhtmltopdf

nrimbeau · February 2, 2010, 5:14am

As described on the wkhtmltopdf Google group, I have a problem
generating a PDF while using popen and wkhtmltopdf.

wkhtmltopdf takes HTML code as input and outputs a PDF file. Here is what
I’m doing:

command = '"C:\Program Files\wkhtmltopdf\wkhtmltopdf.exe" - - -q'
IO.popen(command, 'r+') do |f|

  # Writing the html previously rendered in a string
  f.write(html_output)
  f.close_write

  # Reading the output and closing
  pdf = f.readlines
  f.close

  # Returning the pdf data
  pdf

end

This code results in a corrupted PDF file. I compared it with a valid
PDF and found some differences, such as missing closing keywords
(endstream), but I’m not an expert in that format.

So my question is: am I doing it wrong, using the wrong method, or
missing something, or is wkhtmltopdf more likely to be the problem?

I attached the corrupted file.

If you have a look at it, you’ll notice that the PDF EOF marker is there,
which suggests that the generation was not interrupted in any way.
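
One thing that may be worth ruling out (an assumption, not a confirmed diagnosis): IO.popen with 'r+' opens the pipe in text mode, and on Windows text mode translates line endings, which can alter the bytes of a binary stream such as a PDF. Also, readlines returns an array of lines rather than a single string. Below is a minimal sketch that opens the pipe in binary mode and reads the output with read. The helper name pipe_binary and the ECHO stand-in command are made up for illustration and are not part of wkhtmltopdf or the code above.

```ruby
require 'rbconfig'

# Hypothetical helper (not from the thread): pipe `input` through `command`
# and return everything the child writes to stdout as one binary string.
def pipe_binary(command, input)
  # 'r+b' opens the pipe for reading and writing in binary mode, which
  # disables newline translation on Windows.
  IO.popen(command, 'r+b') do |io|
    io.write(input)
    io.close_write # signal end of input to the child
    io.read        # one String; readlines would return an Array of lines
  end
end

# Assumed invocation, mirroring the command string in the post above.
# Passing an Array to IO.popen runs the program without a shell.
WKHTMLTOPDF = ['C:\Program Files\wkhtmltopdf\wkhtmltopdf.exe', '-', '-', '-q'].freeze

# Stand-in child that copies stdin to stdout, so the helper can be tried
# without wkhtmltopdf installed.
ECHO = [RbConfig.ruby, '-e',
        '$stdin.binmode; $stdout.binmode; $stdout.write($stdin.read)'].freeze

# Bytes that text mode could alter: CRLF pairs and non-text bytes.
sample = "%PDF-1.4\r\nstream\n\x00\xFF\nendstream\n%%EOF\n".b
round_trip_ok = (pipe_binary(ECHO, sample) == sample)
```

With wkhtmltopdf installed, the call would presumably be pipe_binary(WKHTMLTOPDF, html_output). Writing all of the input before reading only works when the child consumes its input before producing much output; otherwise a large document could fill the pipe buffer and block.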

Any idea?

Thanks for your help!

Nicolas

nrimbeau · February 2, 2010, 9:09pm

(reposted due to typo… )

Hi Nicolas,

Whenever I generate PDFs from a Rails app using wkhtmltopdf (or
princexml), I usually call wkhtmltopdf with an app_url (i.e.
wkhtmltopdf hits the web app to get the html/css/imgs/… used
to generate the PDF), something like the following:

in some controller …

require 'timeout'
require 'uri'
…
TIMEOUT_SECS = 5
…
def gen_pdf
  app_url = … # the url to gen the pdf from.
  fname = … # the name of the resulting pdf.
  ftype = "application/pdf"
  # combat shell injection?
  app_url = app_url.to_s.gsub(/["'\s$;><&\|\\\[\]]/, '')
  s = nil
  # valid url?
  unless (app_url =~ URI::regexp).nil?
    begin
      Timeout.timeout(TIMEOUT_SECS) do
        # gen pdf from url.
        s = `wkhtmltopdf -q "#{app_url}" -`.chomp
      end
    rescue Exception => e
      … # log, render/redirect err msg, …
    end
  end
  # invalid pdf?
  if not s.to_s =~ /^%PDF/
    … # log, render/redirect err msg, …
  end
  send_data(s, :type=>ftype, :filename=>fname); return
end
…

Jeff
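
A variation on the approach above, offered as a sketch rather than a tested Rails recipe: passing the command to IO.popen as an array runs it without a shell, so the URL is never interpreted by one, and the output can be read in binary mode. The helper names (http_url?, pdf_data?, capture_binary, url_to_pdf) are hypothetical, and the wkhtmltopdf arguments are simply the ones used in the post above.

```ruby
require 'timeout'
require 'uri'
require 'rbconfig'

TIMEOUT_SECS = 5

# True only for absolute http/https URLs that have a host.
def http_url?(text)
  uri = URI.parse(text.to_s)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

# A PDF file starts with the bytes "%PDF".
def pdf_data?(data)
  data.to_s.b.start_with?('%PDF')
end

# Runs argv without a shell and returns its stdout as a binary string,
# or nil if the child does not finish within `secs` seconds.
def capture_binary(argv, secs = TIMEOUT_SECS)
  IO.popen(argv, 'rb') do |io|
    begin
      Timeout.timeout(secs) { io.read }
    rescue Timeout::Error
      # Stop the child so that closing the pipe cannot hang.
      Process.kill('KILL', io.pid)
      nil
    end
  end
end

# Hypothetical wrapper: returns PDF bytes, or nil when the URL is rejected,
# the command times out, or the output does not look like a PDF.
# `exe` is assumed to be on the PATH.
def url_to_pdf(url, exe = 'wkhtmltopdf')
  return nil unless http_url?(url)
  data = capture_binary([exe, '-q', url, '-'])
  pdf_data?(data) ? data : nil
end
```

In a controller, the result of url_to_pdf(app_url) could then be handed to send_data when it is not nil. Note that no .chomp is applied here, since the output is binary data rather than text.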

nrimbeau · February 3, 2010, 2:42am

Thanks for your answer Jeff. I’ll give it a try in my own app and see
whether it’s working or not. I’ll keep you posted!

Cheers,

Nicolas