Exporting CM pages to PDF

Has anyone found a good solution for exporting pages to PDF while keeping the layout as close as possible to the web version? We currently recreate our newsletter (http://www.niehs.nih.gov/news/newsletter/2012/12/) in PDF format by hand and are trying to automate the process. We are looking at writing the content out as XML and using a post-edition task to run Apache FOP to create the PDF file.
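For the XML + FOP route, here is a minimal sketch of the XML step. It assumes Python's standard library and a hypothetical input shape (a list of headline/paragraph tuples) standing in for whatever the CMS actually exports. It writes XSL-FO directly, one article per page, so FOP can render it without a separate stylesheet. A real setup would more likely run an XSLT stylesheet over the published XML.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize elements with the usual fo: prefix


def fo(tag):
    """Return the namespaced XSL-FO tag name, e.g. fo('block')."""
    return f"{{{FO_NS}}}{tag}"


def build_newsletter_fo(title, articles):
    """Build an XSL-FO tree for one newsletter issue.

    `articles` is a list of (headline, [paragraph, ...]) tuples. This shape is
    a hypothetical stand-in for the XML the CMS would write out.
    """
    root = ET.Element(fo("root"))

    # One US Letter page master with 1-inch margins (assumed; tune to the print layout).
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "letter",
        "page-width": "8.5in",
        "page-height": "11in",
        "margin": "1in",
    })
    ET.SubElement(page, fo("region-body"))

    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "letter"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})

    heading = ET.SubElement(flow, fo("block"), {"font-size": "20pt", "font-weight": "bold"})
    heading.text = title

    for headline, paragraphs in articles:
        # Each article starts on a new page, so the title block ends up on its own cover page.
        section = ET.SubElement(flow, fo("block"), {"break-before": "page"})
        h = ET.SubElement(section, fo("block"), {
            "font-size": "14pt", "font-weight": "bold", "space-after": "6pt",
        })
        h.text = headline
        for text in paragraphs:
            p = ET.SubElement(section, fo("block"), {"space-after": "6pt"})
            p.text = text  # ElementTree escapes &, < and > on output

    return ET.ElementTree(root)
```

To use it, write the tree out with `tree.write("newsletter.fo", encoding="utf-8", xml_declaration=True)` and render it with the FOP command line, e.g. `fop -fo newsletter.fo -pdf newsletter.pdf`. The page size, margins and font sizes above are placeholders. Getting close to the web layout would mean tuning them against the site's CSS.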

Has anyone implemented a PDF export solution?

Are there any good alternatives to FOP for creating the PDF?

mcolebank,

I haven’t implemented an automated PDF generation solution, but I think you’re on the right track with a post-edition task that generates the PDFs.

Apache FOP seems to be the best option. You may also be interested in the HTML-to-FO converter at http://html2fo.sourceforge.net/.

My question is: do you want the publisher to generate the file, or would a script that generates the PDF on the fly on the delivery tier (per request) be sufficient? We use TCPDF on the delivery side to generate PDFs as people request them. For us, pre-generating PDFs just to have them didn’t make sense, given how much content we have and how long it would take the publisher to publish PDFs for all the sites.
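A minimal sketch of the per-request model, using a Python WSGI app in place of the PHP/TCPDF setup described above. `render_pdf` is a hypothetical placeholder for the real PDF library call and returns dummy bytes, not a valid PDF. The point is the flow: the PDF is built only when someone requests it, and nothing is stored on the web server.

```python
def render_pdf(page_path):
    """Hypothetical placeholder: return PDF bytes for a published page.

    A real deployment would call a PDF library here (TCPDF or mPDF in the
    PHP setups discussed in this thread). These bytes are NOT a valid PDF.
    """
    return b"%PDF-placeholder for " + page_path.encode("utf-8")


def pdf_on_request(environ, start_response):
    """WSGI app: serve /some/page.pdf by generating it at request time."""
    path = environ.get("PATH_INFO", "")
    if not path.endswith(".pdf"):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    page_path = path[: -len(".pdf")]
    body = render_pdf(page_path)  # generated per request; nothing is written to disk
    filename = page_path.rsplit("/", 1)[-1] + ".pdf"
    start_response("200 OK", [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        ("Content-Disposition", f'inline; filename="{filename}"'),
    ])
    return [body]
```

To try it locally, serve it with `wsgiref.simple_server.make_server("", 8000, pdf_on_request).serve_forever()` and request any path ending in `.pdf`. The trade-off is the one discussed in this thread: no storage cost, but every request spends CPU on the delivery server.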

Jit,

I like your solution, since generating and storing PDFs takes up additional disk space on the web servers. Do you proxy PDF generation requests to a separate server to reduce the load on your web server?

Nope, we don’t proxy that to another server. As far as I can tell, there aren’t many PDF requests, so the delivery server handles the load just fine. Note, however, that we “customize” the PDF, so it isn’t an exact replica of the page, which is what the original question asked for.

Thanks for the information on TCPDF.

We would generate the PDF with the publisher. It is a monthly newsletter publish, and we are trying to automate it rather than manually creating a PDF that mimics the web layout. The content is specific enough that we could put it into a template. We would publish the PDF file automatically; it is a compilation of a few dozen web pages of content, basically the whole newsletter in one neat download: http://www.niehs.nih.gov/news/newsletter/2012/12/file189392.pdf

How do you replicate the web page content in the PDF? Do you run TCPDF as a post-edition task to generate the PDFs from published text or XML files?

I use Velocity to generate a PHP file that uses the mPDF library (mpdf45), so you will need PHP running on your web server. When a client requests that PHP file, it returns a PDF version of the content. mPDF comes with many examples to get you started. We use it for printable articles: we publish two versions of each article through the same content list, so there is no need to create and maintain two pieces of content.