A Python Solution for Making Custom PDFs from HTML

Recently updated on Feb. 7, 2017

Noel
Software Engineer, Dec. 22, 2015

The client wanted to give their users the option of printing completed forms to a pdf file. They also wanted the pdf to be rendered with formatting and style that varied slightly from the online display of the completed form, so they needed a solution other than the browser's own print function. The open-source ReportLab library is a popular choice for generating pdfs on the fly, and the xhtml2pdf library, which depends on ReportLab, offers a relatively easy way to convert an html web page to pdf while (more or less) preserving css styles. For these reasons, we chose xhtml2pdf and ReportLab to fulfill our client's request.

I added “xhtml2pdf==0.0.6” to “requirements.txt” to mark it as a dependency of the project.

Installing it (e.g., by running “pip install -r requirements.txt”) also pulls in its own dependencies, including ReportLab. With those installed, the following imports can be added to “views.py”:

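import cStringIO as StringIO  # Python 2 only; io.BytesIO is the Python 3 equivalent
from os.path import join
from django.conf import settings
from django.http import HttpResponse
from django.template.loader import get_template
from django.utils.html import escape
from django.views.generic import TemplateView
from xhtml2pdf import pisa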

We used class-based views to render each of the forms we wanted to reproduce as a pdf. The conversion happens in the class's “render_to_response” method, the core of which I took from http://stackoverflow.com/questions/1377446/render-html-to-pdf-in-django-site:

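class ABC(TemplateView):
    template_name = "mytemplate.html"
    ...
    ...
    def render_to_response(self, context, **response_kwargs):
        # Render the Django template to an HTML string.
        template = get_template(self.template_name)
        html = template.render(context)
        # Convert the HTML to PDF, writing the output to an in-memory buffer.
        result = StringIO.StringIO()
        pdf = pisa.pisaDocument(
            # UTF-8 avoids a UnicodeEncodeError on characters outside Latin-1.
            StringIO.StringIO(html.encode("UTF-8")),
            dest=result, encoding="UTF-8", link_callback=fetch_resources)
        if not pdf.err:
            return HttpResponse(result.getvalue(), content_type='application/pdf')
        # On failure, show the (escaped) HTML that could not be converted.
        return HttpResponse("Error: <pre>%s</pre>" % escape(html))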

I also took the skeleton of the template (“self.template_name”, above) from that same webpage.

Regarding the template:

This is a basic Django template which defines, as all Django templates do, how the page will be rendered in HTML markup. What sets it apart is that it also defines margins and a footer with a page number, both meant to look good once the HTML is converted to a pdf.
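To make the margins and footer concrete, here is a minimal sketch of such a page skeleton. It relies on xhtml2pdf's `@page` and `@frame` CSS extensions and its `<pdf:pagenumber>` tag; the frame name, the sizes, the `$title`/`$body` placeholders, and the `build_pdf_html` helper are illustrative assumptions. The markup is filled with Python 3's `string.Template` rather than Django's template engine only so the sketch runs outside a Django project, and the exact frame properties are worth checking against the xhtml2pdf documentation.

```python
from html import escape
from string import Template

# Hypothetical page skeleton. @page sets the paper size and margins; the
# @frame block defines a static frame at the bottom of every page, and
# -pdf-frame-content names the element whose content is drawn in it.
PDF_PAGE = Template("""<html>
<head>
<style type="text/css">
    @page {
        size: letter;
        margin: 1cm;
        @frame footer_frame {
            -pdf-frame-content: footerContent;
            bottom: 0cm;
            margin-left: 9cm;
            margin-right: 9cm;
            height: 1cm;
        }
    }
</style>
</head>
<body>
    <!-- Content for the static footer frame; xhtml2pdf replaces the
         page-number tag with the current page number. -->
    <div id="footerContent">Page <pdf:pagenumber></div>
    <h1>$title</h1>
    $body
</body>
</html>""")


def build_pdf_html(title, body_html):
    """Fill the skeleton; the title is escaped, body_html is trusted markup."""
    return PDF_PAGE.substitute(title=escape(title), body=body_html)
```

In a Django project the same markup would live in the template file itself, with Django variables and blocks in place of the `$` placeholders.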

Some users have reported trouble getting the xhtml2pdf library to faithfully apply css rules from an external stylesheet. One sure-fire solution, which also suited our very basic styling needs, was to put the style rules directly into a style tag in the template file itself.

Even so, the library did not honor every style rule equally: what looked right in the browser did not necessarily carry over to the pdf, so some trial and error was needed to get the pdf looking the way we wanted.
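If the rules need to stay in a separate .css file for editing, one way to still hand xhtml2pdf inline styles is to splice the stylesheet into the rendered HTML before conversion. The sketch below is a hypothetical helper (`inline_stylesheet` is not part of xhtml2pdf or Django) that inserts the css text as a style block just before the closing head tag.

```python
import re

# Matches a closing head tag regardless of case, e.g. "</head>" or "</HEAD >".
_HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def inline_stylesheet(html, css_text):
    """Return html with css_text embedded in a <style> block in the head."""
    style_block = '<style type="text/css">\n%s\n</style>\n' % css_text
    match = _HEAD_CLOSE.search(html)
    if match is None:
        # No closing head tag: put the rules at the very start of the document.
        return style_block + html
    # Insert just before the first closing head tag so the rules live in the head.
    return html[:match.start()] + style_block + html[match.start():]
```

In the view, this would run between `template.render(context)` and the call to `pisa.pisaDocument`, with `css_text` read from whatever stylesheet file the project uses. An alternative that stays inside Django is to place the .css file in a template directory and pull it in with `{% include %}` inside the template's style tag.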

Incorporating graphics, such as the company logo, was accomplished by defining the following function in views.py:

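def fetch_resources(uri, rel):
    # Map a URI under STATIC_URL to its absolute path under STATIC_ROOT;
    # pass anything else (e.g. an http:// URL) through unchanged.
    if uri.startswith(settings.STATIC_URL):
        return join(settings.STATIC_ROOT, uri[len(settings.STATIC_URL):])
    return uri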

This function is passed to the pdf generator as the “link callback” (referenced in the “render_to_response” method of our class-based view above). xhtml2pdf calls it for each image or other resource the template references; it turns URLs under STATIC_URL (such as the logo's) into absolute filesystem paths under STATIC_ROOT, which the pdf generator needs in order to read the files from disk.
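If the pdf also needs user-uploaded files (served under MEDIA_URL), the same idea generalizes to several URL prefixes. The sketch below is a hypothetical factory, `make_link_callback`, that builds a callback with the same `(uri, rel)` shape as `fetch_resources` from a `{url_prefix: directory}` map, and refuses URIs that would resolve outside the mapped directory.

```python
import os


def make_link_callback(url_to_root):
    """Build a link callback from a {url_prefix: directory} map.

    The returned function takes (uri, rel), as fetch_resources does, and
    returns an absolute filesystem path for URIs under a known prefix.
    """
    # Check longer prefixes first so "/static/img/" would win over "/static/".
    prefixes = sorted(url_to_root, key=len, reverse=True)

    def link_callback(uri, rel):
        for prefix in prefixes:
            if uri.startswith(prefix):
                root = os.path.abspath(url_to_root[prefix])
                path = os.path.normpath(os.path.join(root, uri[len(prefix):]))
                # Refuse URIs such as "/static/../../etc/passwd" that
                # would resolve to a file outside the mapped directory.
                if path != root and not path.startswith(root + os.sep):
                    raise ValueError("URI escapes %s: %r" % (root, uri))
                return path
        # Unknown prefix (e.g. an http:// URL): hand it back unchanged.
        return uri

    return link_callback
```

In a Django project the map might be `{settings.STATIC_URL: settings.STATIC_ROOT, settings.MEDIA_URL: settings.MEDIA_ROOT}`, with the result passed as `link_callback=` in place of `fetch_resources`.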

With content, styles, and graphics in place, our pdf was complete. The “render_to_response” method (above) returns a content type of “application/pdf”, and from there the user’s browser will deliver a downloadable pdf file.
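With only a content type of application/pdf, many browsers open the file in a built-in viewer rather than saving it. To ask the browser to download the file instead, the response can also carry a Content-Disposition header. The sketch below uses a hypothetical helper, `as_attachment`, that sets the header on any object supporting item assignment for headers, which includes Django's HttpResponse; the filename is illustrative.

```python
def as_attachment(response, filename, inline=False):
    """Set Content-Disposition on a response object and return it.

    Works with anything supporting item assignment for headers, such as
    Django's HttpResponse (or a plain dict, for testing).
    """
    # Drop characters that would break the quoted header value.
    safe_name = "".join(c for c in filename if c not in '"\\\r\n')
    disposition = "inline" if inline else "attachment"
    response["Content-Disposition"] = '%s; filename="%s"' % (disposition, safe_name)
    return response
```

In the view, the success branch could become `return as_attachment(HttpResponse(result.getvalue(), content_type='application/pdf'), "completed-form.pdf")`; passing `inline=True` keeps in-browser display while still suggesting a filename for saving.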

Tagged django, reportlab, pdf