Fred van Dijk: Generating a Word document from collaboratively edited structured content in your Plone Website

published Oct 23, 2019

Talk by Fred van Dijk at the Plone Conference 2019 in Ferrara.

At Zest in The Netherlands we have several clients in Flanders, a part of Belgium. One is a government organisation for the environment. One of their sites is about water management: information about water levels in separate areas, flooding, pollution, etcetera. The country has several partially overlapping governments, and water is flowing through all of them. So it is a bit complex.

For a new part of the site, they said: wouldn't it be nice to let Plone help in getting the next big report with plans for the next six years for the government together? So the report should be viewable as web pages. But wouldn't it be nice to also let this generate the official Word document?

A business analysis followed, with requirements and scope:

  • edit it collectively,
  • separate core from background information,
  • export only the core to a main PDF document.

Ah, PDF export. Don't we love it? Always tricky, always corner cases, always too many pages, locking up your Zope server or crashing the export. For another site we also had to deal with dynamic graphics and a table of contents. So: alarm bells went off. And do we really need PDF, can't it be something else?

Begin with the end in mind. Do I even want this project? Do we have the required skill sets in the team?

So what did we do? We made an MVP, a Minimum Viable Product. A prototype. Can we do all the parts of the process and have a basic document? Let the client test early to see we are on the same page.

Our internal customer was used to writing lots of plans, but a business plan was something else. Lots of things in there that would be nice. But which of those are really absolutely necessary?

Actually, the customer already made a mockup in a website. We had a meeting, things changed, new plans.

In the end, the Word document did not seem that much of a problem. But our initial main structure with two content types was too rigid. We simplified to one content type, with a few different views to choose from, and it was fine.

We use:

  • plone.api to find content.
  • beautifulsoup4 to analyze the html.
  • python-docx for generating a Word document.

Interesting extra problems were internal links, and later footnote support. Not really supported in python-docx yet, but with some lower level code it worked.

Risk: some html things may be too difficult to put in Word reliably. But they can edit the document later.

Something extra: we use pas.plugins.ldap and it is working smoothly now. We have some code to migrate to it, and want to publish that.

Remark from Andreas: with a similar project we discovered that it was difficult for editors to make a section from, say, level 1.2 to 3.2 in the default Plone UI. So we made a special view for that. And watch out: it is tricky to lock everything down in TinyMCE.

Question: did you try collective.documentgenerator?

Answer: no. The things we found were too big and generic.