Weblog

published Nov 03, 2021 , last modified Nov 04, 2021

Fred van Dijk: Generating a Word document from collaboratively edited structured content in your Plone Website

published Oct 23, 2019

Talk by Fred van Dijk at the Plone Conference 2019 in Ferrara.

At Zest in The Netherlands we have several clients in Flanders, a part of Belgium. One is a government organisation for the environment. One of their sites is about water management. Information about water levels in separate areas, floodings, pollution, etcetera. The country has several partially overlapping governments, and water is flowing through all of them. So it is a bit complex.

For a new part of the site, they said: wouldn't it be nice to let Plone help in getting the next big report with plans for the next six years for the government together? So the report should be viewable as web pages. But wouldn't it be nice to also let this generate the official Word document?

A business analysis followed, with requirements and scope:

  • collectively edit it,
  • separate core information from background information,
  • export only the core to a main PDF document.

Ah, PDF export. Don't we love it? Always tricky, always corner cases, always too many pages, locking up your Zope server or crashing the export. For another site we also had to handle dynamic graphics and a table of contents. So: alarm bells went off. And do we really need PDF, can't it be something else?

Begin with the end in mind. Do I even want this project? Do we have the required skill sets in the team?

So what did we do? We made an MVP, a Minimum Viable Product. A prototype. Can we do all the parts of the process and have a basic document? Let the client test early to see we are on the same page.

Our internal customer was used to writing lots of plans, but a business plan was something else. Lots of things in there that would be nice. But which of those are really absolutely necessary?

Actually, the customer already made a mockup in a website. We had a meeting, things changed, new plans.

In the end, the Word document did not seem that much of a problem. But our initial main structure with two content types was too rigid. We simplified to one content type, and it was fine. Some different views for the content type to choose from.

We use:

  • plone.api to find content.
  • beautifulsoup4 to analyze the html.
  • python-docx for generating a Word document.
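
A minimal sketch of the html-to-Word step, not the code from the talk. It assumes the content is simple html with headings, paragraphs and list items. To keep it runnable without extra packages it uses the standard library's html.parser in place of beautifulsoup4, and it only imports python-docx inside write_docx. The names BlockCollector, blocks_from_html and write_docx are made up for this example.

```python
from html.parser import HTMLParser

# Which tags start a block; the mapping is an assumption for this sketch.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3}
TEXT_TAGS = {"p", "li"}


class BlockCollector(HTMLParser):
    """Flatten an html fragment into (kind, level, text) blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = None  # (kind, level) of the block being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._current = ("heading", HEADINGS[tag])
            self._text = []
        elif tag in TEXT_TAGS:
            self._current = ("paragraph", 0)
            self._text = []
        # Inline tags such as <em> are ignored; their text is still collected.

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and (tag in HEADINGS or tag in TEXT_TAGS):
            kind, level = self._current
            text = " ".join("".join(self._text).split())  # normalise whitespace
            if text:
                self.blocks.append((kind, level, text))
            self._current = None


def blocks_from_html(html):
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks


def write_docx(blocks, path):
    """Write the blocks to a Word file; needs the python-docx package."""
    from docx import Document  # imported here so the rest runs without it

    document = Document()
    for kind, level, text in blocks:
        if kind == "heading":
            document.add_heading(text, level=level)
        else:
            document.add_paragraph(text)
    document.save(path)


html = "<h1>Water plan</h1><p>Core text with <em>emphasis</em>.</p><ul><li>First point</li></ul>"
blocks = blocks_from_html(html)
print(blocks)
```

Splitting the work into "html to blocks" and "blocks to Word" keeps the tricky html cases in one place. Inline formatting, internal links and footnotes are left out here on purpose.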

Interesting extra problems were internal links, and later footnote support. Not really supported in python-docx yet, but with some lower level code it worked.

Risk: some html things may be too difficult to put in Word reliably. But they can edit the document later.

Something extra: we use pas.plugins.ldap and it is working smoothly now. We have some code to migrate from plone.app.ldap to this, and want to publish that.

Remark from Andreas: with a similar project we discovered that it was difficult for editors to make a section from say level 1.2 to 3.2 in the default Plone UI. So we made a special view for that. And watch out: tricky to lock everything down in TinyMCE.

Question: did you try collective.documentgenerator?

Answer: no. The things we found were too big and generic.

Rob Gietema: How to create your own Volto site!

published Oct 23, 2019 , last modified Oct 24, 2019

Talk by Rob Gietema at the Plone Conference 2019 in Ferrara.

We will go through the Volto training in about forty minutes.

To create a new Volto project, you need nvm and yarn, and then type the following:

$ npx @plone/create-volto-app my-volto-app

  • You can override public resources like the logo, styling, etc, from Volto.
  • Volto uses Semantic UI. For example, it uses the BreadCrumb component.
  • Theming: in theme.config you use the Pastanaga theme by default, but can pick and choose other themes for some elements. You can do a lot of theming just by changing some variables in less. less instead of sass is needed by Semantic UI for conditional imports.
  • You can override a component by copying the folder structure of that component from Volto to a customizations folder.
  • We use yarn i18n to extract all translatable strings into .po and then .json files.
  • Volto uses the DraftJS rich text editor. You can configure the buttons that are shown.
  • As is usual in React, Volto uses actions and reducers based on Redux.

If you want to learn about Volto, see the full Volto training.

Question: why don't you use React hooks?

Answer: Hooks were introduced one and a half years ago in React, allowing you to use more functions instead of classes. Quite cool. But then we would have two ways to do the same thing, and we try not to do that in Volto. We try not to jump on those hooks right away. If we switch after a while, we want to switch the whole code base.

Question: how do React developers experience this?

Answer: we did some presentations at React conferences. The reaction is that it looks really nice; we get positive feedback. I wrote a small reference implementation of the needed backend in node, and was finished in a few days, so people can do that too.

Eric Steele: The State of Plone

published Oct 23, 2019 , last modified Oct 24, 2019

Keynote talk by Eric Steele at the Plone Conference 2019 in Ferrara.

I start with some community updates from the past year.

  • We have six new foundation members: Christine, Thomas, Fulvio, Rikupekka, Kim, Stefania.
  • We have added Zope to the Plone Foundation.
  • 35 new core Plone committers.
  • Nine funded sprints.
  • There was a sprint in Barcelona this year from the Automation and Infrastructure team. They worked on dist.plone.org and other core servers, rewriting Ansible scripts. They are looking to do two or three sprints next year, and they would like more people to join. You don't need to be an expert.
  • Google Summer of Code: Gatsby and Guillotina work was done.
  • Google Season of Docs: improved Volto documentation.
  • A lot more people have been following Plone trainings. The trainings are available for free via https://training.plone.org. Right now 17 different courses are available.
  • Steve McMahon is stepping down from creating Plone installers. Please step up, and he will gladly help along in a transition period.

Python 3:

  • Python 2.7 will not get security patches anymore from 1 January 2020. So for Plone it was really important to run on Python 3. That now works: first Zope was upgraded, and now Plone 5.2 runs on Python 2 and 3.
  • Archetypes is now optional. It will not work at all in Python 3, so this is your wake-up call to update your code to use Dexterity.

Other changes in Plone 5.2:

  • Plone 5.2 gives us plone.restapi. This means BYOFE: Bring Your Own Front End. So if you want, you can create your own front end to talk to Plone as backend.
  • We have dropdown navigation.
  • The login code was rewritten, to not use the old skin layers anymore, but use browser views. Much better testable.
  • Theme improvements: static resource refactoring, removed old css/js resource registries, first main content then sidebar content.
  • Integrated Products.RedirectionTool so you can manage redirects.

Other:

  • Guillotina was started as a reimagining of Plone. It is really a separate project now. New version 5 has PostgreSQL indexing and search, pub/sub, there is Guillotina CMS.
  • The new Pastanaga UI is evolving. We have a design document for Plone now.
  • Volto: React-based front end that uses plone.restapi to talk to Plone. It uses the Pastanaga UI. Dramatically simplified rich text editor. It uses modern front end development, instead of building our own tools. We are improving the learning curve by removing Plone from the learning curve. That is awkward to say for me, but it brings more people in, front enders who can customize Volto. Volto uses JSX (JavaScript XML) and Semantic UI. Significantly faster than our current built-in front end. And Volto works on both Plone and Guillotina. It is almost on par with the existing Plone UI, but some features are missing, like content rules and several control panels.

Volto helps with decoupled development. Plone is about keeping your data safe, having migrations. That is all backend stuff. We need a measured pace for this. The front end needs to evolve much faster, which Volto can do.

Challenges and open questions.

  • Can we move to one content type, that can behave like an event, or a page or a folder? Does that solve problems or introduce new ones? Timo will talk about that.
  • UI support: how much of the classic Plone UI do we keep around? Do we put all effort into Volto instead?
  • What is "Plone" now? We have said: Plone is a CMS, a framework, a product, a community.
  • We can say: Plone is the API contract: You can use Plone on top of Zope, or you can use Guillotina, you have an API on top of that, plus a front end. But what is important is this contract: security, flexibility, extensibility, user experience.

collective.recipe.backup version 4

published Jan 25, 2018

Since the end of 2017, there is a new version 4.0 of collective.recipe.backup. There are lots of changes since version 3.1. Let's see some of the highlights.

Safety and exactness of restore

  • When restoring, first run checks for all filestorages and blobstorages. When one of the backups is missing, we quit with an error. This avoids restoring a filestorage and then getting into trouble due to a missing blobstorage backup.
  • When restoring to a specific date, find the first blob backup at or before the specified date. Otherwise fail. The repozo script does the same. We used to pick the first blob backup after the specified date, because we assumed that the user would specify the exact date that is in the filestorage backup. Note that the timestamp of the filestorage and blobstorage backups may be a few seconds or minutes apart. So now the user should pick the date of the blob backup or slightly later. This date will give the same result with 3.1 and 4.0. But: when you use the new blob_timestamps = true option, these dates are the same.
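
The "at or before" rule can be sketched in a few lines of Python. This is an illustration and not the recipe's own code. It assumes plain directory names in the blob_timestamps style, like blobstorage.1972-12-25-01-02-03; the function names are made up.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d-%H-%M-%S"


def backup_time(name):
    """Parse the timestamp from a name like blobstorage.1972-12-25-01-02-03."""
    return datetime.strptime(name.split(".", 1)[1], FORMAT)


def pick_blob_backup(names, restore_date):
    """Return the newest backup at or before restore_date, or fail."""
    candidates = [name for name in names if backup_time(name) <= restore_date]
    if not candidates:
        raise ValueError("no blob backup at or before %s" % restore_date)
    return max(candidates, key=backup_time)


names = [
    "blobstorage.1972-12-24-01-02-03",
    "blobstorage.1972-12-25-01-02-03",
    "blobstorage.1972-12-26-01-02-03",
]
# A date slightly later than the backup of the 25th still picks that backup.
picked = pick_blob_backup(names, datetime(1972, 12, 25, 1, 5, 0))
print(picked)
```

Asking for a date before the oldest backup raises an error instead of silently restoring something newer.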

Blob timestamps

  • Added blob_timestamps option. Default is false. By default we create blobstorage.0. The next time, we rotate this to blobstorage.1 and create a new blobstorage.0. With blob_timestamps = true, we create stable directory names that we do not rotate. They get a timestamp, just like the repozo backup. For example: blobstorage.1972-12-25-01-02-03.
  • When backing up a blobstorage, use the timestamp of the latest filestorage backup. If a blob backup with that name is already there, then there were no database changes, so we do not make a backup.
  • Automatically remove old blob backups that have no corresponding filestorage backup. We compare the timestamp of the oldest filestorage backup with the timestamps of the blob backups. This can be the name, if you use blob_timestamps = true, or the modification date of the blob backup. This means that the keep_blob_days option is ignored, unless you use only_blobs = true.
  • Note: it is fine to switch to blob_timestamps even when you already have 'old' backups. Restoring those will still work.
  • blob_timestamps = true may become the new default later (maybe 4.1). This may even become the only valid value later (maybe 5.0), removing the creation of blobstorage.0. This would simplify the code. If you don't like this, please speak up and create an issue.
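
To make the naming and cleanup rules concrete, here is a rough sketch. It is not the recipe's implementation, and it only handles the blob_timestamps = true case, where the timestamp is in the directory name. Both function names are invented for this example.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d-%H-%M-%S"


def blob_backup_name(latest_filestorage, existing_names):
    """Name the blob backup after the latest filestorage backup.

    Returns None when that name already exists: no database changes,
    so no new blob backup is needed.
    """
    name = "blobstorage." + latest_filestorage.strftime(FORMAT)
    return None if name in existing_names else name


def blob_backups_to_remove(blob_names, oldest_filestorage):
    """Blob backups older than the oldest filestorage backup have no match."""
    old = []
    for name in blob_names:
        stamp = datetime.strptime(name.split(".", 1)[1], FORMAT)
        if stamp < oldest_filestorage:
            old.append(name)
    return sorted(old)


existing = [
    "blobstorage.1972-12-24-01-02-03",
    "blobstorage.1972-12-25-01-02-03",
]
print(blob_backup_name(datetime(1972, 12, 25, 1, 2, 3), existing))
print(blob_backup_name(datetime(1972, 12, 26, 1, 2, 3), existing))
print(blob_backups_to_remove(existing, datetime(1972, 12, 25, 0, 0, 0)))
```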

Archiving and compressing blobs

  • Renamed gzip_blob option to archive_blob. Kept the old name as alias for backwards compatibility. This makes room for letting this create an archive without zipping it.
  • Added compress_blob option. Default is false. This is only used when the archive_blob option is true. When switched on, it will compress the archive, resulting in a .tar.gz instead of a tar file. When restoring, we always look for both compressed and normal archives. We used to always compress them, but in most cases it hardly decreases the size and it takes a long time anyway. I have seen archiving take 15 seconds, and compressing take an additional 45 seconds. The result was an archive of 5.0 GB instead of 5.1 GB.
  • Note that with both archive_blob and blob_timestamps set to true, you get filenames like blobstorage.1972-12-25-01-02-03.tar.
  • Added incremental_blobs option. This creates tarballs with only the changes compared to the previous blob backups. This option is ignored when the archive_blob option is false.
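
The difference between archive_blob alone and archive_blob plus compress_blob is roughly the difference between these two tarfile modes. This sketch uses Python's tarfile module for illustration; it says nothing about how the recipe creates its archives internally. The helper name and the file names are assumptions, and random bytes stand in for blobs that are already compressed, such as images.

```python
import os
import tarfile
import tempfile


def archive_blobs(source_dir, target_base, compress=False):
    """Create target_base + '.tar', or '.tar.gz' when compress is true."""
    mode, suffix = ("w:gz", ".tar.gz") if compress else ("w", ".tar")
    target = target_base + suffix
    with tarfile.open(target, mode) as archive:
        archive.add(source_dir, arcname=os.path.basename(source_dir))
    return target


with tempfile.TemporaryDirectory() as tmp:
    blob_dir = os.path.join(tmp, "blobstorage")
    os.makedirs(blob_dir)
    with open(os.path.join(blob_dir, "0x01.blob"), "wb") as handle:
        handle.write(os.urandom(4096))  # random data hardly compresses
    base = os.path.join(tmp, "blobstorage.0")
    plain = archive_blobs(blob_dir, base)
    packed = archive_blobs(blob_dir, base, compress=True)
    sizes = {os.path.basename(p): os.path.getsize(p) for p in (plain, packed)}
    both_valid = tarfile.is_tarfile(plain) and tarfile.is_tarfile(packed)

print(sizes)
```

Running this on your own blobstorage is a quick way to see whether compression is worth the extra time for your data.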

Various

  • No longer create the fullbackup script by default. You can still enable it by setting enable_fullbackup to true.
  • Added Python 3 support. The integration with plone.recipe.zope2instance is not tested there, because there is no Python 3 compatible release of it yet.

Upgrading

  • In most cases you can simply use the new version without changes.
  • Adding blob_timestamps = true is highly recommended. If you do this, you can remove the keep_blob_days option, unless you use only_blobs = true.
  • If you want the fullbackup script, enable it by setting enable_fullbackup to true.
  • When you used the gzip_blob option, you should rename this to archive_blob. Maybe enable the compress_blob option, but you are probably better off without this.

Python Meetup 22 November 2017

published Nov 22, 2017

Summary of the meeting of the Amsterdam Python Meetup Group on 22 November 2017.

The meeting was hosted by Byte. Thank you!

Byte is not really hosting anymore, moving to the cloud. We use lots of Python. Magento. Creating our own service panel. We use tools like Django, SQLAlchemy, Celery, etcetera, so we like to learn from you.

Wouter van Atteveldt: Large scale search and text analysis with Python, Elastic, Celery and a bit of R

I got my PhD in artificial intelligence. I am now a social scientist.

Why do you need text analysis? Example. In 2009 Israel and Palestine were fighting. How did the media report? I downloaded lots of articles and did text analysis:

  • U.S. media: lots about Hamas and their rocket attacks that provoked Israel.
  • Chinese media: more about the Gaza strip, actions on the ground by Israel.

So you can learn from text analysis. There is a flood of digital information, and you cannot read it all, so you use text analysis to get an overview. You need to go from text to structured data.

Facebook research. They changed the timelines of 600,000 people: some were shown more positive messages, some more negative. More positive messages resulted in fewer negative messages written by those people, but also fewer positive messages [if I saw it right]. Lots of things wrong with this research.

We can do much better, with Python. Python is platform independent, easy to learn. Half the data science community is in Python.

Or we can do stuff with the R language. It is aimed at statistics and data. There is convergence via numpy and pandas and R packages.

I helped create https://amcat.nl/ where you can search data, for example finding articles about Dutch politics in the Telegraaf. You can search there for 'excellentie', and see that this Dutch polite term for ministers was used until the sixties, and then resurfaced recently when a minister wanted to reintroduce this, and got satirical comments.

AmCat is a text search front end and an API, written in Python and R scripts and ElasticSearch. We upload texts to elastic on two separate nodes. You can never change an article: they are unique and hashed, so you get the same results when you search again next year.
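
How AmCat hashes its articles was not shown, so the following is only a sketch of the general idea of a write-once, hashed store. The choice of SHA-256 over a JSON dump, the field names and the ArticleStore class are all assumptions for illustration.

```python
import hashlib
import json


def article_hash(article):
    """Stable hash over all article fields (sorted keys, UTF-8)."""
    payload = json.dumps(article, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ArticleStore:
    """Write-once store: the hash is the id, so an entry never changes."""

    def __init__(self):
        self._articles = {}

    def add(self, article):
        key = article_hash(article)
        # Adding identical content again is a no-op;
        # changed content gets a different id.
        self._articles.setdefault(key, dict(article))
        return key

    def get(self, key):
        return dict(self._articles[key])  # hand out a copy, keep the original

    def __len__(self):
        return len(self._articles)


store = ArticleStore()
article = {"medium": "Telegraaf", "title": "Excellentie", "text": "..."}
first = store.add(article)
again = store.add(dict(article))
changed = store.add({**article, "text": "edited"})
print(first == again, first == changed, len(store))
```

Because the id follows from the content, a saved search that refers to these ids gives the same articles next year.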

For the Telegraaf paper, there is no data between 1995 and 1998. The previous years were digitised by a library, and the later years by Telegraaf themselves, but the current owner is not interested in filling the gap.

Our script interface is a Django form plus a 'run' method. This means our CLI, HTTP and API front end can use the same code.

For async jobs we use Celery. I know a professor who likes to write a query of several pages, which can take more than an hour to handle, so he would normally get a timeout. So we do that asynchronously.

We want to do NLP (Natural Language Processing). Or preprocessing. There are many good tools, like Stanford CoreNLP for English. We developed NLPipe, a simple NLP job manager. It caches results. Separated into server and workers. Workers can run on AWS via Docker. But since we have no money we are looking at Surf HPC, which does not support Docker, so we are looking at Singularity instead. Experts welcome.

Reinout van Rees: Fast querying and filtering with Django

[Same talk as on PyGrunn this year. I copied my summary and extended it.]

Goal: show what is possible. Everything is in the Django documentation. Just remember a few things you see here. If you know it is available, you can look it up.

The example case I will use is a time registration system. Everyone seems to do this. Oh, fewer hands here than at PyGrunn. The tables we will use are person, group, project and booking. A Person belongs to a Group. A Booking belongs to a Project and a Person.

The Django ORM gives you a mapping between the database and Python. You should not write your own SQL: Django writes pretty well optimised SQL.

Show all objects:

from trs.models import Person, Project, Booking
Person.objects.all()

Basic filtering:

Person.objects.filter(group=1)

specific name:

Person.objects.filter(group__name='Systemen')

case insensitive searching for part of a name:

Person.objects.filter(name__icontains='reinout')

or part of a group name:

Person.objects.filter(group__name__icontains='onderhoud')

name starting with:

Person.objects.filter(name__startswith='Reinout')

without group:

Person.objects.filter(group__isnull=True)
Person.objects.exclude(group__isnull=False)

Filtering strategy:

  • sometimes .exclude() is easier, the reverse of filter
  • you can stack: .filter().filter().filter()
  • query sets are lazy: only really executed at the moment you need it. You can use that for readability: just assign the query to a variable, to make complicated queries more understandable
  • start with the model you want

Speed:

This will use one initial query and then one extra query for each person:

systems = Person.objects.filter(group__name='Systemen')
for person in systems:
    print(person.name, person.group.name)

Not handy. Instead use the following.

  • select_related: does a big join in SQL so you get one set of results:

    for person in systems.select_related('group'):
        print(person.name, person.group.name)
    

    This does one query.

  • prefetch_related: does one query for one table, and then one query to get all related items:

    for person in systems.prefetch_related('group'):
        print(person.name, person.group.name)
    

    This does two queries. Both can be good, depending on the situation.

  • It is expensive to instantiate a model. If you need only one or two fields, Django can give you a plain dictionary or list instead.

Dictionary:

systems.values('name', 'group__name')

List of several fields in tuples:

systems.values_list('name', 'group__name')

Single list of values for a single field:

systems.values_list('group__name', flat=True)
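
To see where the query counts in the Speed section come from, here is the same pattern in plain SQL. This sketch uses the standard library's sqlite3 directly instead of the Django ORM, so it runs without a Django project. The table names, the sample rows and the helper functions are made up, and the SQL is only an approximation of what Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (
        id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER REFERENCES grp(id));
    INSERT INTO grp VALUES (1, 'Systemen'), (2, 'Onderhoud');
    INSERT INTO person VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cas', 2);
""")

query_count = 0


def run(sql, params=()):
    """Execute one statement and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def names_one_by_one():
    # Like touching person.group.name in a loop: one extra query per person.
    rows = run("SELECT name, group_id FROM person ORDER BY id")
    return [(name, run("SELECT name FROM grp WHERE id = ?", (gid,))[0][0])
            for name, gid in rows]


def names_joined():
    # Roughly what select_related('group') does: one big join.
    return run("SELECT person.name, grp.name FROM person "
               "JOIN grp ON grp.id = person.group_id ORDER BY person.id")


def names_prefetched():
    # Roughly what prefetch_related('group') does: one query per table,
    # then the rows are matched up in Python.
    rows = run("SELECT name, group_id FROM person ORDER BY id")
    ids = sorted({gid for _, gid in rows})
    marks = ",".join("?" * len(ids))
    groups = dict(run("SELECT id, name FROM grp WHERE id IN (%s)" % marks, ids))
    return [(name, groups[gid]) for name, gid in rows]


def counted(func):
    """Return (result, number of queries used)."""
    global query_count
    query_count = 0
    result = func()
    return result, query_count


for func in (names_one_by_one, names_joined, names_prefetched):
    result, count = counted(func)
    print(func.__name__, count, result)
```

With three persons the loop needs four queries, the join needs one and the prefetch needs two. The gap grows with every extra person.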

Annotation and aggregation:

  • annotate: sum, count, avg
  • aggregation
  • groupby via values (bit of a weird syntax)

Aggregation gives totals:

from django.db.models import Sum
relevant_persons = Booking.objects.filter(
    booked_by__group__name='Systemen')
relevant_persons.aggregate(Sum('hours'))

Annotation adds extra info to each result row:

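persons = Person.objects.filter(group__name='Systemen')
# Assumes Booking.booked_by uses related_name='bookings'.
persons.annotate(
  Sum('bookings__hours'))[10].bookings__hours__sum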

Filter on bookings for maternity leave, group bookings by year, give sums:

Booking.objects.filter(
    booked_on__description__icontains='ouderschap'
).values('booked_by__name', 'year_week__year'
).annotate(Sum('hours'))
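
In SQL terms, aggregate collapses everything into one total, while values followed by annotate turns into a GROUP BY. A small sketch with the standard library's sqlite3, used here instead of the Django ORM so it runs on its own. The flat booking table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booking (person TEXT, year INTEGER, hours INTEGER);
    INSERT INTO booking VALUES
        ('Ann', 2016, 8), ('Ann', 2016, 4), ('Ann', 2017, 6), ('Bob', 2017, 2);
""")

# Like .aggregate(Sum('hours')): one number for the whole query set.
total = conn.execute("SELECT SUM(hours) FROM booking").fetchone()[0]

# Like .values('person', 'year').annotate(Sum('hours')): one sum per group.
per_person_year = conn.execute(
    "SELECT person, year, SUM(hours) FROM booking "
    "GROUP BY person, year ORDER BY person, year"
).fetchall()

print(total)
print(per_person_year)
```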

Note: I used the faker library to replace the names of actual coworkers in my database with random other names.

Practice this with your own code and data! You'll get the hang of it and get to know your data and it is fun.

What I hope you take away from here:

  • Read the docs, you now have the overview.
  • Make your queries readable.
  • Practice, practice, practice.

[I will toss something extra in from PyGrunn, which was probably a question from the audience.]

If you need to do special queries, you can build the query yourself with Q objects:

from django.db.models import Q
query = Q(group__name='Systemen')
Person.objects.filter(query)

Q objects can be combined with | (or), & (and) and ~ (not), so you can write filters that plain keyword arguments cannot express.

Twitter: @reinoutvanrees

Wojtek Burakiewicz: Building robust command line tools with click and flask (and microservices)

I worked at Byte and am now working at Optiver, in the tooling team, using Python for data center management. I use Flask and Click, made by the same people. Our sd tool is internal, it is not on the Internet. It deploys Python applications to servers with virtualenv.

On the server side we use Stash, Artifactory, Ansible + Jenkins, supervisord, JIRA. On the user side we have our sd tool to talk to the programs on the server.

You would normally need passwords on some servers, auth keys or ssh keys on others. Some ports open, a different one for each new app. Messy.

So sd talks to an api that sits in between. The api then handles the messy talking to the servers, and you store authentication and port numbers in a central config, instead of on the computer of each user. All users talk to the api, the api talks to the servers.

For the server side api we use flask and flask_restful. For the client side we use click.

When you install Django, you get a house. When you install Flask, you get a brick. Sometimes you want a house. Other times all you need is a brick. You can build something entirely different with a brick. It is easy.

Making an api is done easily with:

from flask_restful import Api, Resource

Then click for the command line tool. I always struggle with argparse, but I like working with click:

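import click

@click.command()
@click.option('--count', default=1, help='...')
def main(count):
    # Function name and body are illustrative; the talk only showed the decorators.
    click.echo(count)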

Click makes you structure your application in a nice way:

tool/
  cli.py
  actions/
    action1.py
    action2.py

We use a trick to force users to upgrade. With every request to the api we send the version of our cli. The api checks this against a minimum version and aborts if the version is too old.
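
The check on the api side could look something like this. It is a sketch in plain Python, not the code of the sd tool: the minimum version, the status codes and the function names are assumptions. In a Flask app you would call check_client_version with the version string that the cli sends along with each request.

```python
MINIMUM_CLIENT_VERSION = (1, 4, 0)  # hypothetical value


def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_client_version(sent_version, minimum=MINIMUM_CLIENT_VERSION):
    """Return (status code, message) for the version the cli sent."""
    try:
        version = parse_version(sent_version)
    except (AttributeError, ValueError):
        # None or something that is not dotted numbers.
        return 400, "missing or malformed client version"
    if version < minimum:
        wanted = ".".join(str(part) for part in minimum)
        return 426, "client too old, please upgrade to %s or newer" % wanted
    return 200, "ok"


for sent in ("1.3.9", "1.4.0", "1.10.0", None):
    print(sent, check_client_version(sent))
```

Comparing tuples instead of strings matters: as strings, '1.10.0' sorts before '1.4.0', and a perfectly new client would be refused.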

Documentation: