Plone

published Nov 03, 2021

This is here to serve as contents for the atom/rss feed for Plone, also read by planet.plone.org.

Thomas Schorr: Pyruvate WSGI Server Status Update

published Oct 27, 2021

Talk by Thomas Schorr at the online Plone Conference 2021.

At last year's Plone conference, I presented Pyruvate, a WSGI server implemented in Rust (and Python). Since then, Pyruvate has served as the production WSGI server in a couple of projects. In this talk I will give a project status update and show how to use Pyruvate with Zope/Plone and other Python web applications and frameworks. I will also present some use cases along with benchmark results and performance comparisons.

WSGI is the Python Web Server Gateway Interface. It is the standard way for Python web applications to talk to a web server.

During the Python 3 migration of Zope and Plone, WSGI replaced ZServer as the default, starting with Zope 4. The ZODB is not thread-safe, which limits the choice of WSGI servers. The Zope docs recommend only two: waitress and Bjoern. Other popular servers showed poor performance with Zope.

Rust is a relatively new, popular programming language. It can also be used to write extensions for Python packages. The cryptography package does this, and we all use that package.

Pyruvate from a user perspective:

pip install pyruvate

Then create an importable Python module:

import pyruvate

def application(environ, start_response):
    """WSGI application; a minimal body filled in as an example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from Pyruvate\n"]

# Listen on 0.0.0.0:7878 with 3 worker threads.
pyruvate.serve(application, "0.0.0.0:7878", 3)

Using Pyruvate with Zope/Plone: in buildout you add pyruvate to the eggs of your instance, and point to a different wsgi-ini-template.
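
A minimal sketch of the buildout part, following the pyruvate documentation; the template path and worker count here are assumptions:

[instance]
recipe = plone.recipe.zope2instance
eggs =
    Plone
    pyruvate
wsgi-ini-template = ${buildout:directory}/templates/pyruvate.ini

The template then selects Pyruvate's PasteDeploy entry point, roughly:

[server:main]
use = egg:pyruvate#main
socket = %(http_address)s
workers = 2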

  • Pyruvate supports active Python versions, currently 3.6-3.10.
  • Uses mio to implement the I/O event loop.
  • Worker pool based on the threadpool crate, 1:1 threading.
  • Has a PasteDeploy entry point.
  • Stable since version 1.1.0.
  • Supports Linux and macOS.
  • Hosted on GitLab.
  • There are binary packages (wheels) for Linux, so there you don't need a Rust compiler to install it.

Let's look at performance. As a starting point: the performance analysis of WSGI servers by Omed Habib on the AppDynamics blog, a comparison of six WSGI servers published in 2016. Tested were: Bjoern, CherryPy, Gunicorn, Meinheld, mod_wsgi and uWSGI.

The benchmarking tool was wrk. The test WSGI application simply returns some headers and the text "OK".
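
For reference, a wrk invocation looks roughly like this; the thread, connection and duration values are assumptions, not necessarily the settings used in the talk:

wrk -t4 -c100 -d30s http://127.0.0.1:7878/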

I changed the original setup: swapped Meinheld and mod_wsgi for Pyruvate and Waitress, swapped CherryPy for Cheroot, and used Python 3 only. Now let's look at the metrics.

Number of requests served:

  • Bjoern is by far the best.
  • Pyruvate does not do so well with a lower number of simultaneous connections, but with higher numbers it jumps to second place, after Bjoern.

CPU usage:

  • Bjoern is single-threaded, so it is capped at 100 percent CPU; the others can use more.
  • Gunicorn starts out best, but drops in performance with more simultaneous connections.
  • Pyruvate starts slightly below it, but sustains its level better.

Memory usage:

  • More or less okay for all.
  • Except uWSGI, where memory usage steadily goes up.

Errors:

  • All servers show errors as the load increases.
  • Except waitress and Pyruvate.
  • uWSGI shows errors for every single request, so something is wrong there. But it still serves the requests, so it may be an issue with the benchmarking tool.

Why is Bjoern so much faster?

  • Good implementation, many optimizations.
  • It is single threaded. Switching from single to multi-threaded comes with benefits and costs.
  • Shared access to resources adds to complexity.
  • Offloading requests to worker threads only makes sense when there is something to work on, which is not the case in this benchmark.
  • Python's GIL generally makes multithreading less effective.

Let's look at benchmarking Plone 5.2.5, testing with Bjoern, Pyruvate and Waitress. All three serve the Zope root at about 600 requests per second, and this stays the same as the number of simultaneous connections increases. Serving the Plone root: all three manage about 27 requests per second.

Number of errors, mostly socket errors:

  • Bjoern starts giving errors a bit earlier, from ten simultaneous connections, but it holds up well.
  • Pyruvate and waitress start giving errors at 50 simultaneous connections.
  • At 100, Pyruvate does a lot worse than Bjoern, and waitress worse still.
  • So you really can't have that many simultaneous requests to Plone.

CPU usage is the same, all 100 percent. In this setup I used one worker for each server.

Serving /Plone as JSON: Bjoern is slightly better than Pyruvate, which is slightly better than waitress, but all are around 500 requests per second, quite close to each other.

The most interesting result is the number of requests served when getting a 5.2 MB blob from blobstorage. Bjoern is around 200 requests per second and Pyruvate 180, so quite close; waitress is a lot worse, dropping from about 80 to 40.

Next setup: 2 threads for Pyruvate and waitress. Pyruvate then does better than Bjoern. Waitress starts out better, but cannot keep up.

Using 2 threads but 4 CPUs, both Pyruvate and waitress are a bit better than Bjoern, and they keep it up.

Conclusions from benchmarking:

  • Bjoern is the winner when using a single worker for all URLs except /Plone.
  • When serving a more complex page such as /Plone, there is no real difference in the number of requests served, but Bjoern is showing errors a bit earlier.
  • Adding one thread plus sufficient resources lets both Pyruvate and waitress perform better than Bjoern.
  • All configurations failed to sustain higher loads of more than 50 connections.
  • Bjoern and Pyruvate are serving blobs a lot faster than waitress.
  • Pyruvate can challenge waitress in all scenarios.
  • When adding worker threads, Pyruvate seems to make better use of resources than waitress.

As a test I set up Apache to balance fairly between two ZEO clients on Plone 5.2, one served by Pyruvate, one by waitress. Consistently more requests are sent to Pyruvate (53 percent).
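
A sketch of what that Apache setup could look like with mod_proxy_balancer; the ports, balancer name and load-balancing method are assumptions:

<Proxy "balancer://plone">
    # One ZEO client served by Pyruvate, one by waitress.
    BalancerMember "http://127.0.0.1:8081"
    BalancerMember "http://127.0.0.1:8082"
    # bybusyness sends each request to the member with the fewest
    # active requests, so a faster backend ends up serving more.
    ProxySet lbmethod=bybusyness
</Proxy>
ProxyPass "/" "balancer://plone/"
ProxyPassReverse "/" "balancer://plone/"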

Code: https://gitlab.com/tschorr/pyruvate

Tiberiu Ichim: Volto slots, portlets on steroids

published Oct 27, 2021

Talk by Tiberiu Ichim at the online Plone Conference 2021.

Problem: Volto has no equivalent of a viewlet. Solution: slots. They can be management slots, presentation slots, below-footer slots.

One reason: we try not to customize the main template.

Volto also does not have portlets. Well, if you really want them badly enough, you can have them. There is a PR in plone.restapi to export portlets, so you could render them in Volto.

Idea: reuse Volto blocks for layout.

Plone has had portlets for a long time, and they are very useful, especially for smaller sites. You should not have to be a web developer to change the website layout. Portlets give site administrators some power to influence the look of their own site. We should keep that possibility.

Volto's slot proposition:

  • Simplify configuration. Portlets in Classic need too many files.
  • Volto blocks are very expressive.
  • Require the Modify Portal Content permission for slots.
  • UI power, giving more capabilities:
      • atomic blocking of parent blocks
      • overriding parent blocks

How can we use them?

  • Sidebars: listings, info boxes, navigation
  • Section headers, content

Current status: big PRs on plone.restapi and Volto. Overall the basic functionality is 60-70 percent ready. I will do a live demo.

Fred van Dijk: collective.exportimport: tips, tricks and deploying staging content

published Oct 27, 2021

Talk by Fred van Dijk at the online Plone Conference 2021.

collective.exportimport is the latest addition to our migration toolbox, and we have achieved good results with it when upgrading Plone sites to Plone 5.2. But a new 'side' use case for this add-on, as it matures, is distributing content trees between existing Plone sites. For example: creating an online marketing campaign and deploying the setup to several country sites for translation and local use. I will demonstrate the 'content copy' use case and discuss the current state and planned/wished improvements. As a related subtopic I will also touch on the current capabilities and 'caveats' of exportimport when using it for migrations, based on our experience so far.

I will do exactly the same talk as Philip, but in my way and in half an hour.

As Philip said: "Migrations can be 'fun' again."

I started work on a migration in the autumn of 2020. The Plone 4.3 site originally started in Plone 3, maybe even with some left-overs from 2.5. I did all the usual stuff for an in-place migration to 5.2 on Python 3. And I found a banana peel. And another. And another. With such an old site, that has been migrated over and over, too many things can be lurking in the database that at some point bite you.

The main drawback of in-place migration: there are unknown unknowns. You never know how many dark things are still lurking in there until you have completely finished the migration. So: very hard to estimate.

You also require an 'intermediate' migration environment: you go from 4.3/Py2 to 5.2/Py2 to 5.2/Py3.

A lot of work has been done on in-place migration. It is stable. It works on standard Plone sites. But who really has a standard Plone site?

collective.exportimport uses an ETL approach. Transmogrifier uses the same idea:

  • you Export
  • you Transform
  • you Load

Actually, you often export, load, and then transform/fix. So it is more like ELT, or 'ELF': export, load, fix. Benefits:

  • No need for an intermediate Py2/3 environment.
  • You don't touch your existing/old environment data.
  • You only need to add collective.exportimport to your old site.

Now some technical tidbits.

There are several chicken-and-egg problems.

  • You import a folder that has a default page, but the page does not exist yet.
  • Page A relates to Page B, which does not exist yet.

So we import all content first, and afterwards import the other things, like default pages and relations.

Part 2 of this talk: Copying staging content.

A use case from a customer, Zeelandia. This customer has subsidiaries in several countries, which also use Plone sites. They wanted to create some content on one site and make it available on the other sites, so local editors can adapt it to their language.

With content-tree support in exportimport, we can export a folder and import it in another site. This works! Except: this site has Mosaic, and we use persistent tiles.

Tile data is stored either:

  • in a URL-encoded part of the HTML field,
  • on the context of the item (plone.app.standardtiles), or
  • as a persistent mapping annotation on the context.

Problem: tile annotations are not yet exported. I asked my colleague Maurits: can we fix this? Yes we can. Demo. We export the content tree to a shared directory that is accessible to all Plone sites on the server. They now use this to export marketing campaigns.

We export the annotations using the dict_hook mechanism, checking whether the context has a Mosaic layout. We need some adapters/converters for richtext fields, and for named files and images when they are in tiles. The context here is not a content item but a dictionary, which is why we need these extra converters. It would help to know the actual schema of the tile, and there should be ways to get that, so we could improve this. Still missing from the tile support: RelationValues.
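
A minimal sketch of the idea, assuming a global_dict_hook on the export view; the hook name and signature follow collective.exportimport's customization mechanism, the annotation key prefix is the one plone.tiles uses for persistent tiles, and the tile_data key is made up for illustration:

from zope.annotation.interfaces import IAnnotations

def global_dict_hook(self, item, obj):
    # Copy persistent tile data into the exported item.
    # Real code must also convert richtext and namedfile values,
    # because they are not JSON-serializable as-is.
    annotations = IAnnotations(obj, {})
    tiles = {
        key: dict(value)
        for key, value in annotations.items()
        if key.startswith("plone.tiles.data.")
    }
    if tiles:
        item["tile_data"] = tiles
    return item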

Could we define a generic 'bundle' JSON export format for Plone? Then we could export the content plus default pages plus relations, etcetera, in one JSON file, and import it in one go in the correct order.

Tips and tricks:

  • Keep a log of what you do when, in which order, with the number of items processed.
  • collective.migrationhelpers has some fixers, which are being moved to exportimport.
  • You can use collective.searchandreplace to fix most CSS class changes with smart regexes.
  • Check disk space, including space in TMP.
  • After import, do basic checks, like: can I edit content, especially rich text?

Philip Bauer: A new hope for migrations and upgrades

published Oct 27, 2021

Talk by Philip Bauer at the online Plone Conference 2021.

I've often argued for in-place migrations and worked hard to make them as easy as possible. The thing is: they are still hard. Especially when you add Archetypes/Dexterity, Python 2/3, multilingual content, old add-ons and Volto to the hurdles to overcome. All of this changed when a new star was born early this year. It started as the idea to build a small wrapper around serializing and deserializing content using the REST API. Since then collective.exportimport has grown into a powerful tool with a beginner-friendly user interface. I will show how it is used for all kinds of small and large migrations, and how it can solve even the edgiest edge cases.

I was tempted to call this talk: a new default for migrations and upgrades. But we should discuss that.

I have given a lot of talks about migrations or upgrades for Plone during the years. The two options so far were:

  • in-place migrations (the default)
  • transmogrifier

What if you could sidestep all those migration steps, and go to your target Plone version in one go? Prompted by Kim Nguyen, I started work on collective.exportimport.

See: https://github.com/collective/collective.exportimport

You can export:

  • Plone 4, 5, 6
  • Archetypes and Dexterity
  • Python 2 and 3
  • plone.app.multilingual and Products.LinguaPlone

Import:

  • Plone 5.2 and 6
  • Python 2 and 3
  • Dexterity

You can use this for migrations, but also for exporting just a part of your Plone Site.

It is built around plone.restapi. This tool takes care of most of the subtle details that might otherwise go wrong if you write something from scratch yourself. The REST API also makes it easy to customize parts of this, especially the serializers and deserializers.

Since version 1.0 of exportimport came out, we have added:

  • export of a complete site, instead of one content type at a time
  • export of content trees
  • export of portlets
  • more options for blobs

Demo.

Since spring this year, we have added support for "big data":

  • The export uses generators and writes one item at a time, so you don't run out of memory on large sites.
  • You can export/import blob paths, so you can reuse the original blobstorage.
  • For import we use ijson to efficiently load large JSON files, again to avoid running out of memory (see the sketch after this list).
  • Commit after X items (work in progress).
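
A minimal sketch of the streaming idea behind the ijson point, assuming the export is one large JSON array; the file name and loop body are illustrative:

import ijson

def iter_items(path):
    """Yield one item at a time from a large JSON array on disk."""
    with open(path, "rb") as f:
        # "item" is ijson's prefix for each element of a top-level array.
        yield from ijson.items(f, "item")

for item in iter_items("Plone.json"):
    print(item["@id"])  # hand each item to the real import logic instead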

Exporting a 10 GB Data.fs with 82 GB of blobs resulted in a 643 MB Plone.json file, plus much smaller other JSON files. Exporting took 30 minutes for the content and 20 minutes for the other stuff, like portlets. Import took 6 hours for the content and 1 hour for all the rest, on my current computer. It would actually be slower, but I disable versioning on initial creation, which helps.

It is customizable. I decided not to go for the Zope Component Architecture: you subclass the base class and use some hooks. An example of a hook that turns an old HelpCenter class into a standard folder:

def dict_hook_helpcenter(self, item):
    # Change the type and use a standard listing view.
    item["@type"] = "Folder"
    item["layout"] = "listing_view"
    return item
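
For context, a minimal sketch of how such a hook is wired up, assuming it runs on the import side; the module and class names follow collective.exportimport, but treat the exact paths as an assumption:

from collective.exportimport.import_content import ImportContent

class CustomImportContent(ImportContent):
    # Methods named dict_hook_<portal_type> are called once per item
    # of that type, before the item is created.
    def dict_hook_helpcenter(self, item):
        item["@type"] = "Folder"
        item["layout"] = "listing_view"
        return item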

TODO:

  • Fix HTML. Between Plone 4 and 5, the HTML in richtext fields has changed a lot. For example, we need to add extra data attributes to link tags, otherwise they are not editable in TinyMCE. Also, the image scales have changed, which needs some changes in the HTML. This is work in progress.
  • Migrate plone.org.
  • Migrate Classic to Volto: HTML to draftjs/slate, tiles to blocks.

I am not sure if collective.exportimport should be the new default. Usually in-place migration is fine, just not so much when migrating to Dexterity or to Python 3.

Watch the next talk, by Fred, who will go into more detail.

Katie Shaw: From Opaque to Open: Untangling Apparel Supply Chains with Open Data

published Oct 27, 2021

Keynote talk by Katie Shaw at the online Plone Conference 2021.

The intro from the conference website:

The tragic Rana Plaza building collapse in 2013 revealed to the world how little many global brands knew about where their products were being made. Following the disaster, demand for supply chain disclosure in the apparel sector was heightened. However, in response to these calls for greater transparency, supply chain disclosure has been inconsistent, inaccessible, of poor and varied quality, and stored in siloed databases. The Open Apparel Registry (OAR) was built to address all these data challenges. At its heart, the OAR exists to drive improvements in data quality for the benefit of all stakeholders in the apparel sector. Powered by a sophisticated name-and-address-matching algorithm, the tool creates one common, open registry of global facility names and addresses, with an industry standard facility ID.

Join us to learn more about the challenges facing the apparel sector, including low levels of technical exposure and understanding of open data; collaborative work that’s being done to educate the sector on the power of open data, including the launch of the Open Data Standard for the Apparel Sector (ODSAS), and examples of how data from the Open Apparel Registry being freely shared and used is creating meaningful changes in the lives of some of global society’s most oppressed people.

[Note from Maurits: apparel is a fancy word for clothes. I had to look it up.]

So what is the trouble with fashion anyway? Not high fashion, I just mean your daily clothes.

Fashion generates 2.5 trillion dollars in global annual revenues. Two of the richest people in the world are fashion industry magnates. "It takes a garment worker 18 months to earn what a fashion brand CEO earns during lunchtime."

In 2013 a garment factory collapsed in Dhaka. Workers had been forced to go to work earlier that day, although many had seen cracks in the building.

Supply chains are complex. The label in your T-shirt may say "made in Vietnam" but parts of it may have been made in a completely different location. Maybe simply the buttons are from an entirely different country.

How can better and open data help? There are databases with addresses of factories that we cannot match to an actual location. Visiting a factory to check working conditions, or to train people, is not possible if you cannot even find the building.

See the tool at https://openapparel.org

Each unique facility in the OAR (Open Apparel Registry) is allocated an ID number.

Technically, the biggest issue was: the data. (It's always the data...)

  • no industry-wide standards
  • often extracted from PDF
  • non-structured addresses (5 kilometers after the post office)

We had 50,000 existing facilities. Any newly uploaded facility needed to be checked against those: maybe it is already there. This took far too long. We now use the dedupe Python package for this, which uses fuzzy matching. It is much faster.

We made an Open Data Standard for the Apparel Sector (ODSAS). We call for open data principles in the EU corporate sustainability reporting directive legislation.

Sign on and join: Clean Clothes Campaign, Open Apparel Registry, WikiRate, and more.

I want to share stories of the OAR in action.

Clean Clothes Campaign, where Plonista Paul Roeland works, is a global alliance dedicated to improving working conditions of workers in the apparel supply chain. It uses the OAR data in its Urgent Appeal work, in which it responds to concrete violations reported by workers and unions. For example, a union leader was sacked, but after an appeal by CCC he was reinstated after five days.

Our data is used to map which apparel facilities will be underwater in 2030. Researchers combined our data with sea level projections from the climate panel. The OAR provided a unique data set for this work.

See our code here: https://github.com/open-apparel-registry

It's time to untangle supply chains!

For further reading, there are also books, especially Fashionopolis by Dana Thomas.

BHRRC (the Business & Human Rights Resource Centre) used our data in a case where workers did not get paid. In our data they found which brands were using the factory; they contacted those brands, and the case got resolved.