Thomas Schorr: Pyruvate WSGI Server Status Update

published Oct 27, 2021

Talk by Thomas Schorr at the online Plone Conference 2021.

At last year's Plone conference, I presented Pyruvate, a WSGI server implemented in Rust (and Python). Since then, Pyruvate has served as the production WSGI server in a couple of projects. In this talk I will give a project status update and show how to use Pyruvate with Zope/Plone and other Python web applications and frameworks. I will also present some use cases along with benchmark results and performance comparisons.

WSGI is the Python webserver gateway interface. It is the default way for Python apps to talk with a webserver.

During the Python 3 migration of Zope and Plone, WSGI replaced ZServer as the default, since Plone 4. The ZODB is not thread-safe. This means there is a limited choice of WSGI servers. The Zope docs recommend only two: waitress and Bjoern. Other popular servers showed poor performance for Zope.

Rust is a new, popular programming language. It can also be used to extend Python packages. The cryptography package does this, and we all use this package.

Pyruvate from a user perspective:

pip install pyruvate

Then create an importable Python module:

import pyruvate

def application(environ, start_response):
    """WSGI application"""
    ...

pyruvate.serve(application, "0.0.0.0:7878", 3)

Using pyruvate with Zope/Plone. In buildout you add pyruvate to the eggs of your instance, and use a different wsgi-ini-template.

  • Pyruvate supports active Python versions, currently 3.6-3.10.
  • Using Mio to implement I/O event loop.
  • Worker pool based on threadpool, 1:1 threading.
  • A PasteDeploy entry point.
  • It is stable since 1.1.0.
  • Supports Linux and MacOS
  • Hosted on Gitlab
  • There are binary packages for Linux, so there you don't need a Rust compiler to install it.

Let's see about performance. As starting point: Performance analysis of WSGI servers by Omed Habib, see Appdynamics blog. This is a performance comparison of 6 WSGI servers, published in 2016. Tested were: Bjoern, CherryPy, Gunicorn, Meinheld, mod_wsgi and uWSGI.

Benchmarking tool was wrk. The test WSGI application simply returns some headers, and the text "OK".

I changed the original setup. Swapped Meinheld and mod_wsgi for Pyruvate and Waitress, swapped CherryPy for Cheroot. Use Python 3 only. Now we look at the metrics.

Number of requests served:

  • Bjoern is by far the best.
  • Pyruvate does not do so well with lower number of simultaneous connections, but with higher numbers, it jumps to second place after Bjoern.

CPU usage:

  • Bjoern is single threaded, so is 100 percent, the others can do more.
  • Gunicorn starts the best, but drops in performance with more simultaneous connections.
  • Pyruvate starts slightly below it, but sustains it better.

Memory usage:

  • More or less okay for all.
  • Except uWSGI, where memory usage steadily goes up.

Errors:

  • With increasing load, all servers show errors on higher load.
  • Except waitress and Pyruvate.
  • uWGI always shows errors for every single request, so something is wrong. But it still serves the request. So maybe an issue of the benchmarking tool.

Why is Bjoern so much faster?

  • Good implementation, many optimizations.
  • It is single threaded. Switching from single to multi-threaded comes with benefits and costs.
  • Shared access to resources adds to complexity.
  • Offloading requests to worker threads only makes sense when there is something to work on, which is not the case in this benchmark.
  • Python's GIL generally makes multithreading less effective.

Let's look at benchmarking Plone 5.2.5. Testing with Bjoern, Pyruvate and Waitress. All three serve the Zope root about 600 requests per second, and this stays the same when simultaneous connections increase. Serving the Plone root: all three about 27 requests per second.

Number of errors, mostly socket errors:

  • Bjoern starts giving errors a bit earlier, starting from ten simultaneous connections, but it holds up well.
  • Pyruvate and waitress start giving errors at 50 simultaneous connections.
  • With 100, Pyruvate does a lot worse than Bjoern, and waitress even more than that.
  • So you really can't have that many simultaneous requests to Plone.

CPU usage is the same, all 100 percent. In this setup I have used one worker for each server.

Serving /Plone as json: Bjoern is slightly better than Pyruvate, which is slightly better than waitress, but all are around 500 requests per second, quite close to each other.

The most interesting is number of requests served when getting a 5.2 MB blob from blobstorage. Bjoern is around 200, Pyruvate 180, so close, and waitress a lot worse, dropping from about 80 to 40.

Next setup: 2 threads for Pyruvate and waitress. Pyruvate is then better than Bjoern. Waitress starts better, but cannot keep up.

Using 2 threads but 4 CPU, Both Pyruvate and waitress are a bit better than Bjoern, and keep it up.

Conclusions from benchmarking:

  • Bjoern is the winner when using a single worker for all URLs except /Plone.
  • When serving a more complex page such as /Plone, there is no real difference in the number of requests served, but Bjoern is showing errors a bit earlier.
  • Adding one thread plus sufficient resources lets both Pyruvate and waitress perform better than Bjoern.
  • All configurations failed to sustain higher loads, more than 50 connections.
  • Bjoern and Pyruvate are serving blobs a lot faster than waitress.
  • Pyruvate can challende waitress in all scenarios.
  • When adding worker threads, Pyruvate seems to make better use of resources than waitress.

As test I setup Apache for fair balancing between two ZEO clients on Plone 5.2, one served by Pyruvate, one by waitress. Consistently more requests are sent to Pyruvate (53 percent).

Code: https://gitlab.com/tschorr/pyruvate