Holger Krekel - re-inventing Python packaging & testing

published May 10, 2013

Holger Krekel gives the keynote at PyGrunn, about re-inventing Python packaging and testing.

See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.

I am @hpk42 on Twitter.

I started programming in 1984. I am going to tell you how distribution and installation worked back then, as you are too young to know. A friend and I would sit down after school with a magazine. One of us would read hexadecimal numbers from it and the other would type them in. An hour and a half later we could play a Pac-Man game.

Apprentice: "Can anyone tell me why X isn't finished?"

Master: "It takes a long time to write software."

Projects take time. CPython is 22 years old.

Where does all this effort go? Into mathematical algorithms? No. Deployment takes a huge bite. Software needs to run on different machines, and it needs to be configured, tested, packaged, distributed, installed, executed, maintained, monitored, et cetera.

The problem with deployment is the real world. Machines are different, users are different, networks are different, operating systems are different, software versions are different.

There are producers of software. If, as a producer, I change an API or a UI, that creates a danger for my users. This makes releasing a new version risky, because deploying the new version is potentially dangerous for the users.

A lot can be solved by automation. Automated tests help. You need to communicate (allow users to track changes, have discussions). Configurations should be versioned so you can go back to earlier versions or at least see what the difference is. You need a packaging system and a deployment system. This may be more important than choosing which language to use.

The modern idea to simplify programming is usually: let's have only one way so it is clear for everyone what to do. Oh, and it should be my way.

Standardization fosters collaboration, even if the standard is not perfect. But tools that come out of this standardization are more important than the standardization document itself.

Are standardized platforms a win? For example C64/Amiga, iOS, Android, Debian, .NET, or company-wide choices for virtual machines and packaging. Standardizing reduces complexity, but increases lock-in. You may not want to bet your whole business on one platform.

Modernism: have one true order. For example, Principia Mathematica tried to establish one system of mathematics that could do everything. Gödel proved this was impossible.

Let's check the koans of Perl and Python. Perl says there is more than one way to do it. Python says there should be one - and preferably only one - obvious way to do it. Both say there are multiple ways. You need to take that into account.

A note on the Python standard library: Python includes lots of functionality. This was a good idea in the past. Today, packages on PyPI often provide better APIs, and those can still be improved.

Perl has CPAN, the Comprehensive Perl Archive Network. There is a lot of good structure in there.

Python is still catching up. Python is growing declarative packaging metadata, instead of keeping the metadata in the executable setup.py file. The community is trying to standardize on pip and wheels, but easy_install remains a possibility. Uploading to PyPI, or any other interaction with the PyPI server, is hard today. The server is hard to deploy on a laptop. There are no enforced version semantics. The protocol is brittle. It is hard to move away from setup.py, though.
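To illustrate why declarative metadata matters, here is a minimal sketch, assuming metadata stored as "Key: value" headers in the style of a PKG-INFO file. The project name, version and dependencies are made up for illustration. The point is that a tool can read such a file with a standard parser, without importing or executing any of the project's code, which is not possible when the metadata only exists as arguments inside a setup.py script.

```python
from email.parser import Parser

# Hypothetical metadata for an imaginary project, written as
# "Key: value" headers in the style of a PKG-INFO file.
PKG_INFO = """\
Metadata-Version: 1.2
Name: example-project
Version: 0.3.1
Summary: An imaginary package used for illustration
Requires-Dist: requests
Requires-Dist: py
"""


def read_metadata(text):
    """Parse static metadata without running any project code."""
    message = Parser().parsestr(text)
    return {
        "name": message["Name"],
        "version": message["Version"],
        # A header may occur several times; get_all collects every value.
        "requires": message.get_all("Requires-Dist", []),
    }


metadata = read_metadata(PKG_INFO)
print(metadata["name"], metadata["version"], metadata["requires"])
```

An index server or installer could do this kind of lookup for thousands of packages safely, because nothing from the package itself gets executed.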

http://pypi-mirrors.org lists about eight mirrors of the official http://pypi.python.org server. Most are not up to date, or are not updating at all. Not good.

Neither Perl nor Python is living up to its koan. Python has a lot to improve.

What needs to be improved? setuptools and distribute are being merged. The bandersnatch tool is being deployed, which is much better and faster for mirroring. Several PEPs are being discussed and considered. The people proposing these PEPs are talking to each other, so communication is good. New version comparison, new packaging metadata version, new rules on PyPI, etcetera. A lot is happening.
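As a small illustration of why version comparison needs agreed rules, consider sorting release numbers. This is a sketch only: the helper name version_key is made up, it handles purely numeric dotted versions, and it is not the scheme proposed in any of the PEPs under discussion. Real schemes also have to deal with pre-releases, post-releases and so on.

```python
def version_key(version):
    """Turn a dotted numeric version such as "1.10.2" into a tuple of ints.

    Assumption: every part is a plain number. Strings like "1.0rc1"
    would raise ValueError here.
    """
    return tuple(int(part) for part in version.split("."))


releases = ["1.9", "1.10", "1.2.1"]

# Sorting as plain strings compares character by character,
# so "1.10" ends up before "1.9".
as_strings = sorted(releases)

# Sorting by the numeric tuples gives the order a human expects.
as_versions = sorted(releases, key=version_key)

print(as_strings)
print(as_versions)
```

Without enforced semantics, every tool has to guess which of these orderings, or which variation on them, a package author had in mind.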

We should be aware of the standardization trap: you try to unify five existing ways of doing something and end up adding a sixth way. To avoid this, don't demand that the world changes first before your tool or idea can be used. To a certain degree Python fell into that trap, but that is outside the scope of this talk.

I would like to focus on integration of meta tools. These can configure and invoke existing tools and make them work for most use cases. You can enable and facilitate new technology there.

Testing

Python has lots of testing tools, like nose, py.test, unittest, unittest2, zope.testing, make test, setup.py test.

tox is a "meta" test running tool. Its mission is to standardize testing in Python. It is a bit like a Makefile. It runs the tests with the tools of your choice. It acts as a front-end to CI servers. See http://tox.testrun.org for details.
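The idea of a "meta" test runner can be sketched in a few lines of Python. This is not how tox is implemented: tox is driven by a configuration file and creates isolated environments, and none of that is shown here. The environment names and commands below are made up. The sketch only shows the core idea of one front-end that runs whatever test commands you configured and reports the results in a uniform way.

```python
import subprocess
import sys

# Hypothetical environments: a name mapped to the command that runs the tests.
# A real setup would call py.test, nose, unittest and so on here.
ENVIRONMENTS = {
    "passing": [sys.executable, "-c", "assert 1 + 1 == 2"],
    "failing": [sys.executable, "-c", "assert 1 + 1 == 3"],
}


def run_all(environments):
    """Run each configured command and record whether it exited with 0."""
    results = {}
    for name, command in environments.items():
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results[name] = completed.returncode == 0
    return results


results = run_all(ENVIRONMENTS)
for name, ok in results.items():
    print(name, "ok" if ok else "FAILED")
```

Because the front-end only looks at exit codes, a CI server can call it the same way no matter which test tool sits behind each environment.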

Travis CI is a "meta" test running service. It sets up all kinds of dependencies, priming your environment.

devpi

I have a new idea, devpi: a tool to support packaging, testing and deployment. The devpi-server part is a new, compatible Python index and upload service. The client part has subcommands for managing release and QA workflows.

Why a new index server? The existing solutions lacked an automatically tested, extensible code base, or were missing other parts.

devpi-server is self-updating. It is a selective mirror. It does not try to update all packages on the original PyPI, just the ones that you actually use.
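The selective mirroring idea can be sketched as a cache that only talks to the upstream index for projects someone actually asked for. This is an illustration under assumptions, not devpi-server's code: the class SelectiveMirror, its methods and the fake upstream function are all hypothetical names.

```python
class SelectiveMirror:
    """Cache release listings, but only for projects that were requested."""

    def __init__(self, fetch_upstream):
        # fetch_upstream is any callable that takes a project name and
        # returns its list of release files from the original index.
        self._fetch_upstream = fetch_upstream
        self._cache = {}

    def get(self, project):
        """Return the release files, contacting upstream on first use only."""
        if project not in self._cache:
            self._cache[project] = self._fetch_upstream(project)
        return self._cache[project]

    def refresh(self):
        """Update only the projects that are already in the cache."""
        for project in list(self._cache):
            self._cache[project] = self._fetch_upstream(project)


# A stand-in for the real index that records which projects were fetched.
upstream_calls = []


def fake_upstream(project):
    upstream_calls.append(project)
    return ["{}-1.0.tar.gz".format(project)]


mirror = SelectiveMirror(fake_upstream)
mirror.get("pytest")
mirror.get("pytest")  # served from the cache, no second upstream call
mirror.get("tox")
print(upstream_calls)
```

A full mirror would have to walk every project on the original index. Here the amount of work grows only with the number of projects you really use.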

But working with multiple indexes is burdensome, so the devpi client provides "workflow" subcommands: "use" sets the current PyPI index, "upload" builds and uploads packages from a checkout, and "test" downloads and tests a package. So you can create a package, upload it to a local test PyPI, test the package and then upload it to the real PyPI.

I did the last pytest releases using devpi.

Development plans: MIT licensed, test driven development. Get early adopters.

The main messages from this talk:

  • Evolve and build standards, do not impose them.
  • Integrate existing solutions, do not add yet another way, if possible.
  • Let's share this tooling and collaborate. Maybe you have some tool to reliably create a Debian package from a Python package. Make it available and get feedback and code from others.

Strive for something simpler; see the requests library. Simplicity is usually something that emerges from using a piece of software.