Armin Ronacher: I am doing HTTP wrong

published May 11, 2012

Armin Ronacher takes a fresh look at HTTP from Python, at PyGrunn.

See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.

Most Python web frameworks approach HTTP in the same way. For a project I started looking at it in a more basic way, wondering if the lower TCP level would be enough.

Usually you have a request object, you return a response object, and some other layers (like in WSGI) may change some things along the way. Why don't we abstract that more? What do we like about HTTP?

  • It is plain text.
  • You can use the REST concept.
  • Content negotiation (returning html or json).
  • Incredibly well supported: every browser supports it.
  • It works where TCP does not. In some cases you cannot use TCP, but can use HTTP, like Windows Mobile before version 7.
  • It is somewhat simple to implement.
  • You can upgrade to custom protocols when needed.

You can stream HTTP or buffer it up. In Python the headers will always be buffered up in WSGI. The same usually goes for forms and files. When everything has been gathered and is ready, the response is streamed back to the client.
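To make the difference between buffering and streaming concrete, here is a minimal sketch of two WSGI applications that send the same body. The names buffered_app and streaming_app are made up for illustration; only the WSGI calling convention itself is standard.

```python
def buffered_app(environ, start_response):
    # Build the whole body in memory first, then send it in one piece.
    body = b"".join(b"line %d\n" % i for i in range(3))
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def streaming_app(environ, start_response):
    # The headers still go out as one unit; only the body is produced lazily.
    start_response("200 OK", [("Content-Type", "text/plain")])

    def generate():
        for i in range(3):
            yield b"line %d\n" % i

    return generate()


def collect(app):
    # Tiny stand-in for a WSGI server: ignore the headers, join the body.
    return b"".join(app({}, lambda status, headers: None))
```

Both applications produce identical bytes. The buffered one can announce a Content-Length up front, the streaming one never holds the full body in memory.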

When your server has started reading the data of a request, it can no longer tell the client to stop sending. You could reject requests that are too large. But what is large? A form of 16 MB that needs to be loaded in memory may be too large. A big half-gigabyte file that is stored directly on disk, hardly using any memory in the process, may be fine. So lots of Python web frameworks are vulnerable to attacks that send them big requests.
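A rough sketch of that distinction, not taken from the talk: one helper refuses to load an oversized body into memory, the other copies a body of any size to a temporary file in small chunks. The limit, the chunk size, and all function names are assumptions chosen for illustration.

```python
import io
import tempfile

MAX_FORM_BYTES = 16 * 1024 * 1024   # hypothetical limit for in-memory data
CHUNK_SIZE = 64 * 1024              # hypothetical chunk size for disk spooling


class RequestTooLarge(Exception):
    pass


def read_limited(stream, content_length, limit=MAX_FORM_BYTES):
    """Read a body into memory, refusing anything above `limit` bytes."""
    if content_length > limit:
        raise RequestTooLarge(content_length)
    return stream.read(content_length)


def spool_to_disk(stream, content_length, chunk_size=CHUNK_SIZE):
    """Copy a body to a temporary file in chunks, so memory use stays small."""
    out = tempfile.TemporaryFile()
    remaining = content_length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # the client sent less than it announced
        out.write(chunk)
        remaining -= len(chunk)
    out.seek(0)
    return out


# Usage with an in-memory stream standing in for the request body.
payload = b"x" * 1000
small = read_limited(io.BytesIO(payload), len(payload), limit=2000)
spooled = spool_to_disk(io.BytesIO(payload), len(payload), chunk_size=64)
```

In a WSGI application the stream would be environ["wsgi.input"] and the length would come from the CONTENT_LENGTH key.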

A new approach

Dynamic typing has made us lazy. We treat HTTP as an implementation detail.

For a web app we have defined schemas so we can push around data in our application between client and server, without needing keys in request parameters.

Be strict in what you send, but generous in what you receive: a variant of Postel's law. When you do that, you need to know what you receive and whether you can trust it.

A GET request has no body, so parameters have to be URL encoded. So there is inconsistency with JSON POST requests.
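The inconsistency shows up as soon as you try to read parameters the same way for both kinds of request. The following is only a sketch under the assumption that non-GET requests carry a JSON object; extract_params is a hypothetical helper, not something from the talk.

```python
import json
from urllib.parse import parse_qs


def extract_params(method, query_string="", body=b""):
    """Return request parameters as one dict, whatever the transport."""
    if method == "GET":
        # parse_qs returns a list per key; keep only the first value.
        return {key: values[0]
                for key, values in parse_qs(query_string).items()}
    # Assumption: every other method carries a JSON object in the body.
    data = json.loads(body.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


from_get = extract_params("GET", query_string="level=3&name=bob")
from_post = extract_params("POST", body=b'{"level": 3, "name": "bob"}')
```

Here from_get holds the string "3" for level while from_post holds the integer 3: URL-encoded values are always strings, JSON values are typed.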

How do we handle streaming data? We don't. For things that need actual streaming, like file upload or send, we have separate end points. Streaming is really very different, but we can stream until we need buffering.

We can discard useless stuff, keys that users send us but that we will not handle.
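As an illustration of a schema that is generous in what it receives and discards what it does not know, here is a minimal sketch. The schema format, the field names, and the load function are invented for this example and are not the implementation from the talk.

```python
# Hypothetical schema: field name -> converter that validates and coerces.
SCORE_SCHEMA = {"player": str, "level": int, "score": int}


def load(schema, raw):
    """Keep only the keys the schema knows, converting each value."""
    result = {}
    for key, convert in schema.items():
        if key not in raw:
            raise ValueError("missing field: %s" % key)
        # A converter such as int raises ValueError on bad input.
        result[key] = convert(raw[key])
    return result


incoming = {"player": "bob", "level": "3", "score": 1200, "debug": True}
clean = load(SCORE_SCHEMA, incoming)
```

The unknown key debug is dropped, and the string "3" becomes the integer 3, so the rest of the application only sees data of the declared shape.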

Modern web apps are APIs. Separate from that you have the front-end. Do you need to support 'dumb' clients? Move the client logic to the server, for example by handling the JavaScript on the server.

We are currently using this for an iOS game.

Oh hai, we are hiring: