Pau Freixes Alio: Running aioHTTP at scale

published Oct 19, 2017

Talk by Pau Freixes Alio at the Plone Conference 2017 in Barcelona.

At Skyscanner we use lots of microservices for hotels. The microservices talk to each other over HTTP, which all the languages that we use know how to speak.

We need to know what is happening in our microservices. For example: which requests take more than 20 seconds? We added aioHTTP middlewares to check this, and we make the results visible in Kibana.
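As an illustration only, a timing middleware could look roughly like this. It is a minimal sketch in plain asyncio, not the code from the talk: the names make_timing_middleware, report and slow_handler are made up, and the (request, handler) signature merely mirrors the general shape of a middleware. By default it logs slow requests, on the assumption that the logs are shipped to something like Kibana.

```python
import asyncio
import logging
import time

logger = logging.getLogger("slow_requests")


def make_timing_middleware(threshold=20.0, report=None):
    """Build a middleware that reports requests slower than `threshold` seconds.

    `report` is a hypothetical hook; by default it writes a log line.
    """
    if report is None:
        def report(request, elapsed):
            logger.warning("slow request %r took %.3f s", request, elapsed)

    async def middleware(request, handler):
        start = time.monotonic()
        try:
            return await handler(request)
        finally:
            # Runs for successful and failing requests alike.
            elapsed = time.monotonic() - start
            if elapsed >= threshold:
                report(request, elapsed)

    return middleware


async def slow_handler(request):
    """Stand-in for a real request handler."""
    await asyncio.sleep(0.05)
    return "ok"
```

Measuring in a finally block means that requests which end in an exception are timed as well.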

We wanted to follow the code path taken by a request. aiotask-context stores information within the current asyncio.Task instance. A middleware stores the request id on the task. That way we can follow a request through the code.
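The idea of carrying a request id along with the running task can be sketched with the standard library. Note that this sketch uses contextvars (Python 3.7+) instead of aiotask-context, so it shows the same idea with a different tool. The request is modelled as a plain dict, and the X-Request-ID key and all function names are assumptions for the example.

```python
import asyncio
import contextvars
import uuid

# Every asyncio task gets its own copy of the context, so this value is per request.
request_id_var = contextvars.ContextVar("request_id", default=None)


async def request_id_middleware(request, handler):
    """Store the request id for everything that runs while handling this request."""
    request_id = request.get("X-Request-ID") or uuid.uuid4().hex
    token = request_id_var.set(request_id)
    try:
        return await handler(request)
    finally:
        request_id_var.reset(token)


async def fetch_prices():
    """Code deep in the call stack can read the id without it being passed in."""
    await asyncio.sleep(0)  # give other tasks a chance to run in between
    return f"[{request_id_var.get()}] fetched prices"


async def handler(request):
    return await fetch_prices()


async def main():
    # Two concurrent requests: each one sees only its own id.
    return await asyncio.gather(
        request_id_middleware({"X-Request-ID": "req-a"}, handler),
        request_id_middleware({"X-Request-ID": "req-b"}, handler),
    )
```

Calling asyncio.run(main()) handles both requests concurrently, and each log line carries the id of its own request.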

DNS in AWS with aioHTTP: the DNS TTL in AWS is usually 60 seconds, so the IP addresses behind a name can change, and addresses can be added. But aioHTTP versions below 2 did not support this: they cached the DNS results. We wrote code that caches them for only a short time. We also throttled the DNS lookups, to avoid querying the DNS a hundred times when you fire a hundred requests in quick succession.
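A short-lived cache combined with throttling could be sketched like this. It is not the code that was shown in the talk: the class name ThrottledDNSCache, the ttl default of 10 seconds and the injectable resolver argument are all assumptions. Concurrent callers asking for the same host share a single lookup that is already in flight.

```python
import asyncio
import socket
import time


class ThrottledDNSCache:
    """Cache DNS answers for `ttl` seconds and share in-flight lookups."""

    def __init__(self, resolver=None, ttl=10.0):
        self._resolver = resolver or self._system_resolver
        self._ttl = ttl
        self._cache = {}      # host -> (expiry time, addresses)
        self._in_flight = {}  # host -> lookup shared by concurrent callers

    @staticmethod
    async def _system_resolver(host):
        loop = asyncio.get_running_loop()
        infos = await loop.getaddrinfo(host, None, type=socket.SOCK_STREAM)
        return sorted({info[4][0] for info in infos})

    async def resolve(self, host):
        entry = self._cache.get(host)
        if entry and entry[0] > time.monotonic():
            return entry[1]  # still fresh: no lookup at all
        pending = self._in_flight.get(host)
        if pending is None:
            # First caller starts the lookup; later callers wait for the same one.
            pending = asyncio.ensure_future(self._lookup(host))
            self._in_flight[host] = pending
        # shield: cancelling one waiter must not cancel the shared lookup.
        return await asyncio.shield(pending)

    async def _lookup(self, host):
        try:
            addresses = await self._resolver(host)
            self._cache[host] = (time.monotonic() + self._ttl, addresses)
            return addresses
        finally:
            self._in_flight.pop(host, None)
```

With this in place, a hundred simultaneous calls to resolve() for the same host result in one real lookup, and later calls are served from the cache until the TTL expires.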

We call microservices with a timeout of one second and catch the timeout error. Timeouts can get triggered when the reactor (the event loop) is saturated. We use asyncio.Future and can cancel such a future when we detect a timeout.
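A call with a one-second timeout can be sketched with asyncio.wait_for, which cancels the awaited operation when the timeout expires. The helper names call_with_timeout and slow_service, and the fallback argument, are made up for this example. The slow service is simulated with a sleep.

```python
import asyncio


async def call_with_timeout(coro, timeout=1.0, fallback=None):
    """Await `coro`, but give up (and cancel it) after `timeout` seconds."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the pending operation at this point.
        return fallback


async def slow_service(delay, log):
    """Stand-in for a call to another microservice."""
    try:
        await asyncio.sleep(delay)
        return {"status": "ok"}
    except asyncio.CancelledError:
        log.append("cancelled")  # proof that the pending call was cancelled
        raise
```

Returning a fallback is only one option. The caller could also log the timeout or raise an error of its own.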

Desired plans:

Trace queued operations when the HTTP pool connection limit is reached.
AWS X-Ray support.
Back pressure at the HTTP layer. When the reactor is too busy, return a 504 error. Scale horizontally when you get lots of 504 errors.
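The back pressure plan could be sketched as follows. This is speculative: the talk notes do not say how "too busy" would be detected, so this example assumes that the number of requests in flight is a good enough signal. The BackPressure class, the max_in_flight limit and the (status, body) return values are all made up for the illustration.

```python
import asyncio


class BackPressure:
    """Reject new requests with a 504 while too many are already in flight."""

    def __init__(self, max_in_flight=100):
        self._max = max_in_flight
        self._in_flight = 0

    async def middleware(self, request, handler):
        if self._in_flight >= self._max:
            # Shed load immediately instead of queueing more work.
            return 504, "too busy"
        self._in_flight += 1
        try:
            return await handler(request)
        finally:
            self._in_flight -= 1


async def handler(request):
    """Stand-in for a real request handler."""
    await asyncio.sleep(0.01)
    return 200, "ok"


async def main():
    guard = BackPressure(max_in_flight=2)
    # Five requests arrive at once, but only two are allowed in.
    return await asyncio.gather(
        *(guard.middleware(f"request-{i}", handler) for i in range(5))
    )
```

A monitoring system can then count the 504 responses and use that number as the trigger for scaling horizontally.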

See the slides.

plone ploneconf2017