Thijs Feryn: Varnish caching in Plone
Talk by Thijs Feryn at the Plone conference 2024 in Brasilia.
Link to talk information on Plone conference website.
If your website is slow or down, you lose money. How do you solve this? Some say: throw servers/money at the problem. Yes, this can work in the short term, but mo' money, mo' servers, mo' problems.
Optimize: make it faster: your Python, JavaScript, CSS, HTML. At some point you hit a limit.
Use caching. That is what we will talk about. Basic idea: why would you recompute if the data has not changed?
I am a tech evangelist at Varnish Software. I have a colleague here at the conference who speaks Portuguese, and a German colleague who speaks Spanish. So talk to them as well if you want.
New book by me: Varnish 6 by example.
The visitor of your site first gets sent to Varnish, and this either sends the request on to Plone, or serves a previous answer from its cache. That is basically how it works.
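The hit-or-miss flow described above can be sketched in a few lines of Python. This is only a toy model of the idea, not how Varnish is implemented; `TinyCache` and `fake_plone` are hypothetical names made up for the illustration.

```python
from typing import Callable, Dict, List


class TinyCache:
    """Toy stand-in for a caching reverse proxy sitting in front of a backend."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self.backend = backend            # hypothetical callable standing in for Plone
        self.store: Dict[str, str] = {}   # url -> cached response body
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        # Cache hit: serve the previous answer without touching the backend.
        if url in self.store:
            self.hits += 1
            return self.store[url]
        # Cache miss: pass the request on, then remember the answer.
        self.misses += 1
        body = self.backend(url)
        self.store[url] = body
        return body


calls: List[str] = []


def fake_plone(url: str) -> str:
    """Pretend backend that records every request it has to compute."""
    calls.append(url)
    return f"<html>{url}</html>"


cache = TinyCache(fake_plone)
first = cache.get("/news")    # miss, goes to the backend
second = cache.get("/news")   # hit, served from the cache
```

After the two requests the backend has been called only once, which is the whole point of putting a cache in front of it.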
Everything that is static is easy to cache. We listen to any cache headers that are available in the request, or set by Plone in the response.
But by default, if the request has a cookie, we do not cache. But these days every website uses cookies! So we just have to deal with them properly.
You do this using VCL, the Varnish Configuration Language. This is a domain-specific language for Varnish only. It is not a top-down language: your code builds on the built-in settings, and you can write extra routines on top of them.
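In practice the cookie handling would be written in VCL. As a language-neutral sketch of the idea, here is a small Python function that keeps only the cookies the backend actually needs, so that requests carrying nothing but tracking cookies become cacheable again. The function name `keep_cookies` is hypothetical, and the use of `__ac` as the Plone login cookie in the usage example is an assumption about your setup.

```python
from typing import Set


def keep_cookies(cookie_header: str, allowed: Set[str]) -> str:
    """Return the Cookie header with every cookie not in `allowed` removed."""
    kept = []
    for part in cookie_header.split(";"):
        part = part.strip()
        if not part:
            continue
        # The cookie name is everything before the first "=".
        name = part.split("=", 1)[0].strip()
        if name in allowed:
            kept.append(part)
    return "; ".join(kept)


# Assumption: "__ac" is the only cookie the backend cares about.
logged_in = keep_cookies("_ga=GA1.2.3; __ac=abc123; theme=dark", {"__ac"})
anonymous = keep_cookies("_ga=GA1.2.3; theme=dark", {"__ac"})
```

If the result is an empty string, the request carries no relevant cookies and can be treated like any other anonymous, cacheable request.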
Let's talk about caching in Plone. Disclaimer: I am not a Plone expert.
I have set up a Plone backend and Volto frontend locally with Varnish, in Docker containers.
In the cache settings:
- I see good things in here, you can enable caching.
- Very good: you can set a url for purge requests.
- A bit strange: you need to select which content types should be purged, I selected them all.
- For various sorts of contents and files you can choose strong caching, moderate, weak or terse, which sets all kinds of cache headers, even some e-tags, which is cool.
When the browser requests the same page again, it sends the e-tag back in an `If-None-Match` request header. Plone then checks this e-tag and if nothing has changed, it sends back a 304 Not Modified response.
What if we could store the e-tag somewhere, then you could return very fast maybe even without contacting the framework. Maybe this happens already.
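The e-tag check itself is small. The following Python sketch shows the comparison a server or cache could make, assuming it already knows the current e-tag of the page; `conditional_response` is a hypothetical helper, not a Plone or Varnish function.

```python
from typing import Optional, Tuple


def _opaque(tag: str) -> str:
    # Drop the weak-validator prefix so a weak and a strong tag compare equal.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_response(
    if_none_match: Optional[str], current_etag: str, body: bytes
) -> Tuple[int, bytes]:
    """Return (status, body): 304 with no body when the client copy is current."""
    if if_none_match is not None:
        # Simplification: split on commas, which is fine for typical e-tags.
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or _opaque(current_etag) in candidates:
            return 304, b""
    return 200, body
```

If the stored e-tag is available to the cache, this check needs no page rendering at all, which is where the speedup would come from.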
Some of the rules use a Vary header, but I wonder if the Accept header is the best here. Also, watch out for filling the cache when an attacker makes request after request with a different Accept header. In Varnish we have an `accept` module that can help here.
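To illustrate why normalizing helps: if every distinct Accept header creates its own cache variant, an attacker controls how many variants exist. Collapsing the header to a small fixed set of values puts a bound on that. The sketch below is a deliberately naive Python version of the idea. It ignores q-values and is not the API of the Varnish module; the two supported types are an assumption.

```python
def normalize_accept(accept: str) -> str:
    """Collapse any Accept header to one of two values before it is used in Vary."""
    accept = accept.lower()
    # Assumption: the site only ever serves JSON (REST API) or HTML.
    if "application/json" in accept:
        return "application/json"
    return "text/html"


api_variant = normalize_accept("application/json, text/plain;q=0.5")
browser_variant = normalize_accept("text/html,application/xhtml+xml,*/*;q=0.8")
junk_variant = normalize_accept("x-random/12345")
```

However many different Accept headers come in, the cache now holds at most two variants per URL.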
Cache purging: on a PURGE request we clear the cache for a URL. You allow this only from a whitelist of IP addresses. Plone purges a lot of paths, but maybe not enough. I think the `/Plone` is missing from some URLs. And some URLs with query parameters are not purged.
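As a rough sketch of the purge logic, assuming the cache is a plain dictionary keyed by URL: the Python function below checks the client address against a list of allowed networks before removing the entry. `handle_purge` and the choice of 405 for a refused purge are assumptions made for the illustration.

```python
import ipaddress
from typing import Dict, Iterable


def handle_purge(
    cache: Dict[str, bytes], client_ip: str, url: str, allowed_networks: Iterable[str]
) -> int:
    """Remove `url` from the cache if the client is allowed to purge."""
    ip = ipaddress.ip_address(client_ip)
    if not any(ip in ipaddress.ip_network(net) for net in allowed_networks):
        return 405  # refuse purges from unknown addresses
    cache.pop(url, None)  # purging a URL that is not cached is not an error
    return 200


pages = {"/Plone/news": b"old news", "/Plone/about": b"about us"}
ok = handle_purge(pages, "127.0.0.1", "/Plone/news", ["127.0.0.0/8"])
refused = handle_purge(pages, "203.0.113.9", "/Plone/about", ["127.0.0.0/8"])
```

Note that the purge only removes the exact key it is given, which is why a missing `/Plone` prefix or a query string leaves stale entries behind.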
I have written some VCL to handle these purges, but I am not sure if that is the way to go. What about tag-based cache invalidation instead? You can do that in the VCL with `add_key`. You can also strip off query parameters.
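The idea behind tag-based invalidation can be shown with a small Python model: each cached object is stored with a set of tags, and one purge by tag removes every URL that carries it, including variants with query parameters. `TaggedCache` and the tag names are hypothetical; this is not how the Varnish feature is implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TaggedCache:
    """Toy cache where objects can be invalidated by tag instead of by URL."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def store(self, url: str, body: bytes, tags: Iterable[str]) -> None:
        self.objects[url] = body
        for tag in tags:
            self.urls_by_tag[tag].add(url)

    def purge_tag(self, tag: str) -> int:
        """Remove every object carrying `tag`; return how many were removed."""
        removed = 0
        for url in self.urls_by_tag.pop(tag, set()):
            if self.objects.pop(url, None) is not None:
                removed += 1
        # Simplification: other tags may still list the removed URLs.
        return removed


tagged = TaggedCache()
tagged.store("/news", b"listing", ["uid-1"])
tagged.store("/news?b_start=10", b"page 2", ["uid-1"])
tagged.store("/about", b"about", ["uid-2"])
removed_count = tagged.purge_tag("uid-1")
```

The backend then no longer has to know every URL under which a piece of content was rendered; it only has to name the content that changed.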
Also, you can mark things as expired, instead of actively purging them. Note that you can also serve stale content while Varnish gets a fresh answer from Plone.
Can we set the stale-while-revalidate keyword in Plone, for the Cache-Control header? That could help.
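To make the effect of `stale-while-revalidate` concrete, here is a minimal Python sketch that classifies a cached object by its age. It only looks at `max-age` and `stale-while-revalidate` and ignores everything else in the header, such as `s-maxage`; the function names and the three labels are made up for the illustration.

```python
from typing import Dict


def parse_cache_control(value: str) -> Dict[str, int]:
    """Extract the directives that carry an integer number of seconds."""
    directives: Dict[str, int] = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        arg = arg.strip('"')
        if sep and arg.isdigit():
            directives[name.lower()] = int(arg)
    return directives


def classify(age: int, cache_control: str) -> str:
    """Say whether an object of `age` seconds is fresh, usable while stale, or expired."""
    directives = parse_cache_control(cache_control)
    max_age = directives.get("max-age", 0)
    swr = directives.get("stale-while-revalidate", 0)
    if age < max_age:
        return "fresh"
    if age < max_age + swr:
        return "stale-but-usable"  # serve it now, refresh in the background
    return "expired"


header = "max-age=60, stale-while-revalidate=300"
states = [classify(age, header) for age in (30, 120, 400)]
```

Without the extra directive, the object that is 120 seconds old would already count as expired and the visitor would have to wait for Plone.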
Question: should you use the Varnish load balancer or a specialised load balancer like HAProxy? Answer: I am biased of course, but if you are already using Varnish, just use that. We have some advanced options.
Plone is already very advanced here in how it does caching. But look at the Vary Accept header, and at tag-based caching.