How to minimize CPU and memory usage of Zope and Plone applications
Plone conference 2007, Naples. Speaker: Gael Le Mignot Company: Pilot Systems
Why this talk? Plone is powerful but slow. We developers tend to forget about optimization: "Computers are powerful, right?" Yes, but Plone can still be slow, so we need to look at that. You can do several things.
- Use fast algorithms, especially for code that is called often. In a test algorithm, going from a list to a set sped things up by a factor of 2700.
- Trust the base python code, instead of rolling your own functions.
- Compute once, use often. (So use plone.memoize for instance.)
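The list versus set point can be tried out with a small sketch like the one below. This is not the test algorithm from the talk; the function names and data sizes are made up for illustration, and the speedup you measure will depend on your data.

```python
import timeit


def count_hits_list(haystack, needles):
    """Count needles found in haystack, scanning a list for every lookup."""
    return sum(1 for needle in needles if needle in haystack)


def count_hits_set(haystack, needles):
    """Same result, but build a set once so each lookup is a hash lookup."""
    lookup = set(haystack)
    return sum(1 for needle in needles if needle in lookup)


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the difference visible.
    haystack = list(range(5000))
    needles = list(range(0, 10000, 20))
    t_list = timeit.timeit(lambda: count_hits_list(haystack, needles), number=3)
    t_set = timeit.timeit(lambda: count_hits_set(haystack, needles), number=3)
    print("list: %.4fs  set: %.4fs" % (t_list, t_set))
```

Both functions return the same count; only the data structure used for the membership test differs.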
How do you know how much time your code takes?
- Write a decorator printing the time taken by its function execution.
- On Unix systems, use resource.getrusage
- Should you look at system or user time? User time is the time spent in your Python/Zope code.
- Unix does not give you per-thread accounting. So you could use just one thread during testing. But that steers you away from real life.
- Due to caching etcetera a second test will probably give different results.
- Fresh load strategy: test on a freshly started zope
- Second test strategy: run the test twice and throw away the first result.
- DeadlockDebugger: meant for debugging deadlocks. But you can also use it for profiling.
- While running the benchmarks, do not surf the web or otherwise use your computer.
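A timing decorator along these lines could look like the following minimal sketch. It uses resource.getrusage, so it only works on Unix. The decorator name report_cpu_time and the example function are hypothetical. Note that RUSAGE_SELF covers the whole process, so with several threads the numbers include work done by the other threads.

```python
import functools
import resource


def report_cpu_time(func):
    """Print the user and system CPU time used while func runs (Unix only)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        before = resource.getrusage(resource.RUSAGE_SELF)
        try:
            return func(*args, **kwargs)
        finally:
            after = resource.getrusage(resource.RUSAGE_SELF)
            # ru_utime: time spent in our own code; ru_stime: time in the kernel.
            user = after.ru_utime - before.ru_utime
            system = after.ru_stime - before.ru_stime
            print("%s: user %.4fs, system %.4fs" % (func.__name__, user, system))
    return wrapper


@report_cpu_time
def build_squares(n):
    """Hypothetical stand-in for the code you want to measure."""
    return [i * i for i in range(n)]
```

For the second test strategy, call the decorated function twice and only look at the second line that is printed.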
Write efficient Zope code:
- Use the catalog, but do not store too much in it, else it gets too big.
- Page templates and python scripts are slow.
- Content types, tools and python code in general are faster.
- Use caching with decorators for slow methods. Do not store zope objects in the cache, but simple python types. Think about what you put in the cache key. Roles? User ids?
- Think about expiry of the cache, otherwise your memory will fill up. Options:
- keep it for a limited time
- store a maximum number of objects
- use LRU (or scoring)
- restart Zope
- Have some button you can push to flush the cache.
Pilot Systems is releasing GenericCache, a GPLed pure python cache, so you may want to look at that.
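To make the expiry options concrete, here is a minimal sketch of a caching decorator that combines a time limit, a maximum number of entries with LRU eviction, and a flush hook. It is not the GenericCache API and not plone.memoize; the names cached, max_entries, max_age and visible_titles are assumptions made for this example.

```python
import functools
import time
from collections import OrderedDict


def cached(max_entries=128, max_age=300.0, clock=time.monotonic):
    """Cache results per argument tuple, with a time limit and LRU eviction."""
    def decorator(func):
        store = OrderedDict()  # key -> (expiry time, value), least recently used first

        @functools.wraps(func)
        def wrapper(*key):
            # The positional arguments are the cache key, so they must be
            # hashable simple values (strings, tuples), not Zope objects.
            now = clock()
            entry = store.get(key)
            if entry is not None and entry[0] > now:
                store.move_to_end(key)  # mark as recently used
                return entry[1]
            value = func(*key)
            store[key] = (now + max_age, value)
            store.move_to_end(key)
            while len(store) > max_entries:
                store.popitem(last=False)  # drop the least recently used entry
            return value

        wrapper.flush = store.clear  # hook for a "flush the cache" button
        return wrapper
    return decorator


calls = []  # records real computations, only to show when the cache is bypassed


@cached(max_entries=2, max_age=60.0)
def visible_titles(user_id, roles, folder_path):
    """Hypothetical slow method; user id and roles are part of the key."""
    calls.append(user_id)
    return ("title for %s in %s" % (user_id, folder_path),)
```

Putting the user id and roles in the key means two users with different permissions never share a cached result, at the price of more entries in the cache.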
Conflict errors:
- With threads and transactions, nothing is written to the ZODB until the end of the transaction. So when your transaction takes too long, another transaction may commit earlier. That causes a conflict error and your code has to run again, slowing you down even more.
- Try not to change too many objects in the same transaction.
- But look out for resulting consistency errors. Look for a good point to commit the transaction which will not leave your data in an inconsistent state.
- Some conflicts can be resolved. For instance: for the most recent access date you can just pick the most recent. Use the _p_resolveConflict method for those cases. It takes old state, saved state and current state as input. It is up to you to use it.
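The most recent access date case could be sketched as below. The class AccessStamp and its attribute are hypothetical. To keep the example runnable without ZODB installed it is a plain class; in a real application it would subclass persistent.Persistent. The sketch assumes the three states are plain attribute dictionaries, which is what a default persistent object stores.

```python
class AccessStamp(object):
    """Hypothetical record of when a content item was last accessed."""

    def __init__(self):
        self.last_access = 0.0

    def touch(self, timestamp):
        self.last_access = max(self.last_access, timestamp)

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # old_state: the state both transactions started from
        # saved_state: what the other transaction already committed
        # new_state: what this transaction wants to commit
        resolved = dict(new_state)
        # Whoever saw the later access wins, so no retry is needed.
        resolved["last_access"] = max(saved_state["last_access"],
                                      new_state["last_access"])
        return resolved
```

The method returns the state that should be stored. The old state is not needed for this simple rule, but it would be for something like a counter, where you have to add up both changes relative to the starting value.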
Memory freeing may kick in too late, wasting memory that you know can be freed. This is made difficult by the several layers on top of each other: Python garbage collector, C libraries, operating system.
Swapping memory: the size is no real problem. The problem is the frequency with which things are fetched from disk to memory.
- Do I have a memory problem? Use vmstat 2 on the command line to look at your virtual memory state. Interesting columns in its output: si and so on Linux, pi and po on FreeBSD.
- Use the gc (garbage collection) module of python. Get a list of objects, check for uncollectable cycles, ask for manual collection.
- Monkey patch malloc. Track malloc/free calls with ltrace. In the GNU C library write hooks to malloc.
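The three uses of the gc module mentioned above can be sketched as follows. The helper names and the Node class are made up for this example; gc.get_objects, gc.collect and gc.garbage are the standard library calls.

```python
import gc
from collections import Counter


def most_common_types(limit=5):
    """Get a list of tracked objects and count them per type name."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)


def collect_and_report():
    """Ask for a manual collection.

    Returns the number of unreachable objects found and the number of
    uncollectable objects currently parked in gc.garbage.
    """
    unreachable = gc.collect()
    return unreachable, len(gc.garbage)


class Node(object):
    """Hypothetical object used to build a reference cycle."""

    def __init__(self):
        self.partner = None


def make_cycle():
    """Create two objects that refer to each other, then drop them."""
    a, b = Node(), Node()
    a.partner, b.partner = b, a
```

Calling most_common_types() before and after a request gives a rough idea of which kinds of objects pile up. A list in gc.garbage that keeps growing points at cycles the collector cannot free.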
Optimizing memory:
- Use the catalog, like mentioned already. It is better for your memory than awakening objects.
- Store big files in the filesystem. This will avoid polluting the ZODB cache. Even better: use Apache to serve them directly. Use tramline. The BlobFile directory can be served by Apache.
- You can use del to manually remove large objects before running some slow code.
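The del tip could look like this small sketch. The function names are hypothetical and slow_postprocess only stands in for the slow code. In CPython, dropping the last reference frees the object right away, although the memory is not necessarily handed back to the operating system.

```python
def slow_postprocess(total):
    """Hypothetical placeholder for code that runs for a long time."""
    return total * 2


def summarize(rows):
    # Large intermediate structure, only needed to compute the total.
    expanded = [(row, row * row) for row in rows]
    total = sum(square for _, square in expanded)
    # Drop the only reference now, so the list is not kept alive
    # for the whole duration of the slow step below.
    del expanded
    return slow_postprocess(total)
```

Without the del, the list would stay alive until summarize returns, which is after the slow step has finished.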
Massive site deployments:
- Use ZEO. It is easy to set up. It makes Zope parallelizable on many CPUs, which makes Zope faster.
- But: it will slow ZODB access, especially when doing it over a network. Also, the ZODB is on one server, which can become a bottleneck.
- Use a proxy-cache like squid. This works best for anonymous users. Caching for logged-in users is more difficult.
- Squid varying: cache based on e.g. the language that the browser includes in the request.
- Create a second ZODB on a different server by exporting the site as static html or with a zexp or just copying the ZODB. Syncing back is tricky, but we did it with PloneGossip and SignableEvent.
Conclusion: Plone is not doomed to be slow! But optimization has to be part of all steps, from design to code. Fast CPUs and large memory do not solve this problem.