Introduction to the ZODB

published Oct 10, 2007, last modified Jan 30, 2008

Plone conference 2007, Naples. Speaker: Laurence Rowe

The ZODB stores objects, not rows of data. In a relational database, a change to the data layout is migrated with ALTER TABLE statements. The ZODB has no equivalent, so migration has to be handled differently.
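
One common way to handle such a change is to give the class a default for the new attribute, so that objects stored under the old layout still load. The following is only a minimal sketch in plain Python, without the ZODB itself: `Document`, `language` and the `load_from_state` helper are hypothetical names made up for the illustration, and the helper merely imitates how saved state is restored without calling `__init__`.

```python
class Document:
    """New layout of the class: 'language' was added after objects were stored."""

    # Class-level default: state saved under the old layout has no
    # 'language' key, so attribute lookup falls back to this value.
    language = "en"

    def __init__(self, title, language="en"):
        self.title = title
        self.language = language


def load_from_state(cls, state):
    """Hypothetical helper: rebuild an object from saved state.

    Like unpickling, it restores the instance dictionary and does
    not call __init__.
    """
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


# State as it was saved before 'language' existed.
old = load_from_state(Document, {"title": "Hello"})
print(old.title, old.language)        # Hello en
print("language" in old.__dict__)     # False: nothing was written
```

The old object is usable right away and nothing is rewritten in the database just to read it. The new value is only stored once someone explicitly sets it.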

Transactions

Zope 2.8 introduced transactions, which means:

concurrency control
atomicity
fewer conflict errors

When you do a lot of writes in the ZODB you will encounter conflict errors. Most of them should be resolved automatically, but if they happen a lot you need to improve your code. In some cases BTrees help, for example by using BTreeFolders as the base for your folders. A product like QueueCatalog can also help: it defers new calls to the catalog, which can speed things up at the cost of not always seeing up-to-date data.

Side effects outside the ZODB are not rolled back: an email might be sent twice if the transaction gets aborted somewhere in between and is then retried.
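
The email problem can be illustrated with a small simulation. This is a sketch in plain Python, not actual Zope or ZODB code: `ConflictError` here is a stand-in class, and `run_with_retry`, `naive_request`, `deferred_request` and the `sent` list are hypothetical names. It assumes the request is simply run again after a conflict, and shows why deferring the mail until the attempt has succeeded avoids the duplicate.

```python
class ConflictError(Exception):
    """Stand-in for the error raised when two writes collide."""


sent = []  # stand-in for the outgoing mail queue


def run_with_retry(request, attempts=3):
    """Run request(attempt, after_commit), retrying on ConflictError.

    Callables registered through after_commit run only when an
    attempt finishes without a conflict.
    """
    for attempt in range(attempts):
        hooks = []
        try:
            request(attempt, hooks.append)
        except ConflictError:
            continue  # "abort": drop the hooks and try again
        for hook in hooks:
            hook()
        return attempt + 1
    raise ConflictError("gave up after %d attempts" % attempts)


def naive_request(attempt, after_commit):
    sent.append("naive mail")  # sent immediately, on every attempt
    if attempt == 0:
        raise ConflictError()  # simulate a conflict on the first try


def deferred_request(attempt, after_commit):
    # Only register the mail; it goes out if this attempt succeeds.
    after_commit(lambda: sent.append("deferred mail"))
    if attempt == 0:
        raise ConflictError()


run_with_retry(naive_request)
run_with_retry(deferred_request)
print(sent)  # ['naive mail', 'naive mail', 'deferred mail']
```

The naive version mails once per attempt, the deferred version once per successful request.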

Scalability

Python has a Global Interpreter Lock, which means a single Python process can effectively use just one CPU. Zope can use ZEO to handle this better: one ZEO server serves the data to several ZEO clients. A ZEO client is basically a normal Zope instance, but with the database being managed by another process: the ZEO server.

You can also store, for example, the catalog in a different ZODB. You can then tweak the caching of each ZODB to fit whatever you put in that particular database. A lot of big sites do this.

Storage types

Almost everyone uses FileStorage, which stores the database in a single file, Data.fs. DirectoryStorage can be used in some cases. Newer on the block is PGStorage, which stores the data in a PostgreSQL database.

Other features

Savepoints: you can go back to one state that you know is safe, instead of reverting the complete transaction.
You can store versions, but that is deprecated.
Undo, but this is not always possible. If, for example, you put your catalog in a different storage, an undo in the main database does not undo the change in that catalog.
BLOB: large file support (in newer ZODBs).
Packing: remove old versions of objects. Similar to doing a vacuum in PostgreSQL to reclaim used space. It gets rid of undo information.

Best practices

Do not write on read. If your object for example uses setDefault, you may find that the first time you read that object it actually gets written.
Keep your code on the file system, not in the ZODB (for example in the custom folder). Code in the ZODB quickly becomes unmanageable.
Use more BTrees. They scale better than normal trees. BTrees handle thousands of objects more efficiently, both CPU-wise and memory-wise.
Make simple content types. Use adapters if you need more complexity on top of the base content types.
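
The "do not write on read" advice can be sketched as follows. This is plain Python, not the ZODB: `FakePersistent` is a hypothetical stand-in that only imitates how a persistent object is flagged as changed on attribute assignment (the real ZODB tracks this through `_p_changed`), and `Page` and its two methods are made up for the illustration.

```python
class FakePersistent:
    """Stand-in base class: any attribute assignment marks the object changed."""

    _changed = False

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != "_changed":
            object.__setattr__(self, "_changed", True)


class Page(FakePersistent):
    def get_tags_writing(self):
        # Lazily creates the attribute: the first *read* is a write.
        if not hasattr(self, "tags"):
            self.tags = []
        return self.tags

    def get_tags_readonly(self):
        # Falls back to a default without touching the object.
        return getattr(self, "tags", ())


page = Page()
page.get_tags_readonly()
print(page._changed)  # False
page.get_tags_writing()
print(page._changed)  # True
```

With a real persistent object, the second variant would turn a plain page view into a database write, with the extra storage growth and conflict risk that implies.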

Documentation: http://wiki.zope.org/ZODB/Documentation (a bit old and mostly about earlier versions of the ZODB, but still a good source of information).
