Weblog

published Nov 03, 2021, last modified Nov 04, 2021

Introduction to the ZODB

published Oct 10, 2007, last modified Jan 30, 2008

Plone conference 2007, Naples. Speaker: Laurence Rowe

The ZODB stores objects, not rows of data. With rows in a relational database, when you change the data layout you write ALTER TABLE statements for migration. In the ZODB that does not happen, so you need to handle migrations differently.

Transactions

Zope 2.8 introduced MVCC (multi-version concurrency control) in the ZODB, which brings:

  • concurrency control
  • atomicity
  • fewer conflict errors

When you do a lot of writes in the ZODB you will encounter conflict errors. Most of them should be resolved automatically, but if they happen a lot, you need to improve your code. In some cases BTrees help, so use BTreeFolders as the base for your folders. A product like QueueCatalog can also help: it defers calls to the catalog, which can speed things up at the cost of not always seeing up-to-date data.

Sending emails might happen twice if the transaction gets aborted somewhere in between.
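
A common remedy is to send the mail only after the transaction has committed successfully; the ZODB transaction package offers after-commit hooks (transaction.get().addAfterCommitHook) for this. A toy stand-in in plain Python, not the real API, just to show the idea:

```python
class ToyTransaction:
    """Toy stand-in for a ZODB transaction with after-commit hooks.

    The real API is transaction.get().addAfterCommitHook(hook); the hook
    is called with a boolean success flag once the commit (or abort) has
    finished, so side effects like email are not done twice or too early.
    """

    def __init__(self):
        self._work = []
        self._hooks = []

    def do(self, action):
        self._work.append(action)

    def add_after_commit_hook(self, hook):
        self._hooks.append(hook)

    def commit(self):
        for action in self._work:
            action()
        for hook in self._hooks:
            hook(True)   # success flag, as in the real API

    def abort(self):
        self._work.clear()
        for hook in self._hooks:
            hook(False)


sent = []

def send_mail(success):
    # Only mail when the data really got stored.
    if success:
        sent.append("notification")
```

An aborted transaction then sends nothing, and a retried transaction sends exactly one mail, after the data is safely committed.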

Scalability

Python has a Global Interpreter Lock, which means a Python process can use just one CPU. Zope can use ZEO to handle this better. One ZEO server serves the data to several ZEO clients. A ZEO client is basically a normal Zope instance, but with the database being managed by another process: the ZEO server.

You can also store for example the catalog in a different ZODB, and tweak the caching of each ZODB to fit whatever you put in that particular database. A lot of big sites do this.

Storage types

Almost everyone uses FileStorage, which stores the database in a single file, Data.fs. DirectoryStorage can be used in some cases. A newer option is PGStorage, which stores the data in a PostgreSQL database.

Other features

  • Savepoints: you can go back to one state that you know is safe, instead of reverting the complete transaction.
  • You can store versions, but that is deprecated.
  • Undo, but this is not always possible. If you put for example your catalog in a different storage, you will find that an undo of the main database does not undo the change in that catalog.
  • BLOB: large file support (in newer ZODBs).
  • Packing: removes old revisions of objects. Similar to running VACUUM in PostgreSQL to reclaim space. It also gets rid of undo information.

Best practices

  • Do not write on read. If your object for example uses setDefault, you may find that the first time you read that object it actually gets written.
  • Keep your code on the file system, not in the ZODB, like the custom folder. That quickly becomes unmanageable.
  • Use more BTrees. These scale better than normal containers: BTrees handle thousands of objects efficiently, both CPU-wise and memory-wise.
  • Make simple content types. Use adapters if you need more complexity on top of the base content types.
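
The write-on-read pitfall is easy to see in plain Python. The sketch below uses a dict subclass as a stand-in for a persistent mapping; the names are made up for illustration:

```python
class TrackedDict(dict):
    """Stand-in for a persistent mapping: records whether it was mutated,
    the way the ZODB marks a persistent object dirty (_p_changed)."""

    dirty = False

    def setdefault(self, key, default=None):
        if key not in self:
            self.dirty = True  # a write happens on what looked like a read
        return super().setdefault(key, default)


def get_settings_bad(storage):
    # Anti-pattern: the first "read" inserts the default and dirties the
    # object, so simply viewing it would cause a write to the ZODB.
    return storage.setdefault("settings", {})


def get_settings_good(storage):
    # Read without writing: fall back to a default, store nothing.
    return storage.get("settings", {})
```

With the "good" variant you only write when the user actually changes something.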

Documentation: http://wiki.zope.org/ZODB/Documentation. It is a bit old, covering earlier versions of the ZODB, but still a good source of information.

Why you should learn more about workflow in Plone

published Oct 10, 2007, last modified Jan 30, 2008

Plone conference 2007, Naples. Speaker: Vincenzo Di Somma. Company: RefLab

Workflow from the beginning. This is mostly a hands-on talk, showing the portal_workflow in the ZMI.

Workflow is basically: during the life cycle of content there are changes in who is allowed to do what.

In Plone we use the workflow to manage the publication process. But we can use it for other things.

Plone 3.0 now ships with six workflows by default that you can choose from.

About four people in the audience have created workflows in the ZMI. About three have used ArchGenXML for this.

In the ZMI you can manage workflow variables. Usually: action, actor, comments, review_history, time. But you can add more yourself.

You can add worklists. By default there is the reviewer_queue. This is used by the Recent Items portlet to get a list of items that are in the pending workflow state.

On the Permissions tab you can list permissions that should be handled by this workflow: who is allowed to do what in which state.

Files and Images have no workflow anymore in Plone 3.0 by default. Question from the audience: why do they not have a one-state workflow? Answer: there are differences, but I would have to look up the code. Probably there are differences in what Anonymous visitors are allowed to see.

There is also the CMFPlacefulWorkflow, which you can install with the quick installer. [About three in the audience have worked with that.] It is very powerful. It lets you specify workflow policies. In there you can set a different default workflow and assign different workflows to content types. Within a folder object you can specify which policy to use. So you can use a different policy in different parts of your site.

You can change workflows through the ZMI, with a Python file, or with the newer, better way: an XML file in a GenericSetup profile. In portal_setup you can export your current workflow to an XML file, because you do not want to create such a file from scratch.

Use case: Smanettona, a site for teaching children about computers. Content that is added should be checked by an adult first. We use the default Plone publishing workflow for the corporate part of the site, a restricted workflow for the home folders of the users, and an even more restricted workflow for the magazine.

You can manage more than just the publication process. You can send emails when a transition occurs. You can have worklists, like the reviewer queue.

Common requested improvements by customers on top of workflows:

  • email alerts on workflow change (look at content rules in Plone 3.0)
  • explicit selection of actors (like reviewers) by group, sections, etc.
  • multireview: transition needs to be done by more than one person.
  • link transitions: publishing A also automatically publishes B.
  • multi site life cycle: once content is published on site A it is automatically published on site B.

The story of GetPaid and a "social source" process to create new opportunities with Plone

published Oct 10, 2007, last modified Jan 30, 2008

Plone conference 2007, Naples. Speaker: Christopher Johnson

This talk is about how Plone GetPaid is being developed. We made a name for how we did that: Social Source.

Developers are often not too happy about input from others. "We know quite well what we need to do, thank you."

How you get it done determines what you get done.

We saw that Plone was flexible and useful out of the box. But payment through Plone was not possible, so you had to use other products. We wanted to do something about that.

I am an entrepreneur, not a developer, and I did not know much about e-commerce when I started.

Step 1. So we started looking: what is already out there? Why do we need something else? We wrote about that on the website of GetPaid: http://plonegetpaid.com.

Step 2. Then we made a plan. What do you do and how? Who is going to benefit? And make the site pretty.

Step 3. We recruited leaders and participants.

Step 4. We refined the requirements. We got input from users, developers, UI experts.

Step 5. We asked for money. If you do not ask, you will not get it. How to ask? Connect needs with value: how does it help them? Be transparent about what you do with the money. Be patient and persistent.

Step 6. We celebrate successes. Reward and recognize people. Communicate about recent developments in the project: what has been done?

Step 7. Sustain it: have fun together, motivate people.

We had three sprints so far. The code is at Google Code. We have a blog, mailing list, irc channel #getpaid.

Perhaps Plone could benefit from a process like this? Where is Plone going? What is our vision? Developers and users should answer that question. [Someone from the audience says: Alexander Limi is going to announce something about that this week.]

How to minimize CPU and memory usage of Zope and Plone applications

published Oct 10, 2007, last modified Jan 30, 2008

Plone conference 2007, Naples. Speaker: Gael Le Mignot Company: Pilot Systems

Why this talk? Plone is powerful but slow. We developers tend to forget about optimization. "Computers are powerful, right?" Yes, but Plone can be slow, so we need to look at that. You can do several things.

  • Use fast algorithms, especially for code that is called often. In a test algorithm going from list to set speeds things up by a factor of 2700.
  • Trust the base python code, instead of rolling your own functions.
  • Compute once, use often. (So use plone.memoize for instance.)
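
The list-versus-set difference mentioned above is easy to reproduce; the exact factor depends on data size and machine:

```python
import timeit

# Membership testing: a list scans linearly, a set hashes directly.
items_list = list(range(100_000))
items_set = set(items_list)

needle = 99_999  # worst case for the list: the element is at the end

list_time = timeit.timeit(lambda: needle in items_list, number=100)
set_time = timeit.timeit(lambda: needle in items_set, number=100)

print(f"list: {list_time:.4f}s, set: {set_time:.6f}s")
# The set lookup is typically orders of magnitude faster.
```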

How do you know how much time your code takes?

  • Write a decorator printing the time taken by its function execution.
  • On Unix systems, use resource.getrusage
    • Should you look at system or user time? user time is the time spent in your python/zope code.
    • Unix does not give you per-thread accounting. So you could use just one thread during testing. But that steers you away from real life.
  • Due to caching etcetera a second test will probably give different results.
    • Fresh load strategy: test on a freshly started zope
    • Second test strategy: run the test twice and throw away the first result.
  • DeadlockDebugger: meant for debugging deadlocks. But you can also use it for profiling.
  • While running the benchmarks, do not surf the web or otherwise use your computer.
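
A minimal version of such a timing decorator, using resource.getrusage for the user time (Unix only, and per process rather than per thread, as noted above):

```python
import functools
import resource  # Unix-only module
import time


def timed(func):
    """Print wall-clock and user CPU time spent in one function call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wall0 = time.perf_counter()
        user0 = resource.getrusage(resource.RUSAGE_SELF).ru_utime
        result = func(*args, **kwargs)
        wall = time.perf_counter() - wall0
        user = resource.getrusage(resource.RUSAGE_SELF).ru_utime - user0
        print(f"{func.__name__}: wall {wall:.4f}s, user {user:.4f}s")
        return result
    return wrapper


@timed
def busy():
    # Stands in for some expensive application code.
    return sum(i * i for i in range(200_000))
```

Remember to run the decorated code twice (or on a fresh Zope) so caching does not skew the numbers.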

Write efficient Zope code:

  • Use the catalog, but do not store too much in it, else it gets too big.
  • Page templates and python scripts are slow.
  • Content types, tools and python code in general are faster.
  • Use caching with decorators for slow methods. Do not store zope objects in the cache, but simple python types. Think about what you put in the cache key. Roles? User ids?
  • Think about expiry of the cache, otherwise your memory will run full. Options:
    • keep it for a limited time
    • store a maximum number of objects
    • use LRU (or scoring)
    • restart Zope
    • Have some button you can push to flush the cache.
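
As a sketch of the "limited time" and "flush button" options above (plain Python, not the GenericCache API):

```python
import functools
import time


def cached(ttl=60):
    """Memoize a function, expiring entries after ttl seconds.

    The cache key here is built from the positional arguments only; in
    Zope you would also fold in things like the user id or roles where
    they matter. Store plain Python values, never Zope objects.
    """
    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def wrapper(*args):
            now = time.time()
            hit = cache.get(args)
            if hit is not None and now - hit[1] < ttl:
                return hit[0]
            value = func(*args)
            cache[args] = (value, now)
            return value

        wrapper.flush = cache.clear  # the "button to flush the cache"
        return wrapper
    return decorator


@cached(ttl=5)
def slow_square(x):
    time.sleep(0.01)  # stands in for an expensive computation
    return x * x
```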

Pilot Systems is releasing GenericCache, a GPLed pure python cache, so you may want to look at that.

Conflict errors:

  • With threads and transactions, nothing is written to the ZODB until the end. So when your transaction takes too long, another transaction may commit earlier, which causes your code to run again, slowing you down even more.
  • Try not to change too many objects in the same transaction.
  • But look out for resulting consistency errors. Look for a good point to commit the transaction which will not leave your data in an inconsistent state.
  • Some conflicts can be resolved. For instance: for the most recent access date you can just pick the most recent. Use the _p_resolveConflict method for those cases. It takes old state, saved state and current state as input. It is up to you to use it.
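
A sketch of the "most recent access date" case (class and attribute names are made up; a real persistent class would subclass persistent.Persistent, but the resolution hook itself can be shown standalone):

```python
class LastAccess:
    """Sketch of ZODB conflict resolution for a 'most recent access' value.

    The ZODB calls _p_resolveConflict with three unpickled states when
    two transactions change the same object: the state both started
    from, the state the winning transaction saved, and the state this
    transaction tried to commit.
    """

    def __init__(self, stamp=0):
        self.stamp = stamp

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # For an access date the merge is easy: keep the most recent one,
        # instead of raising a ConflictError.
        resolved = dict(saved_state)
        resolved["stamp"] = max(saved_state["stamp"], new_state["stamp"])
        return resolved
```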

Memory freeing may kick in too late, wasting memory that you know can be freed. This is made difficult with several layers on top of each other: python garbage collector, C libraries, operating system.

Swapping memory: the size is no real problem. The problem is the frequency with which things are fetched from disk to memory.

  • Do I have a memory problem? Use vmstat 2 on the command line to look at your virtual memory state. Interesting columns in its output: si and so on Linux, pi and po on FreeBSD.
  • Use the gc (garbage collection) module of python. Get a list of objects, check for uncollectable cycles, ask for manual collection.
  • Monkey patch malloc. Track malloc/free calls with ltrace. In the GNU C library write hooks to malloc.
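
A small example of the gc module in action, collecting a reference cycle by hand:

```python
import gc


class Node:
    """Two of these pointing at each other form a reference cycle."""
    def __init__(self):
        self.other = None


a, b = Node(), Node()
a.other, b.other = b, a
del a, b  # the pair is unreachable now, but only the cycle collector frees it

collected = gc.collect()          # manual collection; returns objects found
print("collected:", collected)
print("uncollectable:", len(gc.garbage))  # cycles gc could not free
```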

Optimizing memory:

  • Use the catalog, like mentioned already. It is better for your memory than awakening objects.
  • Store big files in the filesystem. This will avoid polluting the ZODB cache. Even better: use Apache to serve them directly. Use tramline. The BlobFile directory can be served by Apache.
  • You can use del to manually remove large objects before running some slow code.

Massive site deployments:

  • Use ZEO. Easy to set up. Makes Zope parallelizable over many CPUs, which makes Zope faster.
  • But: it will slow down ZODB access, especially over a network. Also, the ZODB lives on one server, which can become a bottleneck.
  • Use a proxy-cache like squid. This works best for anonymous users. Caching for logged-in users is more difficult.
  • Squid Vary headers: cache different variants based on e.g. the language the browser includes in the request.
  • Create a second ZODB on a different server by exporting the site as static html or with a zexp or just copying the ZODB. Syncing back is tricky, but we did it with PloneGossip and SignableEvent.

Conclusion: Plone is not doomed to be slow! But optimization has to be part of all steps, from design to code. Fast CPUs and large memory do not solve this problem.

i18n, locales and Plone 3.0

published Sep 25, 2007, last modified Mar 05, 2013

More and more products are developed as python packages instead of Zope products. This means they should not be put in the Products directory anymore, but in the lib/python/ directory of your instance. How you handle translations has changed a bit because of that. Also with Zope 2.10 and higher the translation machinery has changed a bit. So how should you handle translations now?

For the most recent version focusing on Plone 4, see my talk at the Plone Conference 2012.

For a version from 2010 focusing on Plone 3.3 and 4, see a different article on my blog.

How should you handle translations now when dealing with locales, i18n, products, packages and Plone 3.0?

For clarity: I use the following terms here: a product is in the Products dir, a package is in the lib/python dir.

Please correct me if anything I write here is wrong.

locales

Most translations should now be put in the locales directory of your product or package. This directory must be registered in zcml before it is picked up on Zope startup. So in your configure.zcml add:

    <configure xmlns:i18n="http://namespaces.zope.org/i18n">
        <i18n:registerTranslations directory="locales" />
    </configure>
The locales directory differs a bit from the i18n directory. i18n has for example:

i18n/mydomain.pot
i18n/mydomain-nl.po
i18n/mydomain-de.po

In locales that should be:

locales/mydomain.pot
locales/nl/LC_MESSAGES/mydomain.po
locales/de/LC_MESSAGES/mydomain.po

You can rebuild the .pot file with:

i18ndude rebuild-pot --pot locales/mydomain.pot --create mydomain .

and you sync the .po files with:

i18ndude sync --pot locales/mydomain.pot locales/*/LC_MESSAGES/mydomain.po

Domain plone

If you want to add translations to the plone domain you could add locales/plone.pot and locales/nl/LC_MESSAGES/plone.po (or locales/plone-mydomain.pot and locales/nl/LC_MESSAGES/plone-mydomain.po if you want). That works: your translations for the plone domain are then available. But now the default translations for the plone domain (so those from PloneTranslations) are overridden. This is because your translations for the plone domain are picked up first by the zope translation machinery; the PlacelessTranslationService that normally loads the translations from PloneTranslations then gets ignored for the plone domain. So half your site is in English even though your browser is set to Dutch.

So extra translations for the plone domain should not be done in the locales directory. You can still put them in the i18n directory though. In Plone 3.5 this is likely to change.

As Hanno Schlichting tells me, the same is of course true for the other domains in PloneTranslations, like atcontenttypes, cmfeditions, plonelanguagetool, etcetera.

i18n

If you put an i18n dir in a package, it will be ignored. Doing i18n:registerTranslations for this directory does not work. You can only use an i18n dir in a product.

GenericSetup profiles

Several .xml files in the profiles can handle i18n as well. For instance in a types definition like types/JobPerformanceInterview.xml:

    <object name="JobPerformanceInterview" meta_type="Factory-based Type Information"
            xmlns:i18n="http://xml.zope.org/namespaces/i18n" i18n:domain="plone">
      <property name="title" i18n:translate="">Job Performance Interview</property>
    </object>
The i18n:domain should be plone and not for instance mydomain as these translations are used in templates of plone itself. In fact, I think that the only use for having these i18n commands in the .xml files is that they can then be extracted by i18ndude (version 3.0 I think, which is best installed in a workingenv).

Compiling .po files

.po files in i18n are compiled at Zope startup time by the PlacelessTranslationService. With compiling I mean: turning them into .mo files so they are usable by Zope. This automatic compiling does not happen for .po files in packages, so it is better to compile those files yourself. You can do that like this:

# Compile po files
for lang in $(find locales -mindepth 1 -maxdepth 1 -type d); do
    if test -d "$lang/LC_MESSAGES"; then
        msgfmt -o "$lang/LC_MESSAGES/${PRODUCTNAME}.mo" "$lang/LC_MESSAGES/${PRODUCTNAME}.po"
    fi
done

Conclusions

  • For both products and packages: put the translations for your domain in the locales directory.
  • Put extra translations for the plone domain in an i18n directory of a product in the Products dir.