Andreas Jung: Migrating a large university site to Plone 5.2

published Oct 24, 2019

Talk by Andreas Jung at the Plone Conference 2019 in Ferrara. is a Plone 4.3 site of a large Belgian university. Started in 2002 as Zope/CMF site. 90.000 pages, sub sites, 40.000 students, hundreds of editors, 90 add-ons. They wanted to move to Plone 5.2 and Python 3.

  • Traditional in-place migration: too manu add-ons, no one-on-one mapping possible.
  • Transmogrifier: not yet Python 3 at the time, too much magic hidden in too many places with blueprints.
  • So: custom migration solution.

Content types: standard types, plus four custom content types, including PloneFormGen. So that is quite reasonable. There is extensive usage of archetypes schema extenders.

Start: analyze and investigate your dependencies. - Based on Archetypes? Obsolete, replace. - No longer needed? Remove it. In Confluence we compiled a big table with for each package the basic information of how we would handle it: upgrade it, replace it, unknown yet, status of Python support.

Start with a minimal Plone 5.2 setup. Add one verified Python 3 compatible add-on at a time. Test extensively. Focus on content types first. Things like portlets can be handled later.

You need an export. We used a customized version of collective.jsonify. Core numbers: 90.000 json files, 55 GB data, 90 minutes, binary files base64 encoded.

We exported portlet assignments, default pages, layout information, workflow state, local roles, and pre-computed values for further efficient processing.

So we had 90.000 json files on the file system. We imported this in ArangoDB. Why use such a migration database? This allowed us to import only some portal types, or do parallel imports, and test complex migration steps like for PloneFormGen.

We briefly tried MongoDB, but that could not handle data over 16 MB. The json could be dumped unchanged in ArangoDB. This took 45 minutes.

Now we need to import this into Plone. Clean Python 3.7, Plone 5.2, plus the minimal set of packages needed for the content types. Import via plone.restapi. On top we have a dedicated migration package with special views. This handled things like translating between UIDs and paths.

The "magic" migration script is based on configuration in YAML.

  • Phase 1: pre-check the migration, remove target site if it already exists from previous test, create new Plone Site, install add-ons.
  • Phase 2: create all Folders, query ArangoDB for this.
  • Phase 3: create all non folders.
  • Phase 4: global actions. Check and migrate paths to UIDs in rich text fields. Assign portlets. Other specific fixup operations, like reindexing.

We migrated PloneFormGen (Archetypes) to easyform (dexterity). Export: one JSON for the FormFolder, one JSON file per field and action adapter. In easyform this needed to be turned into an EasyForm instance and a schema.

Topics (AT) to Collections (dexterity). Code was largely taken over from the migration.

From AT schema extenders to dexterity behaviors. First we made a list of which there were, are they in use, what do they do? Check which dexterity replacements there are. Create new behaviors.

Migrate packages to Python 3. This is mostly covered by talks of Philipp Bauer and David Glick. Common problems: utf-8 versus unicode, import fixes, implements to implementer. I rarely used the 2to3 and modernizr tools.

Some reimplementations: - portal skins to browser views - some packages with AT replaced with new packages with dexterity

Other problems:

  • improper file and image metadata
  • migration of vocabulary values, like old to new departments
  • repetitive cycles: always a bug occurs after a day of migration right before the end.

Quality control: - you need to check that migrated content and configuration is complete - "works for me" is nice, but others need to check too

Most of the packages have been removed, the setup is much smaller.

Status: - content migration is complete - must be tested in detail - integrate with the new theme, test this - need a replacement of a specific membrane usage - need work on castle/cas plugin

Takeaways: - Export Plone to JSOJN: 2 hours. Fast. - Import JSON to ArangoDB: 45 minutes. Fast. - Import ArangoDB to plone.restapi: 36-48 hours. Painfully slow. - 1.5 - 2.0 seconds per content object on average - cannot parellellize this import, because you would get conflict errors - So Plone and ZODB and painfully slow for creating mass content.

Question: in a similar setup we did a live migration from a live site to a separate new site. Did you consider that?

Answer: I tried this for other sites, but here I wanted to be able to partial imports, independent of the live site.

Question: default migration patches away all kinds of expensive indexing. You might want to consider looking at that. Migrating ten items per second is possible, although that is inline migration. And can you share the code, especially for the easyform migration?

Answer: could be done, but is not the primary focus of the budget currently.

A slightly older version of the code is on