Weblog

published Nov 03, 2021, last modified Nov 04, 2021

Paolo Perrotta: A Deep Learning Adventure

published Oct 25, 2019

Keynote talk by Paolo Perrotta at the Plone Conference 2019 in Ferrara.

It is hard to see what part of machine learning is overhyped, and what part is actually useful.

Basis of ML (Machine Learning) is to get input and transform it to output. The ML gets a picture of a duck, and it gives as answer "duck". That is image recognition. You train the ML with images that you have already labeled. And then you give it an image, and hope it gives a good answer. Same: from English to Japanese text. From an fMRI scan to a visualized image.

Simpler example: solar panel. Input: time of day, output: how much generated power. You would train this with data. The ML would turn this into a function that gives an approximately good answer.

Simplest model: linear regression. Turn the data into a function like this:

a * X + b

No line through the training points will be perfect. You find the line that minimizes the average error. So the ML uses linear regression to find a and b.

In our example, we try to guess the amount of mojitos we sell, based on the number of passengers that a daily boat brings to our beach bar.
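
The fit described above can be sketched in plain Python. This is a minimal sketch, assuming the "average error" is the mean squared error (the notes do not say which error measure the talk used). The passenger and mojito numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line a * x + b that minimizes the mean squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least squares solution for a single input variable.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    a = covariance / variance
    b = mean_y - a * mean_x
    return a, b


# Hypothetical training data: boat passengers and mojitos sold.
passengers = [10, 20, 30, 40]
mojitos = [25, 45, 65, 85]
a, b = fit_line(passengers, mojitos)
print(f"mojitos = {a} * passengers + {b}")
```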

But this may also depend on the temperature, or the number of sharks. With two variables you would not have a line, but a plane. With n, you would get an n-dimensional shape. You have n inputs, and give a weight to each input, and add them.
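
With several inputs, the prediction is a weighted sum. A small sketch, where the weights and the bias are invented numbers rather than values learned from real data:

```python
def predict(inputs, weights, bias):
    """Weighted sum of n inputs plus a bias: the n-dimensional version of a * X + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Hypothetical weights: mojitos per passenger, per degree Celsius, per shark.
weights = [2.0, 1.5, -10.0]
bias = 5.0
# 30 passengers, 28 degrees, 1 shark.
estimate = predict([30, 28, 1], weights, bias)
print(estimate)
```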

This is about numbers. For image recognition, you don't get a number as output. But we can apply a function to the result, and get a number between 0 and 1 that gives a likelihood. For an image we may have 0.02 certainty that it is a cat, and 0.9 that it is a duck, so our answer will be the highest: a duck. Translated to the mojitos: what is the likelihood that my bar breaks even?
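
The notes do not name the function that maps a result to a number between 0 and 1. A common choice is the sigmoid (logistic function), which is assumed in this sketch. The raw scores per label are hypothetical.

```python
import math


def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(scores):
    """Turn one raw weighted sum per label into a likelihood and pick the highest."""
    likelihoods = {label: sigmoid(z) for label, z in scores.items()}
    best = max(likelihoods, key=likelihoods.get)
    return best, likelihoods


# Hypothetical raw outputs, one weighted sum per label.
label, likelihoods = classify({"cat": -3.9, "duck": 2.2})
print(label, likelihoods)
```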

This system with weights and a reducer function is called a perceptron. With a short Python program I got 90 percent accuracy on the standard MNIST test. We can do better.

We look at neural networks. This is basically: mash two perceptrons together. Or more. We are finding a function that approximates the data, and reduce the error iteratively.
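
Reducing the error iteratively can be illustrated on the simple line from before. This sketch assumes gradient descent on the mean squared error, which is the usual technique but is not named in these notes. The data, learning rate and iteration count are illustrative.

```python
def train(xs, ys, iterations=2000, learning_rate=0.05):
    """Fit a * x + b by repeatedly nudging a and b against the error gradient."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        errors = [(a * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to a and b.
        grad_a = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Hypothetical data that lies exactly on the line 2 * x + 5.
a, b = train([1, 2, 3, 4], [7, 9, 11, 13])
print(a, b)
```

A neural network is trained in the same spirit, only with many more weights and with gradients that are passed back through the layers.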

What would I say is deep learning? Why has it grown?

  1. Neural networks with many layers...
  2. ... trained with a lot of data...
  3. ... with specialized architectures.

We helped Facebook with neural networks by tagging our photos. We gave them training data!

A lot of engineering is going into deep learning.

Generative Adversarial Networks: GANs. Example: a horse discriminator. Train the system with images of horses and others, and it should answer with: yes or no horse.

Other part: horse generator. Randomly generate images for feeding to the horse discriminator. You train this by saying: you did or did not manage to trick the discriminator. And you try to get better. I trained this for a night, and after about a million iterations, I got pictures that are not quite horses, there is something wrong, but they are amazingly close.

Lightning talks Thursday

published Oct 24, 2019

The Thursday Lightning talks at the Plone Conference 2019 in Ferrara.

Michele Finelli: cappellacci

My second of three easy pieces on Ferrara.

Now about cappellacci, or caplaz in the local dialect. No, it is not tortelloni or tortellini, please.

Pasta filled with pumpkin is a tradition of many parts of Northern Italy. Serve it simply, with butter and grated cheese. With vegetables? Not if you want to avoid getting arrested.

And remember: spaghetti a la bolognese does not exist.

Jens Klein: yafowil, declarative forms

Yet Another Form Widget Library. It is now a drop-in replacement for z3c.form, by activating the yafowil behavior per portal type.

For example usage, we have an example package. And documentation.

Federico Campoli: Postgres carbonara

Spaghetti carbonara, the PostgreSQL way. Lots of SQL code and demo.

Code is in a gist.

Eric Brehault: PLIPs

PLIPs are PLone Improvement Proposals. There are no PLIPs at the moment. A lot is happening in Volto, but that is outside core.

If you submit, should you do the work yourself? No! You can, but it is not needed. Just give ideas to people who can develop it. Please do not hesitate.

There is a PLIP about the Dublin Core metadata behavior. This is a meta behavior for four others. This may change. If you have a concern about this, please add your thoughts to the PLIP.

Asko Soukka: robot tests

Robot framework is a way to automate tests. You can combine this with Jupyter notebooks. It can help you write better tests, including auto completion. See https://robots-from-jupyter.github.io/public/

Sven Strack and Erico Andrei: Jekyll and Hyde

How to rank Plone events. So many years, so many pictures. How do you compare pants and pools?

There can only be one answer. Food!

  • Bronze medal: Awesome Tokyo, Plone conference 2018.
  • Runner up: Somni Català, Barcelona Plone conference 2017.
  • Honorable mention: The Plone Cake, Boston Plone Conference 2016.
  • The winner is: Red Turtle Tiramisu, Ferrara Plone conference 2019.

Paul Grünewald: Digital Signage and Plone

[Note from Maurits: I expected this to go about digital signatures, but it is about showing signs on monitors, for example to inform your visitors.]

University Dresden. Content types for monitors, slide sets and slides. Contents: text, images, timetables, fullscreen video, ticker. Editing WYSIWYG inline using CKEditor, preview, scheduling.

Code: tud.addons.monitor

Sven Strack: Docs analytics

Margot Bloomstein: "If we don't know if our documentation is successful, how will we know what we need to do to improve it?"

For the Plone docs we use an overwatch dashboard. Alerts when files on docs.plone.org have not been updated in a year. Open issues and PRs. Accessibility, performance, speed. Grafana, Prometheus, Matomo.
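
The "not updated in a year" alert could be approximated as follows. This is only a sketch: it assumes the docs are `.rst` files on disk and uses file modification times, whereas the actual dashboard may well look at git history instead. The function name and defaults are hypothetical.

```python
import time
from pathlib import Path

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def stale_files(root, pattern="*.rst", max_age=ONE_YEAR, now=None):
    """Return files under root whose modification time is older than max_age."""
    now = time.time() if now is None else now
    return sorted(
        path
        for path in Path(root).rglob(pattern)
        if now - path.stat().st_mtime > max_age
    )
```

Calling `stale_files("docs")` would return the paths to alert on, oldest-first ordering not guaranteed since the result is sorted by path.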

Christine Baumgartner and Ilvy: Alpine City Strategic Sprint 2020

Side note: German symposium "Plone Tagung" March next year in Dresden. See https://plonetagung.de/2020/

February 11 to 14 in Innsbruck, Austria. Come sprint with us (and enjoy beautiful and snowy Austria):

https://alpinecity.tirol

Rob Gietema: Volto form editor

I said there wasn't a form editor. But actually there is a schema editor. Saved to JSON schema. No backend implementation, but maybe we can sprint on this.

Panel: Ask Me Anything on Volto

published Oct 24, 2019

Panel to ask anything on Volto at the Plone Conference 2019 in Ferrara.

Is Volto compatible with Guillotina? Not 100 percent. Question is how we keep the APIs in sync. If people figure out a shared generic API that works with both Plone and Guillotina as backend, this can work.

How do you migrate? You can migrate from Plone 4.3 to Volto. We did that with transmogrifier. Biggest problem is moving composite pages, like cover, to the Volto blocks. In a post migration step, you can fix things up.

Why Semantic UI? Rob researched all the popular ones, concentrating on how easy it is to theme or override stuff. Semantic UI makes this easy. You derive from a theme, and only override some parts.

For add-ons, how do you keep the backend part in sync with the frontend part? You don't. You use Python packages in the backend and node packages in the frontend. There will probably be a lot of packages that are only in the backend or only in the frontend. You may need to be careful in the frontend so that it can work with several versions of the backend of this add-on.

TypeScript has won the JavaScript wars. Will you support that? We don't want to support two ways of doing the same thing. So if we switch to TypeScript, we want to stop using ES6, also in the documentation. Same for class based versus function based.

Did you check accessibility? Yes. We have a static code checker ALM that helped us fix issues. Also the Cypress tool. But we also do manual checks. Plone Foundation is trying to get funding for an audit.

Maik Derstappen: State of Plone back-end development today

published Oct 24, 2019

Talk by Maik Derstappen at the Plone Conference 2019 in Ferrara.

Frontend is nice, but it needs a backend, so let's talk about that. I will talk about plonecli, plone.api, Plone snippets for VS Code and plone.restapi.

You should use plonecli. It saves a lot of time for boring stuff. It helps you to create a product and enhance it step by step. It can clean up and create a fresh virtualenv with a specific Python version and requirements and run buildout.

It has sub templates, for example to add a content type to your package. Now a sub template for a restapi service. Also one for upgrade steps, adding an upgrade profile per upgrade step.

We create a structure and files, and if you don't use some parts you can ignore or remove them. We will check if you have a clean git status before, and ask to commit, and afterwards we automatically commit.

You have good test coverage right from the start. All features added by plonecli have at least basic test coverage. You only write the tests for your own code. We make a tox environment to test different Plone and Python versions.

You can configure plonecli/mr.bob to your taste, changing the default answers or ignoring questions in a .mrbob file.

It is extensible, you can write your own custom bobtemplate packages and register them for the plonecli.

Visual Studio Code: snippets for Plone

There are snippets for schemas, fields, registry xml.

Use plone.api. It makes add-on code much easier to understand, without arcane incantations.

Ideas for the future.

  • plonecli:
    • add option to set Interface for view, viewlet.
    • REST API sub templates for de/serializer.
    • Graphical UI (ncurses-like) to make selecting options easier.
  • VS Code: plone.api autocompletion. VS Code is not so smart with namespaces. And it does not know buildout. You can use a buildout recipe that helps here though. Or use the generated zopepy script as your Python.

I would love more contributions. Don't be shy. There is no grand jury who makes decisions. Publish your own bobtemplates. Improve the VS Code snippets, or for other editors. Meet me at the sprints.

Question: is there a bobtemplate to create a new bobtemplate?

Answer: No. Should be possible.

Andreas Jung: Migrating a large university site to Plone 5.2

published Oct 24, 2019

Talk by Andreas Jung at the Plone Conference 2019 in Ferrara.

https://ugent.be is a Plone 4.3 site of a large Belgian university. Started in 2002 as Zope/CMF site. 90.000 pages, sub sites, 40.000 students, hundreds of editors, 90 add-ons. They wanted to move to Plone 5.2 and Python 3.

  • Traditional in-place migration: too many add-ons, no one-to-one mapping possible.
  • Transmogrifier: not yet Python 3 at the time, too much magic hidden in too many places with blueprints.
  • So: custom migration solution.

Content types: standard types, plus four custom content types, including PloneFormGen. So that is quite reasonable. There is extensive usage of archetypes schema extenders.

Start: analyze and investigate your dependencies.

  • Based on Archetypes? Obsolete, replace.
  • No longer needed? Remove it.

In Confluence we compiled a big table that lists, for each package, how we would handle it: upgrade it, replace it, or unknown yet, plus the status of its Python 3 support.

Start with a minimal Plone 5.2 setup. Add one verified Python 3 compatible add-on at a time. Test extensively. Focus on content types first. Things like portlets can be handled later.

You need an export. We used a customized version of collective.jsonify. Core numbers: 90.000 json files, 55 GB data, 90 minutes, binary files base64 encoded.
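
Base64 encoding is what lets binary files travel inside JSON. A minimal sketch of the idea, using only the standard library. The key names and the item structure are illustrative and are not the actual output format of collective.jsonify.

```python
import base64
import json


def export_item(item):
    """Serialize one content item to JSON, base64-encoding binary values."""
    data = {}
    for key, value in item.items():
        if isinstance(value, bytes):
            # JSON cannot hold raw bytes, so store them as base64 text.
            encoded = base64.b64encode(value).decode("ascii")
            data[key] = {"encoding": "base64", "data": encoded}
        else:
            data[key] = value
    return json.dumps(data, sort_keys=True)


# Hypothetical content item with a binary file field.
exported = export_item(
    {"path": "/site/report", "portal_type": "File", "file": b"%PDF-1.4"}
)
restored = json.loads(exported)
original_bytes = base64.b64decode(restored["file"]["data"])
```

Base64 makes the data roughly a third larger, which is part of why the export adds up to tens of gigabytes.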

We exported portlet assignments, default pages, layout information, workflow state, local roles, and pre-computed values for further efficient processing.

So we had 90.000 json files on the file system. We imported this in ArangoDB. Why use such a migration database? This allowed us to import only some portal types, or do parallel imports, and test complex migration steps like for PloneFormGen.

We briefly tried MongoDB, but that could not handle documents over 16 MB. The json could be dumped unchanged in ArangoDB. This took 45 minutes.

Now we need to import this into Plone. Clean Python 3.7, Plone 5.2, plus the minimal set of packages needed for the content types. Import via plone.restapi. On top we have a dedicated migration package with special views. This handled things like translating between UIDs and paths.
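
Creating content through plone.restapi comes down to a POST request on the container, with a JSON body that names the type. The sketch below only builds such a request with the standard library and does not send it. The site URL, credentials and helper name are hypothetical, and the real migration package certainly sent more fields than a title.

```python
import base64
import json
from urllib.request import Request


def build_create_request(container_url, portal_type, title, user, password):
    """Build (but do not send) a plone.restapi request that creates one object."""
    body = json.dumps({"@type": portal_type, "title": title}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        container_url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical target site and credentials.
request = build_create_request(
    "http://localhost:8080/Plone/news", "Document", "Hello", "admin", "secret"
)
# Sending it would be: urllib.request.urlopen(request)
```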

The "magic" migration script is based on configuration in YAML.

  • Phase 1: pre-check the migration, remove target site if it already exists from previous test, create new Plone Site, install add-ons.
  • Phase 2: create all Folders, query ArangoDB for this.
  • Phase 3: create all non folders.
  • Phase 4: global actions. Check and migrate paths to UIDs in rich text fields. Assign portlets. Other specific fixup operations, like reindexing.
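
The reason for phases 2 and 3 is that a container must exist before anything can be created inside it. A sketch of that ordering, assuming each exported record carries a path and a folder flag (both field names are hypothetical):

```python
def creation_order(items):
    """Order items so folders come first, parents before children, then the rest."""
    folders = [i for i in items if i["is_folder"]]
    others = [i for i in items if not i["is_folder"]]
    # A parent path always has fewer segments than the paths below it.
    folders.sort(key=lambda i: (i["path"].count("/"), i["path"]))
    others.sort(key=lambda i: i["path"])
    return folders + others


# Hypothetical records, as they might come back from the migration database.
items = [
    {"path": "/site/a/b/page", "is_folder": False},
    {"path": "/site/a/b", "is_folder": True},
    {"path": "/site/a", "is_folder": True},
    {"path": "/site/index", "is_folder": False},
]
ordered = [i["path"] for i in creation_order(items)]
print(ordered)
```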

We migrated PloneFormGen (Archetypes) to easyform (dexterity). Export: one JSON for the FormFolder, one JSON file per field and action adapter. In easyform this needed to be turned into an EasyForm instance and a schema.

Topics (AT) to Collections (dexterity). Code was largely taken over from the plone.app.contenttypes migration.

From AT schema extenders to dexterity behaviors. First we made a list: which extenders are there, are they in use, what do they do? Check which dexterity replacements there are. Create new behaviors.

Migrate packages to Python 3. This is mostly covered by talks of Philip Bauer and David Glick. Common problems: utf-8 versus unicode, import fixes, implements to implementer. I rarely used the 2to3 and modernize tools.
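
The utf-8 versus unicode problem is about code that mixes bytes and text, which Python 2 tolerated and Python 3 does not. A minimal sketch of the kind of helper that makes the conversion explicit. The names `to_text` and `to_bytes` are hypothetical and chosen for this example only.

```python
def to_text(value, encoding="utf-8"):
    """Return bytes decoded to str; leave any other value unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return str encoded to bytes; leave any other value unchanged."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


title = to_text(b"caf\xc3\xa9")  # bytes as they might come from old content
raw = to_bytes("café")
```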

Some reimplementations:

  • portal skins to browser views
  • some packages with AT replaced with new packages with dexterity

Other problems:

  • improper file and image metadata
  • migration of vocabulary values, like old to new departments
  • repetitive cycles: a bug always occurs after a day of migration, right before the end.

Quality control:

  • you need to check that migrated content and configuration is complete
  • "works for me" is nice, but others need to check too

Most of the packages have been removed, the setup is much smaller.

Status:

  • content migration is complete
  • must be tested in detail
  • integrate with the new theme, test this
  • need a replacement of a specific membrane usage
  • need work on castle/cas plugin

Takeaways:

  • Export Plone to JSON: 2 hours. Fast.
  • Import JSON to ArangoDB: 45 minutes. Fast.
  • Import ArangoDB to plone.restapi: 36-48 hours. Painfully slow.
    • 1.5 - 2.0 seconds per content object on average
    • cannot parallelize this import, because you would get conflict errors

So Plone and ZODB are painfully slow for creating mass content.
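
A quick back-of-the-envelope check of the import time, assuming all 90.000 exported items are created strictly one after another at the quoted per-object speed:

```python
def import_hours(objects, seconds_per_object):
    """Total wall-clock hours for a strictly sequential import."""
    return objects * seconds_per_object / 3600


objects = 90_000  # number of exported JSON files
fastest = import_hours(objects, 1.5)
slowest = import_hours(objects, 2.0)
print(f"{fastest} to {slowest} hours")
```

That gives 37.5 to 50 hours, in the same range as the 36-48 hours reported.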

Question: in a similar setup we did a live migration from a live site to a separate new site. Did you consider that?

Answer: I tried this for other sites, but here I wanted to be able to do partial imports, independent of the live site.

Question: default migration patches away all kinds of expensive indexing. You might want to consider looking at that. Migrating ten items per second is possible, although that is inline migration. And can you share the code, especially for the easyform migration?

Answer: could be done, but is not the primary focus of the budget currently.

A slightly older version of the code is on community.plone.org.