Weblog
Jim Fulton - The ZODB
Talk by Jim Fulton at the Plone Conference 2016 in Boston.
See slides at http://j1m.me/plone16
Paul Everitt introduces the talk: The ZODB is still amazing after twenty years. Hierarchical object database including permissions, NoSQL, lots of things. On to Jim.
I am working one hundred percent on ZODB currently. Previously for Zope Corporation I could focus only part of the time on it, solving some problems we were having. Zope Corporation no longer exists. I was contracted by ZeroDB, who made this possible. ZeroDB had two products: a database that stores data encrypted at rest, and big-data analysis with Hadoop. They decided to focus on their Hadoop-based product for now. I plan to offer ZODB support, consultancy, so get in contact if you need me.
Are any people here using ZODB based on NEO? No. NEO is doing some interesting things for highly durable storage. It takes a bit more effort to set up. Poll: about half the people in the room use RelStorage, all use ZEO, a few use ZRS. I really recommend that you look at ZRS if you use ZEO. ZRS (Zope Replication Services) 1 was a nightmare, but version 2 is very good. We never made backups with repozo, we just replicated it.
ZEO version 4 used asyncore, by far the oldest async library in Python. It has lots of issues and is deprecated. I had a suspicion that maybe asyncore made ZEO slower. I rewrote most of ZEO to use asyncio instead, and cleaned the code up. In most cases there is performance improvement.
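To give a feel for the asyncio style, here is a minimal sketch of a request/response exchange over asyncio streams. It is not ZEO's wire protocol: the length-prefixed framing, the upper-casing "server" and all function names are assumptions made for illustration only.

```python
import asyncio
import struct


async def read_message(reader: asyncio.StreamReader) -> bytes:
    """Read one length-prefixed message (4-byte big-endian size, then payload)."""
    (size,) = struct.unpack(">I", await reader.readexactly(4))
    return await reader.readexactly(size)


def write_message(writer: asyncio.StreamWriter, payload: bytes) -> None:
    """Queue one length-prefixed message on the transport."""
    writer.write(struct.pack(">I", len(payload)) + payload)


async def handle_client(reader, writer):
    """Toy server handler: answer each request with its upper-cased payload."""
    try:
        while True:
            request = await read_message(reader)
            write_message(writer, request.upper())
            await writer.drain()
    except asyncio.IncompleteReadError:
        pass  # the client closed the connection
    finally:
        writer.close()


async def round_trip(payload: bytes) -> bytes:
    """Start a server on a free local port, send one request, return the reply."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        write_message(writer, payload)
        await writer.drain()
        reply = await read_message(reader)
        writer.close()
        await writer.wait_closed()
    return reply
```

Calling `asyncio.run(round_trip(b"ping"))` runs the whole exchange in one event loop; the handler reads like sequential code, without the callback methods that asyncore required.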
The ZODB API is synchronous. I have been using async libraries since say 1996. The API could change. Shane added a cool hack to ZServer to avoid waking up the event loop, which is a big performance win.
Transactions should be short. The longer the transaction, the higher the chance of a conflict. Connections are expensive resources, they take memory. If you have long-running work, try doing this asynchronously. But handing this off reliably is tricky.
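Why short transactions conflict less can be sketched with a toy optimistic-concurrency store. This is a simplified model, not ZODB code: `ToyStore`, `ConflictError` and `run_with_retries` are hypothetical names, and real ZODB tracks serials per persistent object rather than per dictionary key.

```python
class ConflictError(Exception):
    """Raised when an object changed since the transaction first read it."""


class ToyStore:
    """In-memory store that tracks a serial (version counter) per key."""

    def __init__(self):
        self._data = {}  # key -> (serial, value)

    def load(self, key):
        """Return (serial, value); unknown keys start at serial 0."""
        return self._data.get(key, (0, None))

    def commit(self, key, read_serial, value):
        """Write only if nobody committed since read_serial was loaded."""
        current_serial, _ = self.load(key)
        if current_serial != read_serial:
            raise ConflictError(key)
        self._data[key] = (current_serial + 1, value)


def run_with_retries(store, key, update, attempts=3):
    """Run a short read-modify-write transaction, retrying on conflict."""
    for _ in range(attempts):
        serial, value = store.load(key)
        try:
            store.commit(key, serial, update(value))
            return True
        except ConflictError:
            continue  # someone else committed first: re-read and try again
    return False
```

The window between `load` and `commit` is where another writer can get in first, so the less work happens inside it, the fewer retries are needed.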
Consider using content-aware load balancers, so you don't need all data in memory on all servers. The working set may not even fit in memory.
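One simple form of content-aware balancing is to route on part of the URL, so that each application server keeps seeing the same slice of the content. A minimal sketch, assuming routing on the first path segment; the function name and the backend addresses are made up for illustration, and a real setup would do this in the load balancer configuration rather than in Python.

```python
import hashlib


def pick_backend(path: str, backends: list[str]) -> str:
    """Route by first path segment so each backend sees a stable subset of content."""
    segments = [part for part in path.split("/") if part]
    section = segments[0] if segments else ""
    # Use a stable hash: Python's built-in hash() of a str varies between processes.
    digest = hashlib.sha256(section.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]
```

All requests below `/news` end up on the same server, so only that server needs the objects of that section in its cache.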
Might ZODB run with Javascript in a browser? Run ZODB in a web worker. Provide an async API to your UI code. This assumes that ZODB has been ported to Javascript, which should actually be doable. If someone wants to pay me for it... :-)
A challenge for some applications is to get objects loaded fast, especially on startup. (You can often mitigate this using a ZEO client cache.) There were some problems with persistent caches, but they have been stable for a few years. But you can now prefetch items. You tell ZODB to prefetch some items, and then you can forget about the request and ZODB will meanwhile prefetch it for you, so it may be available later when you really need it. So the items are loaded asynchronously.
ZEO now has SSL. ZEO had authentication, but it made the code harder to understand. It has now been removed in favor of SSL. So you can restrict access to the ZODB.
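The idea of prefetching can be sketched with a small cache that loads in a background thread pool. This is a toy model, not the ZODB implementation: `PrefetchingCache` and the `load_from_server` callable are hypothetical, and the sketch is meant to be used from a single application thread.

```python
from concurrent.futures import ThreadPoolExecutor


class PrefetchingCache:
    """Cache that can start loads in the background before they are needed."""

    def __init__(self, load_from_server, workers=4):
        self._load = load_from_server  # blocking loader: oid -> state
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}  # oid -> Future of a load in progress
        self._cache = {}  # oid -> loaded state

    def prefetch(self, *oids):
        """Fire and forget: start loading and return immediately."""
        for oid in oids:
            if oid not in self._cache and oid not in self._pending:
                self._pending[oid] = self._pool.submit(self._load, oid)

    def get(self, oid):
        """Return the state, waiting only if the load has not finished yet."""
        if oid in self._cache:
            return self._cache[oid]
        future = self._pending.pop(oid, None)
        state = future.result() if future is not None else self._load(oid)
        self._cache[oid] = state
        return state

    def close(self):
        """Wait for outstanding loads and stop the worker threads."""
        self._pool.shutdown(wait=True)
```

If the application calls `prefetch` early and `get` later, the network round trip overlaps with other work instead of blocking the request.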
ZeroDB stored the data encrypted, which meant the server could not do conflict resolution. So I added conflict resolution on the client. You can then work with real objects instead of just state. Solving conflicts in BTree splits would be easier then. It reduces processing time on the server. I would like to move conflict resolution up to the ZODB, instead of having it in ZEO.
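For reference, conflict resolution in ZODB is driven by the `_p_resolveConflict` hook, which receives three states rather than live objects. A minimal sketch of the classic counter case; the class below is plain Python so it runs without ZODB installed, whereas a real one would subclass `persistent.Persistent`.

```python
class Counter:
    """Plain-Python stand-in; in ZODB this would subclass persistent.Persistent."""

    def __init__(self):
        self.value = 0

    def increment(self, amount=1):
        self.value += amount

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        """Merge two concurrent changes by applying both deltas.

        old_state:   state both transactions started from
        saved_state: state already committed by the other transaction
        new_state:   state this transaction is trying to commit
        """
        resolved = dict(new_state)
        resolved["value"] = (
            saved_state["value"] + new_state["value"] - old_state["value"]
        )
        return resolved
```

Starting from 5, one transaction committing 7 and another trying to commit 6 resolve to 8. The arguments are state dictionaries, which is the "just state" limitation mentioned above.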
Object-level locks. Currently ZEO locks the database for writes during the second phase of the commit process. In that phase it needs to wait for the clients to maybe do conflict resolution. Object-level locks could help here. I got it working, but it mostly did not give a performance win.
ZODB on the server is actually faster with PyPy.
ZeroDB did some interesting experiments. Split a database into multiple virtual databases, one per user, separate invalidations.
Unification of RelStorage, NEO, ZEO. NEO had some patches for ZODB and they are now merged, like a simpler implementation of multi version concurrency control. This is better for RelStorage as well. RelStorage is no longer a special case, and it has a new maintainer in Jason Madden.
Inconsistency between ZEO clients. Scenario: add an object in one zeoclient, next request goes to second zeoclient and it potentially does not have the object yet during a very short timespan. There now is a new server-sync option to force a server round trip before each transaction. That is a cost, but maybe it should be the default.
What have I been doing after my work for ZeroDB? I worked on decent documentation, which lagged behind for a long time. See http://zodb.org. You can help me improve it, by writing documentation, or also definitely by bugging me about documentation that you are missing.
FileStorage2. FileStorage worked out much better than I ever imagined. The main code has probably not changed in twenty years. It is a bit slow. With FileStorage2 we have better, separate packing, external garbage collection needed though, but that is better. Unneeded features are removed: versions and back-pointers. It uses multiple files, so with a pack you can split a file, write newly incoming transactions to the new part and pack the old part.
Byteserver is an alternative ZEO server implementation, written in the Rust language. Rust is very fast, faster than Go mostly. No Global Interpreter Lock like Python has. Byteserver includes a FileStorage2 implementation, new API between server and storage, built for speed rather than pluggability. Initial tests, from this morning, are promising, twice as fast as ZEO.
We used ZooKeeper a lot, which helps keep track of which servers are live and which have disappeared.
Future ZODB ideas:
- more speed. I don't need speed to be the reason people use ZODB, but it should not be a barrier.
- more documentation
- OO conflict resolution
- The ability to subscribe to object updates.
- Integration with external indexes like Elasticsearch, Solr. ZRS could be used for this: look at that stream of data and push the relevant parts to the external index.
- Persistent pandas data frames
- A 'jsonic' API, to be able to look at the data without having the classes. There are some zodb browsers already.
- ZRS auto fail-over. At Zope Corp we probably only had one or two unexpected fail-overs in all those years.
- Official Docker images would be good. But if that uses Python 3 then your client also needs to be Python 3.
- ZEO authorization.
- Persistent classes?
- Other languages? Javascript, Ruby, Scala.
David Bain - From Zero to Plone: Towards Faster Developer Onboarding
Talk by David Bain at the Plone Conference 2016 in Boston.
I am from Jamaica and am enjoying the weather here!
We are going to talk about the pain, the background, and the goal of developer onboarding.
Being approachable is a good thing. Plone can be hard to approach. For end-users it is fine, but programming it can be tough. It's more like learning to operate a helicopter than like learning to ride a bike.
Can we make getting started with Plone easier for new developers? If we do this, it is going to add a lot of value for a lot of people. To be fair, the barrier is getting lower.
I had someone who knew how to theme Wordpress sites. She needed to learn how to theme Plone. The documentation was lacking. I started thinking about what she needed, and started creating documentation.
Funny experience: I searched for 'Plone newbie' and the top results were from my own website.
MIT has an onboarding program for new employees, all kinds of things for the first day, first week, first month. This is very different from sink-or-swim. It requires understanding to some extent, looking at your blind spots.
So what is developer onboarding? Turning an outsider into an insider. Preparing developers to be effective with the stack by equipping them with an understanding of our culture, tools, skills and processes. Are you an insider or an outsider? Even after a few years you can be hesitant to say you are an insider.
There is a gang of four people who do research on developer onboarding: Fagerholm, Sanchez Guinea, Borenstein, Münch. They researched at Facebook. Some findings:
- Mentors are less productive during mentoring.
- Mentored employees became productive much quicker than employees who were not mentored.
- Recommended: get core developers to mentor. They can communicate the model of development, give overview.
What can you do?
- provide checklists, for example steps in a readme
- simplify the setup, automate things
- have lists of ready-made blocks of code, snippets, conventions
Make the development cycle clear:
- setup development environment
- customize/code/test
- deploy to production
People don't want to work or think more than they have to. If a hello-world exercise takes three hours, you don't get motivated. Have good examples instead of descriptions. Progressive disclosure: tell new users only what they need to know, and tell more later, so they don't get overloaded in the beginning.
My newbie-friendly picks:
- Plock, for quickly getting a Plone Site up and running
- mr.bob for creating an add-on skeleton
- training.plone.org for quick-start tutorials
- Gloss as layer on top of Diazo
- Diazo snippets library
What is the least amount of vocabulary for newbies to get things done? Do they really need to know what buildout is, and Diazo, or can they be productive without it.
Getting the model wrong can cost a lot of time. Let's help newbies.
Rapido can help newbies too. It is fairly maintainable, I like that.
Raggam, Klein: Fixing Plone 5, the Framework
Talk by Johannes Raggam and Jens Klein at the Plone Conference 2016 in Boston.
This is a story about an early adoption of Plone 5.0. We are from the Blue Dynamics Alliance.
Plone 5 helped us to meet the requirements of a big project. But we ran into problems, which we fixed upstream in Plone core. At the pure CMS level the first Plone 5.0 release was okay, but lots of bugs were left, although there were no blockers.
The new theme was a big advantage for our projects, responsive, much easier to begin development with.
For Porsche Informatik Salzburg we did mostly training, though also a few minor core fixes, consulting. The internal developers knew Zope 2 already, so they could use some of their previous knowledge. We used Mosaic, which meant they only needed to create tiles, not viewlets and portlets.
Project for Architekturstiftung Österreich. We used Lineage, to handle eleven sub sites. Very image-centric site. Mobile friendliness was important. We updated several add-ons to support Plone 5.
For the Swiss Bankers Association we did a project together with Peter Holzer. It had to be secure, modern, responsive, and since it was Switzerland multilingual support was very important. We had to find a way to let collections show German documents and as fallback show the English document. We created collective.linguatags for multilingual tags: tag in one language which gets translated to other languages.
Free software is part of our business model. We try to give back to the core, instead of stacking things on top. Plone is open source driven by the community, not a single company. That is better for the companies, although it makes it more difficult to make money with it.
We made around 200 core pull requests for these three projects. Here are some of them, most are merged, some not:
- Add review state as a class in content items in the portal tabs.
- Improve the new resource registry. Simpler generation of gruntfile, compile resources. Make this generator more verbose. Better formatting of generated file. Fix compile errors. Fix plone-legacy RequireJS errors in development mode.
- Prettified the toolbar. Added a less variable for the width of the secondary toolbar column.
- Reworked the related items widget in mockup. You can search and browse, instead of only one of the two. Still, linking to three or four items works fine, but for more it gets difficult.
- The structure pattern: design for alerts, better action items.
- Use official TinyMCE bower dependency. Documented in core TinyMCE how this is built, so we can use this well in Plone. You can include multiple styles now.
- Mosaic fixes. The editor needs work, though it got better during the Leipzig sprint. Take permissions of visibility of tiles into account. Adapterize the layout (ILayoutAware). This gives a base to remove the main template if you want to. Fixed a MemoryError due to sub requests. This also makes Diazo faster.
Did various code cleanup, pep8, sort imports, etcetera. Makes it easier for newbies to get in.
Fixes for Lineage. Fixes with path setting of widgets, like related items widget, also important for multilingual. Subject vocabularies are now aware of navigation root (sub site).
Enabled unload protection in various packages. Various pattern fixes. Fixed MemoryError during scaling. Added events in the transform chain.
Lessons learned:
- Don't underestimate dot zero releases. It was an experience.
- Better fix things upstream instead of patching.
- Avoid forking repositories, better create a branch on the original repo.
Want to help?
- Please report and/or fix issues. Improve documentation, edit or add it. Dare to ask on http://community.plone.org. Enhance the add-ons, help them to Plone 5.
- PLIP: PLone Improvement Proposals. This is the process for getting bigger features into Plone.
- Enhance the ecosystem around Plone. Help to get Plone on Python 3. Code cleanup.
Sean Upton - A Pinch of Indirection
Talk by Sean Upton at the Plone Conference 2016 in Boston.
I am from the University of Utah Department of Pediatrics, http://upiq.org. I will be giving practical tips for using component architectures. This is my pragmatic perspective on components, my opinion. We will explore idioms and ideas.
Plone has been using the ZCA (Zope Component Architecture) for most of the past fifteen years. It is old hat, but it fits great, we don't want another.
In Plone we push to make things simpler. An API helps here. But you should still strive to understand the underlying component architecture, otherwise lots of things won't make sense. We have complexity, which may be hard to debug, but it makes great things possible.
I encountered the ZCA for the first time in a discussion at OSCON in 2002 with Jim Fulton. Zope 2 at that time had lots of difficult code, look at all the mixin classes in for example the OFS package. We still have that today. But the ZCA allows a different approach. Instead of one big package, there are more packages, which is a trade-off.
Holy trinity of components:
- adapters: single context. Or views with multiple contexts (object and request). Or subscribers.
- resources: content, utilities, request/response
- schemas
Objects can be in two categories, those that you look up and those that you don't. Lookup by path, component registry or both. Others are created on the fly, like site global state.
Indirection. If you want to override a widget with a custom widget and the override is not working, then it can be tricky to see why this is. It can be a scary departure from imperative programming, where things may be clearer at first, but the override may not even be possible in such a system.
Interfaces are contracts. Multiple implementations can provide this interface.
Make components that look like native data structures. Programmers are used to it, so let them make use of their experience, don't work against it. We use a resource-centric development, driven by state, not by action. We do object publishing, so keep the object central. A procedural API, like plone.api, will often try to hide the ZCA: just ask the site for something, which then makes the single site central. You don't notice that under the hood plone.api is using the ZCA. But this abstraction leaks through.
I can and do use plone.api. But I also wrote my own, more state-driven api. I can mix these.
My point: don't think that an API needs to hide the ZCA, that the ZCA is only for ninjas. Food analogy: sure, use a kitchen appliance, but please still remember how to use a knife, without cutting yourself.
Components are "objects connected by interfaces." (Jim Fulton) Resources are easier to work with when they are smaller and pluggable.
Use interfaces as schemas. For content items, you only want to add fields in this interface, and not methods: content should be simple. Then create adapters for adding functionality.
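The split between fields on the content and behaviour in adapters can be sketched without any framework. This is a toy registry, not zope.component: `AdapterRegistry`, the string `"IWordCount"` standing in for an interface, and the `Document` class are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Content item: only fields, no behaviour."""

    title: str
    body: str


class WordCount:
    """Adapter that adds functionality to a content item without changing it."""

    def __init__(self, context):
        self.context = context

    def count(self):
        return len(self.context.body.split())


class AdapterRegistry:
    """Toy lookup table: (context class, interface name) -> adapter factory."""

    def __init__(self):
        self._factories = {}

    def register(self, context_class, interface_name, factory):
        self._factories[(context_class, interface_name)] = factory

    def adapt(self, context, interface_name):
        # Walk the MRO so an adapter registered for a base class also applies.
        for klass in type(context).__mro__:
            factory = self._factories.get((klass, interface_name))
            if factory is not None:
                return factory(context)
        raise LookupError(f"no {interface_name} adapter for {type(context).__name__}")
```

After `registry.register(Document, "IWordCount", WordCount)`, calling `registry.adapt(doc, "IWordCount").count()` gives the word count, and another package could register a different factory without touching `Document`.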
For debugging adapter lookup: temporarily move _zope_interface_coptimizations.so out of the zope.interface distribution, so that the pure-Python lookup code is used and you can step through it. Restart, get a coffee, and debug. But this is not often needed.
Views. Other frameworks see views as endpoints for urls. We see views as all about content, using a multiadapter on the content and the request for it, which is terrific.
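A view as a named multi-adapter on content and request can be sketched in the same framework-free way. Again a toy model rather than the real ZCA lookup: `ViewRegistry`, `Page`, `Request` and `SummaryView` are hypothetical names.

```python
class Request:
    """Minimal stand-in for a publisher request."""

    def __init__(self, form=None):
        self.form = form or {}


class Page:
    """Minimal stand-in for a content item."""

    def __init__(self, title):
        self.title = title


class SummaryView:
    """A view gets both the content and the request, like a multi-adapter."""

    def __init__(self, context, request):
        self.context = context
        self.request = request

    def __call__(self):
        prefix = self.request.form.get("prefix", "")
        return f"{prefix}{self.context.title}"


class ViewRegistry:
    """Toy lookup table: (content class, request class, name) -> view factory."""

    def __init__(self):
        self._views = {}

    def register(self, content_class, request_class, name, factory):
        self._views[(content_class, request_class, name)] = factory

    def lookup(self, context, request, name):
        # Most specific content class wins, then most specific request class.
        for content_class in type(context).__mro__:
            for request_class in type(request).__mro__:
                factory = self._views.get((content_class, request_class, name))
                if factory is not None:
                    return factory(context, request)
        raise LookupError(name)
```

Because the lookup depends on the type of the content, the same view name can resolve to different classes for different content types, which is what makes views "all about content".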
Some vocabularies may be static, but others need to be specific for a context, so we use the ZCA for that. Very useful.
For site-global utilities there are four ways:
- persistent component (utility). So this stores state in the ZODB. Uninstalling can be a nightmare, so watch out.
- adapter of a site. Needs to be cheap to construct, because you will construct this every time code needs it. You may still need to store state somewhere, like in an annotation or the configuration registry.
- plone.api in certain cases
- CMF tools (deprecated).
Related:
- Reg, from Martijn Faassen, inspired by zope.interface and zope.component
- Python's ABC, abstract base classes. But they are not really interfaces.
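One difference between ABCs and interfaces can be shown with the standard library alone. This is a sketch of one possible reading of that last remark, not something spelled out in the talk; the class names are made up.

```python
from abc import ABC, abstractmethod


class HasSize(ABC):
    """Abstract base class: subclasses must implement size()."""

    @abstractmethod
    def size(self):
        ...


class Folder(HasSize):
    """Real subclass: instantiation fails unless size() is implemented."""

    def size(self):
        return 0


class Unrelated:
    """Does not implement size() at all."""


# register() declares a "virtual subclass" without verifying any methods.
HasSize.register(Unrelated)
```

After this, `isinstance(Unrelated(), HasSize)` is true even though the object has no `size` method: the declaration is taken on trust, and an ABC describes classes through inheritance rather than describing what an individual object provides.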
Timo Stollenwerk: REST API
Talk by Timo Stollenwerk at the Plone Conference 2016 in Boston.
I really like Python. I always miss its elegance when I write Javascript. But Javascript is a reality. It runs on all browsers. If you are a web developer, you need to know Javascript and its frameworks. Things are moving fast.
Plone development happens slowly in comparison. Also we have Plone 5, but some universities are still at Plone 3. [My site is Plone 2.5, Maurits.]
plone.restapi is our bridge between Plone and the Javascript framework world. It is not opinionated: you can use Angular or React or tomorrow's framework.
You can log in using the REST API and get a token back that you can use in subsequent requests. You can create/get/edit/delete content information, navigation, breadcrumbs, get or set registry settings, handle users. A unified search API will be tricky: making it usable for the default catalog, Solr and Elasticsearch alike is basically impossible.
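The token flow can be sketched with the standard library. The `@login` endpoint, the JSON body with `login` and `password`, and the `Authorization: Bearer` header follow the plone.restapi documentation of later releases, so the alpha shown in the talk may differ in details. The site URL and credentials are placeholders, and the sketch only builds the requests without sending them.

```python
import json
from urllib.request import Request


def build_login_request(site_url, login, password):
    """POST credentials to @login; the JSON response carries a 'token' key."""
    body = json.dumps({"login": login, "password": password}).encode("utf-8")
    return Request(
        site_url.rstrip("/") + "/@login",
        data=body,
        method="POST",
        headers={"Accept": "application/json", "Content-Type": "application/json"},
    )


def build_content_request(url, token):
    """GET a content item as JSON, authenticating with the token from @login."""
    return Request(
        url,
        headers={"Accept": "application/json", "Authorization": f"Bearer {token}"},
    )
```

Passing either request to `urllib.request.urlopen` against a running site would perform the call. The `Accept: application/json` header is what tells Plone to answer with JSON instead of the HTML page.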
plone.rest is stable. plone.restapi is still in the alpha phase, but a lot of companies are already using it. We are really open to contributions.
Questions? Doesn't the front-end need access to some configuration information that we currently consider private? Yes, that is a problem we run into and are thinking about.