Weblog
Alin Voinea - Docker and Plone
Talk by Alin Voinea at the Plone Conference 2016 in Boston.
Docker does for your whole environment what virtualenv does for Python: it gives you an isolated environment. You get the same environment on Linux/Mac/Windows and across development/test/production.
To run Plone without Docker, you need to install all kinds of libraries, run a buildout, and ignore some SyntaxErrors along the way.
For Plone with Docker you can run:
docker run -p 8080:8080 plone
This uses the new official Plone Docker image at https://hub.docker.com/_/plone. See that page for more information, also on including add-ons or on developing. In that case buildout gets run, so it can fetch the add-ons.
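For example, if I remember the image documentation correctly, you can list add-ons in an environment variable and buildout will pick them up when the container starts (check that page for the exact variable names):

docker run -p 8080:8080 -e PLONE_ADDONS="eea.facetednavigation" plone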
Always label your Docker volumes to avoid data loss.
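For example, a named volume for the data directory, assuming the official image keeps its filestorage and blobstorage under /data:

docker run -p 8080:8080 -v plone-data:/data plone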
You can use a zeoserver:
docker run --name=zeo plone zeoserver
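And then point one or more Plone clients at it. Something like this should work, if I remember the ZEO_ADDRESS variable and port from the image documentation correctly:

docker run --link=zeo -e ZEO_ADDRESS=zeo:8080 -p 8080:8080 plone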
In production you should create your own Docker images, including all add-ons you need. You can use extended_buildout.cfg. For composing your own Dockers, see https://docs.docker.com/compose
When using multiple hosts, you want orchestration. There are several options. We use Rancher. You can use the official Plone Docker image and set several options.
The plan is to use this for continuous delivery, like in the talk by Nejc Zupan.
See various repositories with Docker for the EEA.
See https://github.com/plone/plone.docker for the source of the Plone Docker image.
Nejc Zupan - Learn How We Deliver. Continuously.
Talk by Nejc Zupan at the Plone Conference 2016 in Boston.
Discussions in the hallway are the best part of this conference. You don't have slides there, so I won't use them here either. I do have them by the way.
Have you deployed anything to production this month? Week? Today? If code only sits on your local computer, it does not give value to your customer.
What would deploying look like if it was easy? You fix a bug, push the change, the tests run and pass, and production gets updated automatically, or maybe after one push of a button. Instead of saying 'the fix has been committed and will be in the next release', you want to say 'it is fixed'.
The bigger the change, the harder it is to deploy. One fix, with one data migration, is fast to deploy, and easy to undo if needed. With three fixes: it gets harder. For one fix, do you really need a developer, a QA person, a project manager, a sysadmin?
When you push to a git repository, you can hook up actions to start testing and deploying.
Tools are interchangeable. You are smart people, you can figure out how to do this. More important is why you should do this.
When a pull request is made, we copy production data to staging. This may be a lot of data, so in the case of an SQL database we only copy a subset. The pull request is deployed to staging, so a reviewer can immediately check it there. We have 100 percent test coverage. We have regression tests. I want new employees to deploy something to production on their first day, going through the whole process. This is very empowering. It keeps developers happy.
Code ownership. If you know that your pull request may be deployed within the next twenty minutes, then you will be careful about what you put in the pull request.
Tip: if you don't have this yet, then start doing this with a very small project.
Give your customers access to the staging machine that has this change, so they can check it.
We host everything on Heroku. For the highest paying levels, you have a button to revert the code change and the database change. Heroku does this well. If you are not in the business of hosting, then don't do hosting.
We have about ten people in the company. We do have a few small Plone Sites.
For us, the only way to get code into the master branch, is via pull requests.
We spend more money on the staging servers than on production.
You can automate stuff like: if a new tag of a package is generated, push it to your internal PyPI.
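As a minimal sketch of that last step, assuming twine and an index section called internal in your ~/.pypirc (how the new tag triggers this depends on your CI tool):

python setup.py sdist bdist_wheel
twine upload --repository internal dist/*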
What about big features instead of bug fixes? You can use feature flagging, showing for example a new UI to only a few people. But anyway, it is rolled out to staging, you test it, if it works you can deploy it. For big features it can take longer to get everything correct, so you just don't merge the pull request yet.
I will never write code without writing a test. For an emergency fix, it is still better to spend twenty minutes to write a test, instead of spending two hours later reverting and debugging.
Annette Lewis - The super integrator
Annette Lewis gives the second keynote talk at the Plone Conference 2016 in Boston.
Disclaimer: I do not call myself a super integrator, but others do, and I humbly accept it.
Three years ago I had not heard about Plone, though I had used various other CMS packages. Now I work at Penn State University doing websites, particularly Plone websites, moving static content into Plone. When I started, there were 120 websites, and after a few months my only other colleague left. We got other people onto the team. We support 23 departments, 10 programs, 16 centers and institutes, and more. Not all websites are Plone, but about forty to sixty are.
I was showing a site to Eric Steele, and he looked like: you do this through the web? Yes, I do. Theming, Diazo, works fine.
I learned through trial and error. It builds character... and eats up time.
If it exists, don't recreate it. Take an existing solution and improve on it.
I am using Diazo to its fullest, using XPath and XSL to keep the original classes and ids, so that an editor who is looking at the page source can still see that a part of the page is, for example, a portlet, and knows where to start editing it.
Some stuff I need to do in page templates.
If I need a similar block three times, then I make it reusable.
Portal types. We were using News Items for slider images. That is hard for editors to remember, they think: I want an image, so I can't use a news item. For those cases I simply copy the News Item type and give it a name that makes sense to them, like Home Images.
I talked to Cris Ewing last year about how I did all this through the web, and I saw his enthusiasm, so I knew: I am not crazy to do everything through the web.
I want my end users to feel empowered. It is their website.
Questions?
What would make my life as an integrator better? Plone 5 actually solves several things. It would help with caching of our custom CSS, and with defining the colors for CSS in the resource registries. Debugging Diazo could maybe be easier, with a clearer message about what the error is. Diazo snippets would be nice. David: there is a Diazo snippets library, available as a Chrome extension.
If your Diazo rules are slow, make your rules more specific.
Our policy is to train the users. If they don't get training, they don't get an account. And I say: if you mess things up, like accidentally removing an entire folder, don't try to fix it yourself, but immediately call us. Not an email, call us. We have good relations with our users.
Lightning talks Wednesday
Lightning talks on Wednesday at the Plone Conference 2016 in Boston.
Sven Strack: Docs 2.0
Upcoming changes and improvements to the docs. For Plone we have the papyrus buildout to create the documentation, pulling docs in from several places, if the buildout works. We check whether the robot test screenshots are working, which usually means we start fixing robot tests. We review the html old-school, using our eyes. We have docs for Plone 3, 4 and 5. We update the server over ssh. It is boring and time consuming.
New setup. Micro services. CI tests, we will send mails to remind people that stuff needs to be fixed. We don't depend on buildout anymore for the docs. It will be dockerized. Really fast. Updating with zero downtime, users will not notice it except that they suddenly get served by the new docker instance. We are getting a new search, for example only searching in Plone 5. New theme.
Maurits van Rees: experimental.nodtml
I did not make notes during my own talk. ;-) Just read https://pypi.python.org/pypi/experimental.nodtml
Alexander Loechel: Future of Plone, RestrictedPython
The future depends on many things, for example moving to Python 3. The main blocker is RestrictedPython, but I am working on it. Almost fifty percent of the tests are currently running. So: making progress. We will get there. Thanks also to Michael Howitz, Stephan Hof, Thomas Lotze, Gocept, Hanno Schlichting, and Tres Seaver.
Hector Velarde: Accelerated Mobile Pages in Plone
We made collective.behavior.amp, an AMP (Accelerated Mobile Pages) HTML transformer. It adds a special tag in the html head and uses a Google service to speed up pages on mobile.
Paul Everitt: $20k Floppy and a $100k Perl
Some stories.
By the way, DTML was replaced by ZPT because of DreamWeaver.
When your database was corrupted, back in the day, you emailed it to Jim and he would fix it.
Anyway, Zope used to come on a 3.5 inch floppy, including Python.
You used to be able to add Perl scripts in the ZMI. This cost 100,000 dollars to build.
Chrissy Wainwright: Products.Poi for Plone 5
We rewrote Products.Poi for Plone 5 and Dexterity. There is no migration yet.
Some new features:
- Add/remove multiple attachments with drag-and-drop.
- Export as csv.
You can help out, for example with migration or translations:
https://github.com/collective/Products.Poi/tree/3.0-development
Lennart Regebro: A different type of conference
Plone conferences are usually in cities with expensive hotels. PyConPL, a Python conference in Poland, is in a hotel in the middle of nowhere. Beautiful view. We have been doing this for a few years. There may be an outdoor barbecue involved. What I like: people cannot run off, they have to socialize and party, or go to bed. We should have that for Plone too. I cannot organize it myself; I just want to say: there is another way to organize a conference. PyConPL also has an English track, so come and listen or speak.
Tim Simkins: Content quality checks
I am from Penn State. An article on a site might appear in multiple places, so we needed content control on it. For example: at most sixty characters for a title. We created automatic checks and subscribers for this. You can see a list of all such issues.
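A rough sketch of what such a check could look like, as I imagine it (this is not their actual code; the 60-character limit, the message, and the interfaces used are just for illustration):

    # Register this with a ZCML <subscriber handler="..."/> directive.
    from plone import api
    from Products.CMFCore.interfaces import IContentish
    from zope.component import adapter
    from zope.lifecycleevent.interfaces import IObjectModifiedEvent

    MAX_TITLE_LENGTH = 60  # example limit mentioned in the talk

    @adapter(IContentish, IObjectModifiedEvent)
    def check_title_length(obj, event):
        # Warn the editor when the title is longer than allowed.
        if len(obj.Title()) > MAX_TITLE_LENGTH:
            api.portal.show_message(
                message=u'Title is longer than 60 characters.',
                request=obj.REQUEST,
                type='warning',
            )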
Steve Piercy: My first contribution to Plone
Or: a misadventure into open source software. I explored the world of Python in 2011: Pylons, Pyramid. Nice: full test coverage. After about a year I got the courage to make contributions to the documentation. Within the last year I have been moving the documentation from easy_install to pip, which helped a lot. We wrote tutorials. It was very easy to contribute.
And with Plone? I got warned that I had to sign a contributor agreement. For one typo fix. I started reading the agreement, and it was just too hard to understand for a non-lawyer. Can we make this easier? I am here to assist you if you need fresh eyes on your project.
David Bain: PyCon Jamaica
PyCon Jamaica is November 17 and 18 this year, with the Python scavenger hunt race on the Saturday. It is at the Hope Zoo, where they have pythons. Costs are 80 dollars. Accommodation is not too expensive. It is organised by PythonJamaica. You can help! You can sponsor us and possibly get a T-shirt.
This presentation, including links for the T-shirts: http://tinyurl.com/pyconjamaica2016
Jim Fulton - The ZODB
Talk by Jim Fulton at the Plone Conference 2016 in Boston.
See slides at http://j1m.me/plone16
Paul Everitt introduces the talk: The ZODB is still amazing after twenty years. Hierarchical object database including permissions, NoSQL, lots of things. On to Jim.
I am currently working one hundred percent on ZODB. Previously, at Zope Corporation, I could focus on it only part of the time, solving some problems we were having. Zope Corporation no longer exists. I was contracted by ZeroDB, which made this possible. ZeroDB had two products: a database that stores data encrypted at rest, and big-data analysis with Hadoop. They decided to focus on their Hadoop-based product for now. I plan to offer ZODB support and consultancy, so get in contact if you need me.
Are any people here using ZODB based on NEO? No. NEO is doing some interesting things for highly durable storage; it takes a bit more effort to set up. Poll: about half the people in the room use RelStorage, all use ZEO, a few use ZRS. I really recommend looking at ZRS if you use ZEO. ZRS (Zope Replication Services) 1 was a nightmare, but version 2 is very good. We never made backups with repozo, we just replicated.
ZEO version 4 used asyncore, by far the oldest async library in Python. It has lots of issues and is deprecated. I had a suspicion that maybe asyncore made ZEO slower. I rewrote most of ZEO to use asyncio instead, and cleaned the code up. In most cases there is a performance improvement.
The ZODB API is synchronous. I have been using async libraries since say 1996. The API could change. Shane added a cool hack to ZServer to avoid waking up the event loop, which is a big performance win.
Transactions should be short. The longer the transaction, the higher the chance of a conflict. Connections are expensive resources, they take memory. If you have long-running work, try doing this asynchronously. But handing this off reliably is tricky.
Consider using content-aware load balancers, so you don't need all data in memory on all servers. The working set may not even fit in memory.
Might ZODB run with Javascript in a browser? Run ZODB in a web worker. Provide an async API to your UI code. This assumes that ZODB has been ported to Javascript, which should actually be doable. If someone wants to pay me for it... :-)
A challenge for some applications is to get objects loaded fast, especially on startup. (You can often mitigate this using a ZEO client cache; there were some problems with persistent caches, but they have been stable for a few years.) You can now also prefetch items: you tell ZODB to prefetch some items, then forget about the request, and ZODB will meanwhile fetch them for you, so they may be available later when you really need them. So the items are loaded asynchronously.
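A minimal sketch of what that can look like in code, assuming a recent ZODB/ZEO 5 where Connection.prefetch exists ('folder' and 'catalog' are just hypothetical names here):

    import ZEO

    conn = ZEO.connection(('localhost', 8100))
    root = conn.root()
    # Ask ZODB to start loading these objects now; by the time the code
    # actually touches them, they may already be in the cache.
    conn.prefetch(root['folder'], root['catalog'])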
ZEO now has SSL. ZEO had authentication, but it made the code harder to understand. It is now out in favor of SSL. So you can restrict access to the ZODB.
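A rough sketch of the client side, assuming ZEO 5 accepts an ssl context argument (the certificate file names are hypothetical, and the server needs matching configuration):

    import ssl
    import ZEO

    ctx = ssl.create_default_context(cafile='ca.pem')
    ctx.load_cert_chain('client.pem')
    client = ZEO.client(('zeo.example.com', 8100), ssl=ctx)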
ZeroDB stored the data encrypted, which meant the server could not do conflict resolution. So I added conflict resolution on the client. You can then work with real objects instead of just state. Solving conflicts in BTree splits would be easier then. It reduces processing time on the server. I would like to move conflict resolution up to the ZODB, instead of having it in ZEO.
Object-level locks. Currently ZEO locks the database for writes during the second phase of the commit process. In that phase it needs to wait for the clients to maybe do conflict resolution. Object-level locks could help here. I got it working, but it mostly did not give a performance win.
ZODB on the server is actually faster with PyPy.
ZeroDB did some interesting experiments. Split a database into multiple virtual databases, one per user, separate invalidations.
Unification of RelStorage, NEO, and ZEO. NEO had some patches for ZODB and they are now merged, like a simpler implementation of multi-version concurrency control. This is better for RelStorage as well. RelStorage is no longer a special case, and it has a new maintainer in Jason Madden.
Inconsistency between ZEO clients. Scenario: you add an object in one ZEO client, the next request goes to a second ZEO client, and for a very short timespan that client may not have the object yet. There is now a new server-sync option to force a server round trip before each transaction. That has a cost, but maybe it should be the default.
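A sketch of turning that on from code, assuming ZEO 5 and that I have the option name right (it can also be set in the client configuration):

    import ZEO
    import ZODB

    client = ZEO.client(('localhost', 8100), server_sync=True)
    db = ZODB.DB(client)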
What have I been doing after my work for ZeroDB? I worked on decent documentation, which had lagged behind for a long time. See http://zodb.org. You can help me improve it by writing documentation, or also definitely by bugging me about documentation that you are missing.
FileStorage2. FileStorage worked out much better than I ever imagined; the main code has probably not changed in twenty years. But it is a bit slow. FileStorage2 has better, separate packing (external garbage collection is needed, but that is better). Unneeded features are removed: versions and back-pointers. It uses multiple files, so with a pack you can split a file, write newly incoming transactions to the new part, and pack the old part.
Byteserver is an alternative ZEO server implementation, written in the Rust language. Rust is very fast, mostly faster than Go, and has no Global Interpreter Lock like Python has. Byteserver includes a FileStorage2 implementation and a new API between server and storage, built for speed rather than pluggability. Initial tests, from this morning, are promising: twice as fast as ZEO.
We used Zookeeper a lot, which helps keep track of which servers are live and which have disappeared.
Future ZODB ideas:
- more speed. I don't need speed to be the reason people use ZODB, but it should not be a barrier.
- more documentation
- OO conflict resolution
- The ability to subscribe to object updates.
- Integration with external indexes like Elasticsearch or Solr. ZRS could be used for this: look at that stream of data and push the relevant parts to the external index.
- Persistent pandas data frames
- A 'jsonic' API, to be able to look at the data without having the classes. There are some zodb browsers already.
- ZRS auto fail-over. At Zope Corp we probably only had one or two unexpected fail-overs in all those years.
- Official Docker images would be good. But if that uses Python 3 then your client also needs to be Python 3.
- ZEO authorization.
- Persistent classes?
- Other languages? Javascript, Ruby, Scala.