Weblog
Daniel Jowett: Journeys with Transmogrifier & friends
Or: how not to get stuck in the Plone dark ages. Talk by Daniel Jowett at the Plone Conference 2014 in Bristol.
I have been doing software since 1997. Background in C, Java, SQL, etc. Using Plone since 2008. Came to Plone conference in Bristol in 2010 and everyone seemed to be cleverer than me. Four years later, I know that they are cleverer than me. But I have done four or five projects with transmogrifier and jsonmigrator.
Transmogrifier comes from a Calvin and Hobbes comic. Look it up. A cardboard box becomes a transmogrifier: it can turn you into anything you like: dinosaur, giant slug, whatever.
There are a few variants.
- Plone transmogrifier uses two stages: export/import from/to Plone site via xml files or csv files.
- Funnelweb: crawls and parses static sites for import, which are then pushed into Plone.
- jsonmigrator: a one-stage process that crawls JSON views of an old site.
- Blueprints also exist that can for example read from SQL databases.
We will focus on jsonmigrator.
When to use jsonmigrator
- Good for migrating from old versions of Plone where you cannot install transmogrifier for exporting the contents.
- Even from old Zope/CMF sites.
- Particularly when you have no buildout.
- When changing from archetypes to dexterity.
- To clean cruft from an error-prone old site.
- When scared to upgrade because you don't know what might bite you later. You don't know what you don't know...
Note: this is not the whole story: you still need to update your Plone version, add-ons, custom code, theme.
When NOT to use transmogrifier
- From Plone 4.x.
- Probably not from 3.3.6 where you already have buildout: just use the standard Plone upgrade path.
Technology stack
- plone.app.transmogrifier + collective.transmogrifier as base
- collective.jsonify
- collective.jsonmigrator
- extra pipelines:
- transmogrify.dexterity
- quintagroup.transmogrifier (I had some issues with it, but it has useful stuff, so you may need it.)
Setting up jsonmigrator
- Duplicate your old Plone in your staging environment. Or use your public live site, but that is not recommended: it may have security issues, or you may run into corner cases where your site breaks. So do it on a copy please.
- Install collective.jsonify there. If you don't have buildout, you may need to do it like this:
- Download the egg to your Plone directory
- Unzip it
- Add it to your Python with setup.py install, which will also pull in simplejson. Use a virtualenv if possible.
- Add some external methods in an Extensions directory, where you import the main jsonify functions. Add and use the methods in the ZMI of your old Plone Site. For example the get_item function, which you can then add to the end of a url to get the JSON version of a content item. For a file you get the data included in base64 encoding.
- Add collective.jsonmigrator to the eggs of your new Plone Site.
- Go to the @@jsonmigrator view. Select a pipeline, update any settings, and run it.
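The base64-encoded file data that get_item returns can be unpacked with the standard library. This is a minimal sketch under assumptions: the item layout and the key names (`_path`, `_type`, `file`, `filename`, `data`) are made up for illustration and are not necessarily what collective.jsonify emits.

```python
import base64
import json

# Hypothetical JSON export of a File item; the real key names produced by
# collective.jsonify may differ.
raw = json.dumps({
    "_path": "/site/docs/report.txt",
    "_type": "File",
    "file": {
        "filename": "report.txt",
        "data": base64.b64encode(b"Quarterly report").decode("ascii"),
    },
})


def decode_file_item(raw_json):
    """Return (filename, bytes) for an exported item with base64 file data."""
    item = json.loads(raw_json)
    payload = item["file"]
    return payload["filename"], base64.b64decode(payload["data"])


filename, content = decode_file_item(raw)
print(filename, len(content))
```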
In our case we had set up a really long pipeline to migrate from a Zope 2.6 site to Plone 4.
You can define your own pipeline sections, usually based on collective.transmogrifier.sections.manipulator to tweak data in the dictionary of the current item or .inserter to add data. Start with the collective.jsonmigrator.remotesource section for getting the data.
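The idea behind pipeline sections can be sketched without Plone: every section wraps the previous one and passes item dictionaries along. This is a plain-Python illustration of the pattern, not the collective.transmogrifier API. The function names and item keys below are hypothetical stand-ins.

```python
def source(items):
    """Stand-in for a source section: yields one dict per content item."""
    for item in items:
        yield dict(item)


def manipulator(previous):
    """Tweak existing keys in each item before passing it on."""
    for item in previous:
        item["title"] = item["title"].strip()
        yield item


def inserter(previous, key, value):
    """Add a key to every item that does not have it yet."""
    for item in previous:
        item.setdefault(key, value)
        yield item


old_content = [{"_path": "/news/a", "title": "  Hello  "}]
# Sections are chained: each one consumes the iterator of the one before it.
pipeline = inserter(manipulator(source(old_content)), "language", "en")
migrated = list(pipeline)
print(migrated)
```

Because every section is a generator, items flow through the whole chain one at a time, which keeps memory use low even for large sites.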
You may have to get your hands dirty when things don't work the first time. Adding a few print statements to a section is often enough to make the problem clear.
Caveats
- It does not export/import users, though see collective.blueprint.usersandgroups.
- Doesn't do portlets.
- collective.jsonify is not security safe to use at a live site, as mentioned.
Credits
- Transmogrifier: Martijn Pieters, Jarn.
- Jsonmigrator: Rok Garbas
- RCS: letting me loose on this
- Calvin and Hobbes: Bill Watterson, building the first migrator.
Warning from Steve: I have run into problems with character sets. Paul: there are encoding and decoding blueprints that help, though it takes some getting used to.
Maarten: You are getting json data from an old site that is still running. Can't you save the json data on disk and use that? Erico: yes, you can, that is what I usually do. Maarten: I am using a blueprint for that, also because sometimes the export gave an error and sometimes the import.
Watch the video of this talk.
Eric Steele: The state of Plone 5
Keynote by Eric Steele at the Plone Conference 2014 in Bristol.
What is happening with Plone 5? This year there have been a lot of sprints, sometimes even with thirty or forty people at the same time.
What is Plone doing to make my life easier?
Currently, Plone 5 is alpha software, so use with care.
Plone 5 has Diazo enabled by default. This means we have made the underlying html simpler, an unthemed basis with only a small bit of styling.
We have a toolbar, pulling out as much of the CMS as possible, separating it from the non-cms part of the page, making it harder to break your editing with a wrong Diazo rule.
New Barceloneta theme. Theme customization with new resource registries, partly a return to the old base_properties, but done better, through a control panel in Plone.
Date formatting is now handled on the client side: i18n, local time.
There will be a panel with an overview of security settings, where you can for example choose between standard and high security, depending on your needs, changing various settings at once.
Our setup has changed so it has become easier to keep up to date with third-party javascript add-ons. So we now have TinyMCE 4 as editor.
Various widgets are updated.
The folder contents tab has been updated, based on the wildcard.foldercontents package.
Dexterity has been developed for years, stable for quite some time. Far less boiler plate than Archetypes. Add behaviors to content types. All the core types have been ported over to Dexterity, in plone.app.contenttypes. We finally have recurring events in Plone. Archetypes will still work in Plone 5, and ATContentTypes is still there, but new sites will use dexterity. There is migration, which you can do type by type.
Add-on development: we advise you to use plone.api. Easier to remember where to import from. See documentation on http://api.plone.org.
Javascript development: we use Mockup patterns. See http://plone.github.io/mockup
For that to work, we needed new resource registries for css and javascript, so they are there, in the Plone UI.
Settings should all be stored in the configuration registry, which is done for the Plone control panels.
Security: automatic csrf protection, automatic clickjack protection, automatic keyring rotation.
We have documentation, including automated screen shots.
Try it out: http://plone.org/try-five
What's next? Do you continue using Zope as it is? Do we absorb some of that code? Do we switch to Pyramid? That is part of the ongoing Plone 2020 roadmap discussion. Various techniques like Diazo and plone.api separate you from the backend, making it a bit easier to switch things around there. An event like this Plone conference is where the talking happens, where decisions are made, not in some closed off secret session of core programmers. Just join us and feel welcome.
Watch the video of this talk.
Python Users Netherlands meeting 22 October 2014
Summary of the meeting of the Dutch Python Users group on 22 October 2014.
Thanks to Byte for organizing the meetup, and the beer, pizza, and beer opener for everyone. There were two tracks for the talks, so these are not all the talks. Go ask someone else. :-)
Folkje welcomed us at Byte, a web hosting company. We develop tools that help developers. We focus on quality. Cluster and Magento hosting. Currently working on Hypernode.
Job van Achterberg: Python, Parallelism and Concurrency
A talk about threads, locks, processes and events.
CPU manufacturers have gone to multi-core. So a CPU one year from now may not actually be much faster than now. So: you may need to be smarter when programming. Is your software designed to do many things at the same time? Not one monkey doing many things, but many monkeys each doing their own thing.
You can do multi processing with the multiprocessing module.
Threads: one thread is still doing one thing at a time. That is the threading module.
Green threads, gevent, greenlet library. Basically: threads in user space. They still use one actual thread, it just looks like there are more.
asyncore library, a socket wrapper for an asynchronous event loop
Python 3 has asyncio (sometimes known as tulip).
There are difficulties with multithreading. You do not know when the scheduler of the OS will handle which thread. So you have to use locks when you want to do something threadsafe.
- threading.Lock is the default lock. Other threads that want the same lock will be waiting.
- threading.RLock is a re-entrant lock: one thread can acquire the same lock more than once, as long as it releases it as often as it acquires it.
- threading.Semaphore is a lock with a counter: with a count of four, up to four threads can hold it at once. When one of them releases it, a waiting thread can acquire the free slot.
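The three lock types can be tried out with the standard library alone. A minimal sketch, not taken from the talk: the counter, the worker functions and the thread counts are made up for illustration.

```python
import threading
import time

counter = 0
counter_lock = threading.Lock()


def increment(times):
    """Add to the shared counter; the lock protects each update."""
    global counter
    for _ in range(times):
        with counter_lock:
            counter += 1


rlock = threading.RLock()


def nested():
    """A re-entrant lock may be acquired again by the thread holding it."""
    with rlock:
        with rlock:
            return True


slots = threading.Semaphore(4)  # at most four holders at the same time
active = 0
peak = 0
stats_lock = threading.Lock()


def use_slot():
    """Hold one semaphore slot briefly and record how many are in use."""
    global active, peak
    with slots:
        with stats_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)
        with stats_lock:
            active -= 1


def run_threads(target, count, *args):
    """Start count threads running target and wait for all of them."""
    threads = [threading.Thread(target=target, args=args) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


run_threads(increment, 8, 1000)
run_threads(use_slot, 10)
print(counter, nested(), peak)
```

The counter always ends at 8000 because of the lock, and the recorded peak never exceeds the semaphore count of four.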
In Python you can use queues to share state between threads: see the Queue module. You can wait on a Queue, instead of checking it every second.
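A small sketch of a worker that waits on a queue instead of polling it. Note that the module is called Queue in Python 2 and queue in Python 3; the example uses the Python 3 name. The STOP sentinel and the squaring task are made up for illustration.

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()
STOP = object()  # sentinel that tells the worker to finish


def worker():
    """Take numbers from the task queue and put their squares on results."""
    while True:
        item = tasks.get()  # blocks until something arrives, no polling
        if item is STOP:
            break
        results.put(item * item)


thread = threading.Thread(target=worker)
thread.start()
for number in range(5):
    tasks.put(number)
tasks.put(STOP)
thread.join()

squares = sorted(results.get() for _ in range(5))
print(squares)
```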
But the GIL (Global Interpreter Lock) holds us back. The more resources, the slower we get, because threads get a few CPU ticks and then Python checks the next thread. The overhead of many threads bites you in the ass.
You can use multiprocessing and move the stuff over multiple cpu cores. Each process has its own GIL. The processes still need to talk to each other, maybe via a database. That is still overhead, but may be faster in your use case.
Consider threads or async (event loop) when using lots of I/O.
Consider multiprocessing when using lots of CPU.
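For the CPU-heavy case, a minimal sketch with multiprocessing.Pool. The prime counting function and the limits are made up for illustration; whether this actually beats a single process depends on your workload and the cost of passing data between processes.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def count_in_processes(limits, workers=2):
    """Spread the calls over worker processes, each with its own GIL."""
    with multiprocessing.Pool(workers) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # The guard keeps child processes from re-running this when they
    # import the module, which matters on platforms that spawn processes.
    print(count_in_processes([10_000, 20_000, 30_000]))
```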
Luuk van der Velden: AngularJS and Python; hybrid vs REST
This is a conceptual talk, with discussion. I am interested in what you think about this.
Choice: thin versus thick client. Python MVC entails a thin client. Javascript MVC is a thick client, becoming more popular every day.
REST in this talk is: a Javascript MVC on the client talking to an API on the server, which is using Python. So they communicate via a strict protocol.
Hybrid here is: a Javascript MVC on the client talking to a Python MVC. Python can be a rich server like Django, already pumping out rich html.
REST: stateless interface, decoupled client and server, strict API. For example Flask, a 'batteries excluded' framework, with Flask-restful, an extension to facilitate REST api development. Class based views only handling standard HTTP requests like get and put. Flask-restful parses the request and decides if it is valid. It may send a Bad Request message back to the user.
So what is Angular (or AngularJS)? It is built and hyped by Google, loved by many. Everyone uses it, so everyone uses it... A toolset for building a front-end framework. An html compiler. Elegant asynchronous request handling. Automagic page updates through data binding. You can let it get a (REST) resource from the server and on success bind the data.
Hybrid has been done. For example http://django-angular.readthedocs.org: a collection of utilities to ease integration of Angular and Django.
Example: I have created a Tinder-like app to like members of metal bands: http://hologramearth.com/metalfest Really hybrid: I am mixing Python and Angular code in the template there.
Discussion points:
- Hybrid could make your application more DRY.
- Initialization can be done on back-end models.
- REST-ful resources where they are needed.
- Back-end updates might run through the whole stack.
- Is everything REST-ful?
- No clear separation of tasks (server, client).
Audience:
- Be careful to not need say 25 asynchronous calls when you could easily prepare it all at once on the back-end.
- Hybrid is useful at the beginning of your project. You can grow from there.
- Combining Jinja templates with Angular, which use the same brackets, is possible as you have shown, but I would choose a different templating language.
- Separation of concerns is important if you have multiple teams/persons, one handling the Javascript front-end, one the styling, one the Python back-end.
https://github.com/Lvelden/PyMeetupTalk
And let's connect on LinkedIn, I am new there.
Mohammad Noureldin: JumpScale and Fabric
These are two members of the Automation and Configuration Management family. You probably know Fabric already.
I like open source and building communities around open source projects.
JumpScale used to be known as PyLabs. Cloud service platform: you can build a cloud service with it. Go to a web interface, connect machines there, just a model, save it, edit it, machines are there. Easy. Q-Layer (taken over meanwhile) and mothership1.com have been built with this.
You have an agent controller and agents. Using vagrant in a demo. There is a command jpackage that knows how to install packages on various Linux systems.
Each JumpScript on the controller is written in Python. It will run scripts on the agents. A script could start a long running process and then you can monitor it on the server, or via a web UI. But you can come up with your own use cases.
Fabric is shell scripting in a Pythonic way. Less cryptic than Ansible. The code can be run on multiple hosts (and can include localhost).
Create a file, fabfile.py is a good default name, and add functions in there. You can call those with fab function1 function2. This will then be executed on the hosts that you have defined. Or override those on the command line.
There is support for running it in parallel on the hosts, but I have not tried it.
If you like one of these projects, make sure you join the communities.
Python Users Netherlands meeting 21 June 2014
Summary of the meeting of the Dutch Python Users group on 21 June 2014.
Thanks to Reinout van Rees at Nelen & Schuurmans in Utrecht for organizing the meetup.
Tikitu de Jager - async, coroutines, event loops, etc
I work at http://buzzcapture.com
You have an infinite loop with blocking I/O. You do not want that. It could mean you need to work with threads, but you can use coroutines with asyncio. Two infinite loops at the same time, both potentially blocking, both in the same thread. And it all works.
Blocking I/O is like waiting for coffee.
Non-blocking I/O is: place your order, go somewhere else, pick it up when you get called. So: Starbucks.
So: instead of waiting, get out of the way.
This is handled by an event loop. Constantly checking: did anything happen? When anything happens, do I have a callback for it? Compare waiting for a click event in javascript. Be careful not to run into callback hell, otherwise known as callback spaghetti.
A coroutine is a routine that can pause and resume its execution. In Python it does this with yield.
Now you want to be able to yield to the event loop.
You register a subscriber to the event loop. In this subscriber you indicate that you want to yield from somewhere: get some data from somewhere else when an event happens.
The event loop will need to have an API for adding a coroutine to the waiting list and a running list.
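The two loops in one thread can be sketched with asyncio from the standard library. One assumption to flag: this uses the async/await syntax that later Python versions introduced as the successor to yield from, so it is not the exact code from the talk. The ticker names and delays are made up.

```python
import asyncio


async def ticker(name, delay, count, log):
    """A loop that pauses at each await, handing control to the event loop."""
    for step in range(count):
        await asyncio.sleep(delay)  # non-blocking wait
        log.append((name, step))


async def main():
    log = []
    # Both loops run in the same thread, interleaved by the event loop.
    await asyncio.gather(
        ticker("fast", 0.01, 3, log),
        ticker("slow", 0.02, 2, log),
    )
    return log


log = asyncio.run(main())
print(log)
```

Neither loop blocks the other: while one is waiting in sleep, the event loop resumes the other.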
Let's compare some approaches.
Twisted wants to make it easy for you to use networking protocols, with similar ideas. Twisted has been around for a long time, so it uses some different names for similar notions. Porting existing blocking code to Twisted is not always easy.
asyncio has similar high level protocols, but also intends to provide a base layer for other libraries. It uses yield from, which is syntax from Python 3. There is a backport for Python 2 (Trollius), which does not use this syntax.
Remember that you can send() a value to a generator. Also: throw and close. Look those up in the documentation! See also islice for taking a slice of a generator.
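A short sketch of those generator methods. The running_average generator is made up for illustration; using ValueError as a reset signal is just one possible convention.

```python
from itertools import count, islice


def running_average():
    """Receive numbers through send() and yield the average so far."""
    total, seen, average = 0.0, 0, None
    while True:
        try:
            value = yield average
        except ValueError:
            # throw(ValueError) lands here: reset the state and carry on.
            total, seen, average = 0.0, 0, None
            continue
        total += value
        seen += 1
        average = total / seen


averager = running_average()
next(averager)                      # advance to the first yield
first = averager.send(10)           # average of [10]
second = averager.send(20)          # average of [10, 20]
reset = averager.throw(ValueError)  # handled inside the generator
averager.close()                    # raises GeneratorExit at the yield

# islice takes a slice of an endless generator without exhausting it.
evens = list(islice(count(0, 2), 5))
print(first, second, reset, evens)
```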
When you yield from a generator you hide a loop:
for value in generator: yield value
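The two spellings side by side, as a small sketch with made-up generator names. Beyond hiding the loop, yield from also passes send() and throw() through to the inner generator and hands back its return value.

```python
def inner():
    yield 1
    yield 2
    return "inner done"


def with_loop():
    # Explicit loop: the return value of inner() is lost.
    for value in inner():
        yield value


def with_yield_from():
    # Delegation: same values, plus the return value of inner().
    result = yield from inner()
    yield result


looped = list(with_loop())
delegated = list(with_yield_from())
print(looped)
print(delegated)
```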
With gevent you monkey patch I/O functions:
from gevent import monkey; monkey.patch_all()
gevent uses greenlets, which are full coroutines. Pretty cool. It is a near drop-in in synchronous code. It needs a C extension, so it works only for CPython. Something similar exists for PyPy.
There is also node.js, but I am not going to tell you about that.
You can still use normal callbacks of course.
Audience question: how do tracebacks look with this? Not terrible, but still slightly evil.
See slides.
Reinout van Rees - Python for flood simulation
Reinout works at Nelen & Schuurmans. Demonstration video of a simulation of a flooding in Cape Town. Running in browser, connected with web socket to the server.
Lots of big data. For the Netherlands there is data for the height of every 50 by 50 centimeters of the country. We use Mercurial to store data.
Calculation core is written in Fortran... We use a Python ctypes wrapper.
Web interface is a Django site with angular.js interface. We have Tornado wrapped around a Django view as a socket interface. The map layers are done with GDAL and Flask. See another talk this evening.
So: Python is great! You can do a lot with it.
Boaz Leskes - python and elasticsearch
Elasticsearch is a distributed search engine on top of Lucene. Language independent. JSON. Can use it for indexing, getting data.
There is a zipfile, you need Java, and then it runs.
Now talk to it with Python:
pip install elasticsearch
There is also a higher level client. See: https://github.com/elasticsearch/elasticsearch-dsl-py
Arjan Verkerk - lightning test
I work here. :-)
You can run a single test file test.py like this:
python -m unittest test
But can I just edit my test file and have the tests run, without me having to alt-tab to go to a different console and run the command again? That is where entr comes in, Event Notify Test Runner:
ls *py | entr -c python -m unittest test
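A minimal test.py that these commands could run might look like the sketch below. The slugify function is a hypothetical example of code under test, not something from the talk.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: lower-case and hyphenate a title."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):

    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello Python Users"), "hello-python-users")

    def test_extra_whitespace_is_collapsed(self):
        self.assertEqual(slugify("  GDAL   raster "), "gdal-raster")
```

Save the file, and entr runs the tests again as soon as it sees the change.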
Arjan Verkerk - GDAL raster magic
No, I do not know how to pronounce GDAL properly. ;-)
Tools for georeferenced raster data. python-gdal is the python implementation. Raster data is a raster of images, like those 50 by 50 centimeters of height data of the Netherlands (AHN2).
You need to know your coordinate system. 'Rijksdriehoekstelsel' is good for data in the Netherlands. geotransform can transform coordinates from one system to another.
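In GDAL, a raster's geotransform is a tuple of six coefficients that maps pixel positions to georeferenced coordinates. The arithmetic can be sketched in plain Python without GDAL installed. The numbers below are hypothetical: a north-up raster with cells of half a meter and an origin that merely looks like Rijksdriehoek coordinates.

```python
def pixel_to_world(geotransform, column, row):
    """Apply a GDAL-style affine geotransform to a pixel position."""
    (origin_x, pixel_width, row_rotation,
     origin_y, column_rotation, pixel_height) = geotransform
    x = origin_x + column * pixel_width + row * row_rotation
    y = origin_y + column * column_rotation + row * pixel_height
    return x, y


# Hypothetical values: top-left corner, 0.5 m cells, no rotation.
# The pixel height is negative because rows count downwards from the top.
geotransform = (155000.0, 0.5, 0.0, 463000.0, 0.0, -0.5)

top_left = pixel_to_world(geotransform, 0, 0)
further = pixel_to_world(geotransform, 200, 100)
print(top_left, further)
```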
You can use gdal to store sparse data efficiently. You may have a large area with no data and a small part with actual data.
Demo with AHN2 data.
We use Flask to serve the data.
On the JavaScript side: Leaflet.
Wichert Akkerman - a new simple rest framework
I made REST-toolkit: reinventing the wheel again.
from rest_toolkit import quick_serve, resource
Json responses for all errors. CORS headers. CORS OPTIONS response.
Built on Pyramid, so full access to the Pyramid toolset.
Concepts:
- Every url matches a resource.
- Multiple resources can map to the same stored data, but have different permissions and views, say one for anonymous and one for admin.
- Actions are not resources, but handled via a controller, like a remote procedure call, for example to reboot a server. That is not pure REST: not everything is pure data.
SQLAlchemy integration:
from rest_toolkit.ext.sql import SQLResource
A couple of hundred lines of code currently. Coming soon:
- standard views to view/update/delete resources using standard form schema, like json, wtform, Colander, etc.
- Tutorial for AngularJS and other frameworks.
Dennis Kaarsemaker - whelk (makes subprocess easier)
Pretending python is a shell. Sometimes you just want to call a command on the command line and talk to that with python. Can be ugly to write with subprocess, especially when you want to pipe commands together.
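For comparison, this is roughly what piping two commands together looks like with plain subprocess. A sketch only: two small Python child processes stand in for real shell commands, so it runs without any particular tool installed.

```python
import subprocess
import sys

# Stand-ins for shell commands, roughly like: produce | sort
produce = [sys.executable, "-c", "print('pear'); print('apple'); print('fig')"]
sort = [sys.executable, "-c",
        "import sys; sys.stdout.write(''.join(sorted(sys.stdin)))"]

first = subprocess.Popen(produce, stdout=subprocess.PIPE)
second = subprocess.Popen(sort, stdin=first.stdout, stdout=subprocess.PIPE)
first.stdout.close()  # so the first process notices if the second exits early
output, _ = second.communicate()
first.wait()

lines = output.decode().split()
print(lines)
```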
Sample code [typos guaranteed, Maurits]:
from whelk import shell, pipe
shell.git(...)
shell['2to3']...
shell.make('test', output_callback=..., run_callback=..., exit_callback=..., raise_on_error=True)
Easy redirection, per-line logging, encodings.
git = shell.git
git.checkout('next')
if not git.diff('--quiet', 'master', raise_on_error=True):
    ...
pip install whelk
Marcel van den Elst - complete RESTful stack from pyramid and mongoengine to backbone
Example: ZooStack for managing RESTful animals front to back.
At http://progressiveplanning.com we use Pyramid, we came from Django and TastyPIE. We transformed it into something that MongoDB could use.
A RESTful stack from front to back.
- Flexible, long-lived MVC client application framework
- well defined, related and privilege-checked documents
- secure RESTful API producer and consumer
MongoEngine is nearly dead, do not use it. So we are going to have to fix some things ourselves. Document definitions.
TastyMongo is a RESTful API for MongoEngine, a la Tastypie.
Backbone-Relational. Useful when an action triggers changes in multiple models on the client and the server. You can try to do it manually if you want, but Backbone-Relational takes care of it for you. See http://backbonerelational.org
Jeff Knupp: Keynote: Writing idiomatic Python
Jeff Knupp gives the keynote at Pygrunn: Writing idiomatic Python.
See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.
Towards comprehensible and maintainable code.
I am author of the Writing Idiomatic Python book.
Idiomatic Python is Pythonic code: code written in the way that the Python community has agreed it should be written.
Who decides this? Python developers, through the code that they write. Occasionally: Guido van Rossum.
The goals are: readability, maintainability and correctness.
"But I am a scientist, why should I write idiomatic code?" You want to be peer reviewed, right? If your code is eye-bleeding bad, you will get peer ignored.
"But I just write scripts." You program in Python.
"But I am coming from Java. Python programmers can still read all my class names that all have Factory in it, right?" Wrong.
Cognitive burden is the increased mental effort required to keep track of what is going on. Don't make me think.
Knuth said that our main task is not instructing computers, but explaining to humans what we want a computer to do.
Idiomatic code makes your intentions clear. It does not automatically make it correct. It lets others spot mistakes more easily.
Part of it is staying up to date with changes in the language. Is there a better way to do it, with a new construct from the standard library or an extra package?
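An illustration of that last point. This is my own small example, not one from the talk or the book: counting words by hand versus using collections.Counter from the standard library.

```python
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the end".split()

# Works, but the reader has to decode the bookkeeping.
counts_manual = {}
for i in range(len(words)):
    if words[i] in counts_manual:
        counts_manual[words[i]] = counts_manual[words[i]] + 1
    else:
        counts_manual[words[i]] = 1

# Same result with a standard library construct that states the intent.
counts = Counter(words)

most_common = counts.most_common(1)
print(most_common)
```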
You are the most frequent reader of code you write. Have mercy on your future self. He may be a violent psychopath coming back to slap you in the past.