Weblog
Python Meetup 22 November 2017
Summary of the meeting of the Amsterdam Python Meetup Group on 22 November 2017.
The meeting was hosted by Byte. Thank you!
Byte is not really hosting anymore, moving to the cloud. We use lots of Python. Magento. Creating our own service panel. We use tools like Django, SQLAlchemy, Celery, etcetera, so we like to learn from you.
Wouter van Atteveldt: Large scale search and text analysis with Python, Elastic, Celery and a bit of R
I got my PhD in artificial intelligence. I am now a social scientist.
Why do you need text analysis? Example. In 2009 Israel and Palestine were fighting. How did the media report? I downloaded lots of articles and did text analysis:
- U.S. media: lots about Hamas and their rocket attacks that provoked Israel.
- Chinese media: more about the Gaza strip, actions on the ground by Israel.
So you can learn from text analysis. There is a flood of digital information, and you cannot read it all, so you use text analysis to get an overview. You need to go from text to structured data.
Facebook research. They changed the timelines of 600,000 people: some were shown more positive messages, some more negative. More positive messages resulted in fewer negative messages written by those people, but also fewer positive messages [if I saw it right]. Lots of things wrong with this research.
We can do much better, with Python. Python is platform independent, easy to learn. Half the data science community is in Python.
Or we can do stuff with the R language. It is aimed at statistics and data. There is convergence via numpy and pandas and R packages.
I helped create https://amcat.nl/ where you can search data, for example finding articles about Dutch politics in the Telegraaf. You can search there for 'excellentie', and see that this Dutch polite term for ministers was used until the sixties, and then resurfaced recently when a minister wanted to reintroduce this, and got satirical comments.
AmCat is a text search front end and an API, written in Python and R scripts and ElasticSearch. We upload texts to elastic on two separate nodes. You can never change an article: they are unique and hashed, so you get the same results when you search again next year.
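The idea of unique, hashed, never-changing articles can be sketched roughly like this. It is a minimal illustration and not AmCAT's actual code: the field names, the `ArticleStore` class and the choice of SHA-256 are all assumptions.

```python
import hashlib
import json


def article_hash(article: dict) -> str:
    """Return a stable hex digest for an article's content fields."""
    # Serialise with sorted keys so the same content always yields the same bytes.
    payload = json.dumps(article, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ArticleStore:
    """Hypothetical append-only store: an article is identified by its hash."""

    def __init__(self):
        self._articles = {}

    def add(self, article: dict) -> str:
        key = article_hash(article)
        # Uploading identical content twice is a no-op, so there are no duplicates.
        self._articles.setdefault(key, dict(article))
        return key

    def get(self, key: str) -> dict:
        # Return a copy so callers cannot modify the stored article.
        return dict(self._articles[key])


store = ArticleStore()
first = store.add({"title": "Excellentie", "date": "2017-11-22", "text": "sample"})
second = store.add({"text": "sample", "date": "2017-11-22", "title": "Excellentie"})
print(first == second)  # same content gives the same identifier
```

Because the identifier is derived from the content, a changed article would simply be a different article, which is what keeps old search results reproducible.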
For the Telegraaf paper, there is no data between 1995 and 1998. The previous years were digitised by a library, and the later years by Telegraaf themselves, but the current owner is not interested in filling the gap.
Our script interface is a Django form plus a 'run' method. This means our CLI, HTTP and API front end can use the same code.
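A rough sketch of that pattern, using only the standard library as a stand-in for a Django form. The `WordCountScript` class, its fields and the two front end helpers are hypothetical; the point is only that validation and `run` live in one place.

```python
import argparse
import json


class WordCountScript:
    """Hypothetical script: declares its options once and implements run()."""

    # name -> (type, default); a stand-in for Django form fields.
    fields = {"text": (str, None), "min_length": (int, 1)}

    def __init__(self, data: dict):
        self.options = {}
        for name, (cast, default) in self.fields.items():
            raw = data.get(name, default)
            if raw is None:
                raise ValueError(f"missing required option: {name}")
            self.options[name] = cast(raw)

    def run(self) -> dict:
        words = [w for w in self.options["text"].split()
                 if len(w) >= self.options["min_length"]]
        return {"count": len(words)}


def run_from_cli(script_cls, argv):
    """Command line front end: build the parser from the declared fields."""
    parser = argparse.ArgumentParser()
    for name, (cast, default) in script_cls.fields.items():
        parser.add_argument(f"--{name.replace('_', '-')}", type=cast,
                            default=default, required=default is None)
    return script_cls(vars(parser.parse_args(argv))).run()


def run_from_api(script_cls, body: str) -> str:
    """API front end: JSON in, JSON out, same validation and run() code."""
    return json.dumps(script_cls(json.loads(body)).run())


print(run_from_cli(WordCountScript,
                   ["--text", "large scale text analysis", "--min-length", "5"]))
print(run_from_api(WordCountScript, '{"text": "large scale text analysis"}'))
```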
For async jobs we use Celery. I know a professor who likes to write a query of several pages, which can take more than an hour to handle, so he would normally get a timeout. So we do that asynchronously.
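The submit-now, fetch-later idea looks roughly like this. Note that this sketch uses `concurrent.futures` from the standard library as a stand-in, not Celery itself: with Celery the job would go to a message broker and be picked up by separate worker processes. The function names and the toy query are assumptions.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job id -> Future


def slow_query(query: str) -> int:
    """Stand-in for a query that takes too long for one HTTP request."""
    time.sleep(0.2)
    return len(query.split())


def submit(query: str) -> str:
    """Start the job in the background and return a job id immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(slow_query, query)
    return job_id


def status(job_id: str) -> dict:
    """Poll a job: report whether it is done and, if so, its result."""
    future = jobs[job_id]
    if not future.done():
        return {"state": "PENDING"}
    return {"state": "SUCCESS", "result": future.result()}


job_id = submit("hamas AND (rocket OR israel)")
print(status(job_id))      # the request returns at once, whatever the state
jobs[job_id].result()      # block until finished, for this demo only
print(status(job_id))
```

The HTTP request only has to hand out the job id, so it never runs into the timeout; the client polls for the result later.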
We want to do NLP (Natural Language Processing). Or preprocessing. There are many good tools, like Stanford CoreNLP for English. We developed NLPipe, a simple NLP job manager. It caches results. Separated into server and workers. Workers can run on AWS via Docker, but since we have no money we are looking at Surf HPC but they don't support Docker, so we look at Singularity instead. Experts welcome.
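The caching part could look something like the sketch below. This is not NLPipe's real interface: the `CachingJobManager` class, the toy `tokenize` function and the in-memory dictionary are assumptions, chosen only to show that the expensive tool runs once per module and text.

```python
import hashlib


class CachingJobManager:
    """Hypothetical job manager: does the real work once per (module, text)."""

    def __init__(self):
        self._cache = {}
        self.calls = 0  # how often the processing function was actually run

    def process(self, module_name, func, text):
        # Key on the module name plus a hash of the text, not the text itself.
        key = (module_name, hashlib.sha1(text.encode("utf-8")).hexdigest())
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = func(text)
        return self._cache[key]


def tokenize(text):
    """Toy stand-in for an expensive NLP tool such as a parser."""
    return text.lower().split()


manager = CachingJobManager()
manager.process("tokenize", tokenize, "Large scale text analysis")
tokens = manager.process("tokenize", tokenize, "Large scale text analysis")
print(tokens, manager.calls)
```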
Reinout van Rees: Fast querying and filtering with Django
[Same talk as on PyGrunn this year. I copied my summary and extended it.]
Goal: show what is possible. Everything is in the Django documentation. Just remember a few things you see here. If you know it is available, you can look it up.
The example case I will use is a time registration system. Everyone seems to do this. Oh, fewer hands here than at PyGrunn. The tables we will use are person, group, project and booking. A Person belongs to a Group. A Booking belongs to a Project and a Person.
The Django ORM gives you a mapping between the database and Python. You should not write your own SQL: Django writes pretty well optimised SQL.
Show all objects:
from trs.models import Person, Project, Booking
Person.objects.all()
Basic filtering:
Person.objects.filter(group=1)
specific name:
Person.objects.filter(group__name='Systemen')
case insensitive searching for part of a name:
Person.objects.filter(name__icontains='reinout')
or part of a group name:
Person.objects.filter(group__name__icontains='onderhoud')
name starting with:
Person.objects.filter(name__startswith='Reinout')
without group:
Person.objects.filter(group__isnull=True)
Person.objects.exclude(group__isnull=False)
Filtering strategy:
- sometimes .exclude() is easier, the reverse of filter
- you can stack: .filter().filter().filter()
- query sets are lazy: only really executed at the moment you need it. You can use that for readability: just assign the query to a variable, to make complicated queries more understandable
- start with the model you want
Speed:
This will use one initial query and then one extra query for each person:
systems = Person.objects.filter(group__name='Systemen')
for person in systems:
    print(person.name, person.group.name)
Not handy. Instead use the following.
select_related: does a big join in SQL so you get one set of results:
for person in systems.select_related('group'):
    print(person.name, person.group.name)
This does one query.
prefetch_related: does one query for one table, and then one query to get all related items:
for person in systems.prefetch_related('group'):
    print(person.name, person.group.name)
This does two queries. Both can be good, depending on the situation.
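To see the difference in query counts without a Django project at hand, here is a sketch in plain `sqlite3` that mimics the three strategies and counts the statements. It is not what Django generates literally: the table names (`grp` instead of `group`, which is a reserved word in SQL), the sample rows and the helper functions are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                         group_id INTEGER REFERENCES grp(id));
    INSERT INTO grp VALUES (1, 'Systemen'), (2, 'Onderhoud');
    INSERT INTO person VALUES (1, 'Anna', 1), (2, 'Bram', 1), (3, 'Cees', 2);
""")

queries = []
conn.set_trace_callback(queries.append)  # record every SQL statement executed


def count(func):
    """Run func and return how many SQL statements it issued."""
    queries.clear()
    func()
    return len(queries)


def naive():
    # One query for the persons, then one more per person for its group.
    rows = conn.execute("SELECT name, group_id FROM person").fetchall()
    for name, group_id in rows:
        conn.execute("SELECT name FROM grp WHERE id = ?", (group_id,)).fetchone()


def joined():
    # Roughly the select_related approach: a single JOIN.
    conn.execute("SELECT person.name, grp.name FROM person "
                 "JOIN grp ON grp.id = person.group_id").fetchall()


def prefetched():
    # Roughly the prefetch_related approach: two queries, combined in Python.
    persons = conn.execute("SELECT name, group_id FROM person").fetchall()
    ids = sorted({group_id for _, group_id in persons})
    marks = ",".join("?" * len(ids))
    groups = dict(conn.execute(
        f"SELECT id, name FROM grp WHERE id IN ({marks})", ids).fetchall())
    return [(name, groups[group_id]) for name, group_id in persons]


print(count(naive), count(joined), count(prefetched))
```

The naive loop grows by one query per person, while the other two stay at one and two queries no matter how many persons there are.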
It is expensive to instantiate a model. If you need only one or two fields, Django can give you a plain dictionary or list instead.
Dictionary:
systems.values('name', 'group__name')
List of several fields in tuples:
systems.values_list('name', 'group__name')
Single list of values for a single field:
systems.values_list('group__name', flat=True)
Annotation and aggregation:
- annotate: sum, count, avg
- aggregation
- groupby via values (bit of a weird syntax)
Aggregation gives totals:
from django.db.models import Sum
relevant_persons = Booking.objects.filter(
    booked_by__group__name='Systemen')
relevant_persons.aggregate(Sum('hours'))
Annotation adds extra info to each result row:
relevant_persons.annotate(
    Sum('bookings__hours'))[10].bookings__hours__sum
Filter on bookings for maternity leave, group bookings by year, give sums:
Booking.objects.filter(
    booked_on__description__icontains='ouderschap'
).values('booked_by__name', 'year_week__year'
).annotate(Sum('hours'))
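In SQL terms, aggregation gives one total for the whole result set, and values plus annotate becomes a GROUP BY. A small sketch with plain `sqlite3` to make that concrete; the flattened `booking` table and its sample rows are assumptions, and the SQL Django really generates will look different in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booking (person TEXT, year INTEGER, hours REAL);
    INSERT INTO booking VALUES
        ('Anna', 2016, 8), ('Anna', 2017, 16), ('Anna', 2017, 4),
        ('Bram', 2017, 10);
""")

# Like aggregate(Sum('hours')): one total for the whole result set.
total = conn.execute("SELECT SUM(hours) FROM booking").fetchone()[0]

# Like values('person', 'year').annotate(Sum('hours')): one sum per group.
per_group = conn.execute(
    "SELECT person, year, SUM(hours) FROM booking "
    "GROUP BY person, year ORDER BY person, year").fetchall()

print(total)
for row in per_group:
    print(row)
```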
Note: I used the faker library to replace the names of actual coworkers in my database with random other names.
Practice this with your own code and data! You'll get the hang of it and get to know your data and it is fun.
What I hope you take away from here:
- Read the docs, you now have the overview.
- Make your queries readable.
- Practice, practice, practice.
[I will toss something extra in from PyGrunn, which was probably a question from the audience.]
If you need to do special queries, you can build the query condition yourself with a Q object:
from django.db.models import Q
query = Q(group__name='Systemen')
Person.objects.filter(query)
You can write filters that way that are not in default Django.
Twitter: @reinoutvanrees
Wojtek Burakiewicz: Building robust command line tools with click and flask (and microservices)
I worked at Byte and am now working at Optiver, in the tooling team, using Python for data center management. I use Flask and Click, made by the same people. Our sd tool is internal, it is not on the Internet. It deploys Python applications to servers with virtualenv.
On the server side we use Stash, Artifactory, Ansible + Jenkins, supervisord and JIRA. On the user side we have our sd tool to talk to the programs on the server.
You would normally need passwords on some servers, auth keys or ssh keys on others. Some ports open, a different one for each new app. Messy.
So sd talks to an api that sits in between. The api then handles the messy talking to the servers, and you store authentication and port numbers in a central config, instead of on the computer of each user. All users talk to the api, the api talks to the servers.
For the server side api we use flask and flask_restful. For the client side we use click.
When you install Django, you get a house. When you install Flask, you get a brick. Sometimes you want a house. Other times all you need is a brick. You can build something entirely different with a brick. It is easy.
Making an api is done easily with:
from flask_restful import Api, Resource
Then click for the command line tool. I always struggle with argparse, but I like working with click:
import click

@click.command()
@click.option('--count', default=1, help='...')
def main(count):
    # Hypothetical command body, added so the snippet is complete.
    click.echo(count)
Click makes you structure your application in a nice way:
tool/
    cli.py
    actions/
        action1.py
        action2.py
We use a trick to force users to upgrade. With every request to the api we send the version of our cli. The api checks this against a minimum version and aborts if the version is too old.
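A minimal sketch of such a check on the api side. The header name, the minimum version and the status codes are assumptions, not what the sd tool actually uses; the one detail that matters is comparing versions as numbers, so that 1.10.0 counts as newer than 1.4.0.

```python
MINIMUM_CLIENT_VERSION = "1.4.0"  # hypothetical value kept on the server


def parse_version(version: str) -> tuple:
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def check_client_version(headers: dict) -> tuple:
    """Return (status_code, message) for a request carrying the client version."""
    sent = headers.get("X-Client-Version")  # hypothetical header name
    if sent is None:
        return 400, "client version missing"
    if parse_version(sent) < parse_version(MINIMUM_CLIENT_VERSION):
        return 426, f"please upgrade to at least {MINIMUM_CLIENT_VERSION}"
    return 200, "ok"


print(check_client_version({"X-Client-Version": "1.3.9"}))
print(check_client_version({"X-Client-Version": "1.10.0"}))
```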
Documentation:
- Flask: http://flask.pocoo.org
- Click: http://click.pocoo.org
Sprint wrap-up Sunday
Wrap-up from Sunday of the sprints at the Plone Conference 2017 in Barcelona.
Sprint document is on Google Docs.
- Pyramid: a few more documentation updates.
- Plone and Zope 4. Down to seven failing tests, very good. Everything is merged, the master branch of CMFPlone is using Zope4, the PLIP job is gone.
- Plone to Python 3. We decided to use six, which is a dependency of Zope anyway. Lots of PRs. Experimenting with sixer, which 'sixifies' the code automatically. GenericSetup: slowly working through incompatibilities.
- Plone rest api. Some issues solved. plone.app.event stores start and end date timezone aware, and the rest of the dates are timezone naive, and there is no hint in the schema on what is naive or not, so that gives us problems, evaluating how to fix it.
- VueJS SDK. Implementing traversal. Creating edit forms out of schema. You can add views with a plugin. Automatic testing with Travis is set up. Next: component. Editor.
- Pastanaga Angular. plone/pastanaga-angular. Demo time! mr.developer work done.
- Pastanaga.io, creating mocks.
- Guillotina, made pastanaga-angular work with guillotina, you can login, browse content, navigation. guillotina_cms layer. Robot framework tests, with robotframework.guillotina for test setup.
- Plone CLI. I can show you. Main setup is in place. plonecli create addon collective.todo; plonecli build; plonecli serve. Or in one command: plonecli create addon collective.todo build serve.
- WSGI in plone.recipe.zope2instance. All merged. Python 3 compatible.
- websauna. Pyramid 1.9 support is 100% done. In another week we can release a new version.
- pas.plugins.ldap. Problem that tests are not running on Travis. We now know what is happening, but not yet why, when half a year ago it worked. We got LDAP running locally on Mac, so it becomes easier to test and fix.
- docs.plone.org upgrade guide, just came in, documented one PLIP.
- JSON Schema Builder with JavaScript. Demo time! You can click a form together, save it as json, and view it with Angular. From there you could save or mail the filled in data. You can do validation. We have collective.easyform which is Plone only, but this is more general: it's just json on the back end and front end. [Very impressive!]
- Update XML-RPC to support dexterity. First pull request done.
- Mixed bag. Removed all robot screen shots from documentation, they live under CMFPlone now, making it easier for others to write and test. Mixed results from Chrome and PhantomJS, also changing from version to version. With that, for papyrus, our documentation build system, we no longer need to build Plone.
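On the timezone problem mentioned under the Plone rest api item above: in plain Python, mixing timezone aware and naive datetimes fails as soon as you compare them. A small sketch of why that hurts; the example dates and the `is_aware` helper are assumptions, and the fix shown only works if you know which timezone the naive value was meant to be in.

```python
from datetime import datetime, timezone

aware = datetime(2017, 10, 22, 9, 0, tzinfo=timezone.utc)  # like an event start
naive = datetime(2017, 10, 22, 9, 0)                       # like most other dates


def is_aware(value: datetime) -> bool:
    """A datetime is aware if it has a tzinfo that yields a UTC offset."""
    return value.tzinfo is not None and value.utcoffset() is not None


try:
    aware < naive
except TypeError as exc:
    print("cannot compare:", exc)

# One possible fix, assuming the naive value is known to be in UTC.
fixed = naive.replace(tzinfo=timezone.utc)
print(is_aware(naive), is_aware(fixed), fixed == aware)
```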
Sprint wrap-up Saturday
Wrap-up from Saturday of the sprints at the Plone Conference 2017 in Barcelona.
Sprint document is on Google Docs.
- Working on moving Pylons to the Plone Foundation. Tedious, painstaking work. PRs for documentation and some bugs.
- Eric made coredev branch 5.2. Merged Zope 4 PLIP changes into that. Same amount of failures as yesterday, working on getting the build green. Work on porting databases, some mosaic problems are being fixed, most add-ons are okay. Wrote documentation for some code changes you have to do.
- Plone to Python 3. We tried to fix all the imports in all the Plone packages that break on Python 3. Long list of PRs in the Google Doc. GenericSetup Python 3 branch that we first got to work on Python 2 again. Working through the usual string issues. Some semantic issues for PropertyManagers that we need to fix in Zope first. Gil made a list of which packages are not Python 3 yet, already in June, we ask him to update it.
- Plone rest api. Problem with root users. There is a PR which disables that, but I have a workaround ready now.
- VueJS SDK. plone.vuejs package, but may be renamed. Just basic stuff. Test setup. Started on some features, like traversal.
- Pastanaga Angular. Travis setup. Universal. A mr.developer for Angular. Login form is done. Work on API and SDK.
- Pastanaga React. Struggling with several issues.
- Pastanaga.io, talking about license, fund raising.
- Guillotina some work done, PR.
- Plone CLI. Front end working. Fixing stuff in bobtemplates.
- WSGI in plone.recipe.zope2instance. PR merged into master. Should be there in Plone 5.2. Support in the core buildout for the WSGI parts: wsgi.cfg config file. Basically done.
- websauna. Pyramid 1.9 support is 80% done. Work on cookie cutter template to support Docker images. Will become easier to startup.
- plone.org improvements, made mockups to make packages more visible. Set of icons will be reviewed. Should be discussed with website team. Make the listing more emotional.
- pas.plugins.ldap. Fred chatted with Jens how we can merge back improvements from Asko and Zest. Documentation, that might be later merged to docs.plone.org. Also some collective.recipe.solr work.
- docs.plone.org upgrade guide, worked on documenting the PLIPs, restructuring a bit
- JSON Schema Builder with JavaScript. Browser view with drag and drop, save in dexterity object. Angular app that traverses to the end point of the schema. Missing is the order of the fields which is not correct, and actions.
- Mixed bag. Fixes for docs.plone.org, new theme release with better version dropdown. Meeting with Manabu to talk about Tokyo. Server consolidation planning. Contributor agreements signed, 2.5 of them.
Lightning talks Friday
Lightning talks on Friday at the Plone Conference 2017 in Barcelona.
Andreas Jung: Collaborative content creation with smashdocs
Web based collaborative editor. Better than Google docs: it can be hosted by yourself. Intelligent documents. HTML and XML export. Tracking of changes. Chat and discussion. Docx import and export. Integrates with the Plone sharing tab. Content life cycle indicator.
Naoki Nakanishi: Microcontrollers and Plone
I work at CMScom and I like IoT (Internet of Things). Microcontrollers can connect to Plone easily. This is because Plone has RESTful API products. We program the microcontrollers with the MicroPython language. This has the useful urequests and ujson modules. It supports many microcontrollers. I have a rough concept, but I will start to develop this from tomorrow.
Maik Derstappen: bobtemplates.plone
I have been working on bobtemplates.plone:
mrbob bobtemplates.plone:addon -O collective.todo
You can now actually add a content type in an existing package, using a sub template. It will currently overwrite code, so you want to start with a clean git checkout.
See my talk this afternoon.
Unrelated: Plone Tagung 2018 is planned on 20 March in Berlin. Main topics of this conference will be in German, but if others want to join in English, you are welcome.
Érico Andrei: several packages
- contentrules.slack: post to a slack channel when something happens in your Plone Site.
- collective.selectivelogin: restrict login
Alexander and Sally: Plone 5 add-ons
We had nominations and votes for Plone 5 add-ons. We had problems with losing the papers where you could vote, so this is with a grain of salt. The top results:
- plone.restapi
- eea.facetednavigation
- plone.app.mosaic
- collective.easyform
On plone.org we have a list of add-ons which are managed by hand. There is a list of Plone releases, where the versions are not sorted right (alphabetically, so 1, 10, 11, 2, 3, etc). So this needs to be improved. During Google Summer of Code work was done here, getting information from PyPI. It still needs work, especially design work can help a lot, to present it more nicely.
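The sorting problem is easy to show in a few lines of Python. This is only a sketch: the release numbers are made up, and real version strings with suffixes like rc1 or b2 would need a proper version parser instead of this simple key.

```python
releases = ["1", "10", "11", "2", "3", "5.1", "5.0.10", "5.0.9"]


def version_key(version: str) -> tuple:
    """Sort key that compares each dotted component as a number."""
    return tuple(int(part) for part in version.split("."))


print(sorted(releases))                   # alphabetical: 10 comes before 2
print(sorted(releases, key=version_key))  # numerical: 2 comes before 10
```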
Nathan and Ramon: Docker, guillotina
Docker Compose is the new buildout? This might be a pattern that works for you.
We have a CMS on top of guillotina: https://github.com/guillotinaweb/guillotina_cms
Lots of other packages: https://github.com/guillotinaweb
Charles Beebe: Inclusion > Diversity
Inclusion is more than diversity.
Thank you all, this is my first Plone conference and I felt welcome. I never thought I would feel comfortable to do a presentation the first time I came to a conference.
Have you ever felt uncomfortable during a conference?
You may 'cover' yourself, hiding something of you. That does not help. Even 45 percent of white males in America do this. Do you make people feel at home? It does not have to be complicated. I got a cake from my colleagues when I got engaged.
Philip: Plone 2020
Plone 5.1 master branch with small changes works on Zope 4.
In Brazil Paul Everitt said: "You are dragging the dead body of Zope with you." In 2020 Python 2 is no longer supported.
We investigated and found out that Zope is actually not dead. Plone 5.2 will use Zope 4, discussed yesterday.
Plone minus Archetypes minus ZServer plus Python 3 will be some Plone version.
Some sprint will focus on this area:
- Alpine City Sprint Innsbruck in January 2018
- Amsterdam Spring 2018
Where we are now felt impossible in Brazil in 2013.
Roel Bruggink: demo of Plone
Plone demo, logging in, view documents, view history, view changes, edit, site setup, display menus.
What you see here, is bits of Pastanaga and bits of React front end.
Oshane: Plone theme editor
I worked on the theme editor during the GSOC (Google Summer of Code). I will give a demo. Contextual menu for renaming or moving files. Find a file by its name, or find text within files and go to that exact line. Drag and drop files. Import rapido apps.
Mikko Hursti: list customisation
I worked on improving the list customisation using mosaic during the GSOC.
See my final report.
Manabu Terada: Plone conference 2018 Tokyo
The Plone conference 2018 is going to be in Tokyo, Japan. Tokyo does not start with a B, but it has a Bay area, so is it okay?
Two years ago, we had the Plone Symposium Tokyo. PyconJP 2017 in September had lots of visitors.
FAQ:
- English OK? Yes
- Expensive? No, food and hotel not. Taxi, sushi, beer: a bit.
- Safe? Yes. In 2020 we have Olympic Games.
See you next year in Tokyo, 5 to 11 November.
Ramon and Victor: Goodbye
Thank you for coming, good party, good to see new faces from other communities. I hope we keep following this path of opening up to other communities. Glad that it was safe, with all that is going on in Catalunya. We are very happy about organising this.
Thank you Agata, my beautiful wife. Thank you Timo for allowing me to spend an insane amount of time on the conference. Thank you Albert Casado for the beautiful design. Thank you Kim for all your work. Thank you to sponsors, people filling the bags, Sally, Eric, volunteers, time keepers, thanks all for joining us. It was a once-in-a-lifetime experience. Hope to see you soon in the Plone world.
Éric Bréhault: Building a Cathedral Over Decades
Talk by Éric Bréhault at the Plone Conference 2017 in Barcelona.
When you build a CMS, you might start small, but you end up with a very large stack. For Plone, some of this stack is more than fifteen years old.
What do we want to work on for the future? Zope 4! Guillotina! Headless CMS! Everything! So many challenges and huge projects! In a business situation you would probably say this is bad. So why is Plone still alive? Emotions and culture.
Emotion
A software developer feels like a parent to his code. An open source community is like a shared parent group. Why does this work? Love.
Open source is not business. I can prove that. Business means you are busy. Busy means you are not free. Not free means you are not open. Clear.
The business world talks about disruption. It is violent. Okay for the business world.
Business values a 10x developer. Open source knows: the only way to be a 10x developer, is to have ten developers be twice as good.
Nine couples cannot make one baby in one month. One couple makes a baby in nine months, and it takes a village to raise the baby. Open source community.
Results versus process. Process provides emotions. Results provide money.
Developing with each other is sharing emotion. The Plone community is not just sharing code, it is sharing emotions. It feels good to share.
Empathy: feel what someone else is feeling. It is not something that you decide to do. Empathy makes it possible to share emotions. Empathy is the first open source process.
We are emotion addicts. This is true for Plone developers just as much as for Justin Bieber fans.
I think people are altruists by nature, not egoists. We want to do something for another. Our need for emotion is bigger than our need for money.
Emotion is why Plone is still alive.
Culture
Culture is how Plone is still alive.
Our everyday miracle is: pluggability. This comes at a price. Would we release a module without tests, or with a funky css selector? No. People who build Plone add-ons are following the rules, so it is safe to install.
The ancient Greeks had the word 'Pharmaka' for something that heals, but can also be dangerous. 'Per aspera ad astra': through difficulties to the stars. We give core commit rights to anyone who wants to join us.
The Plone community as a whole has knowledge, a diamond mine.
Building a cathedral
Plone is like the Sagrada Familia. It was created by someone who has left, and it is still being built.