Weblog

Oscar Vilaplana: Orchestrating Python projects using CoreOS

published May 22, 2015

Oscar Vilaplana talks about Orchestrating Python projects using CoreOS, at PyGrunn.

See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.

This is about orchestrating stuff. Not specifically Python, but the examples will be with Python.

Reliability: your app does not care where it is running; it runs locally or in the cloud, same thing. Portability, repeatability. Loose coupling: compose microservices, which makes it easier to scale, mix and extend them.

Cluster-first mentality. Even development machines can run in clusters. Lots of different containers, servers and ports, all connected: how do you manage this? Do you need to be developer and operations in one? Let the system figure it out; let other smart people determine where the containers run.

One deployment tool for any service makes the tooling better and better, and a new service is then not much extra work.

Demo: deploy a database and a Flask app, and scale them. RethinkDB with 3 replicas. Give the service a name and connect to it from the app by that name. With kubectl, start the service with the desired number of replicas.
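
A rough sketch of what the app side of such a demo could look like (this is not the speaker's code; the service name "rethinkdb", the database and the table are assumptions):

    # Minimal sketch: a Flask app that connects to RethinkDB by its
    # service name instead of a hard-coded IP. The service name
    # "rethinkdb", database "test" and table "visits" are made up.
    import rethinkdb as r
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def index():
        # Service discovery resolves "rethinkdb" to whichever
        # pod(s) currently run the database.
        conn = r.connect(host="rethinkdb", port=28015)
        count = r.db("test").table("visits").count().run(conn)
        conn.close()
        return "visits so far: %d" % count

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)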

CoreOS: kernel + Docker + etcd. Read-only root, no package manager, automatic self-updates, systemd for setting limits on a service and making sure it starts and is restarted when it stops. Socket activation: starting a service only when someone starts using it.

etcd is a distributed configuration store. Atomic compare-and-swap: change this setting from 1 to 2, otherwise fail. HTTP API. Configurable at runtime: etcdctl set /pygrunn/current_year 2015
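
As an illustration of that HTTP API, a small sketch using requests against the etcd v2 keys API (the endpoint and key are assumptions for a local etcd):

    # Sketch of talking to etcd's v2 HTTP API; adjust host/port for
    # your cluster.
    import requests

    BASE = "http://127.0.0.1:2379/v2/keys"

    # Plain set, like: etcdctl set /pygrunn/current_year 2015
    requests.put(BASE + "/pygrunn/current_year", data={"value": "2015"})

    # Atomic compare-and-swap: only update if the current value is 2014,
    # otherwise etcd reports an error and nothing changes.
    resp = requests.put(
        BASE + "/pygrunn/current_year",
        data={"value": "2015"},
        params={"prevValue": "2014"},
    )
    print(resp.json())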

fleet is a distributed systemd: start a service somewhere in a cluster, with coordination across the cluster. Rules like: do not start A and B on the same server, because they are both CPU hungry.

Service discovery: ambassador pattern, talk to someone who knows where the services are.

flannel does something like that. Per-cluster and per-machine subnets: every container gets an IP in a subnet.

Kubernetes: a process manager. A pod is the unit of scheduling: a name, container image, resources. Labels are used for finding things.

Replication controllers create pods from a template and ensure that exactly the wanted number of pods is running. They also handle upgrades.

Service discovery for pods. An ip per service.

Demo.

It took me a while to wrap my head around it. Look at the slides in the calm of your home.

Me on Twitter: https://twitter.com/grimborg

And see http://oscarvilaplana.cat

Lars de Ridder: Advanced REST APIs

published May 22, 2015

Lars de Ridder talks about Advanced REST APIs, at PyGrunn.

See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.

See slides at http://todayispotato.github.io/rest-design-talk

I am head of tech at Paylogic.

Goals of REST: loose coupling between web client and server. Use existing web infrastructure.

Step 1, 2, 3: design your process.

Example with data as the basis for the modeling. Simplistic database tables: CoffeeType, CupOfCoffee, Order, Barista. So you GET or POST to /coffeetype, /cupofcoffee, /order, /barista. A POST will have keyword arguments, like the number of cups. What is missing? You want to get the price without ordering first.
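
A hypothetical sketch of such a table-driven API, just to make the problem concrete (endpoint and field names are invented, this is not from the talk):

    # Hypothetical sketch of the table-driven design being criticised:
    # one endpoint per database table, and the client is left to work
    # out things like the price itself.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/coffeetype", methods=["GET"])
    def coffee_types():
        return jsonify(types=["espresso", "latte"])

    @app.route("/order", methods=["POST"])
    def place_order():
        data = request.get_json()
        # "number of cups" arrives as a keyword argument; there is no
        # way to ask for a price without actually creating an order.
        return jsonify(order_id=1, cups=data["cups"]), 201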

So far this is easy to design, and the server is easy to build. But you get logic in the clients, like computing the price, and tight coupling between API design and database design. Your table names should not determine your API names.

Model your process as seen from the end user. For every step, 'invent' a resource. In our case they might be /coffeetypes, /quote, /orders (or /payments), maybe /baristas. For every resource, determine the relations; this is the most important part. Never rely on URLs, but use link relations; IANA.org has defined standards for this. Also consider which data is involved for every resource.

Media types for APIs: a standard for how to format your (JSON) response. There are standards for this, so do not reinvent the wheel. We use HAL. It is minimalistic, basically only describing how to embed information. By visiting a URL you can discover other URLs that you can call. Others: http://jsonapi.org, Mason, http://jsonpatch.com, http://json-schema.org.
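
As an illustration with made-up data: a HAL-style representation embeds its links, so a client follows link relations instead of constructing URLs itself:

    # Made-up example of a HAL-style representation of an order.
    # The client follows the "payment" link relation instead of
    # building the URL from a table name.
    order = {
        "_links": {
            "self": {"href": "/orders/12"},
            "payment": {"href": "/orders/12/payment"},
        },
        "_embedded": {
            "quote": {
                "_links": {"self": {"href": "/quote?type=espresso&cups=2"}},
                "total": "4.20",
                "currency": "EUR",
            },
        },
        "status": "pending",
    }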

"I don't need to write documentation, my API is discoverable." This is of course not true. Discoverable APIs help when you are developing an application that uses the API. But do document the process that apps should use.

You should learn HTTP, learn what the verbs really mean. REST really is HTTP.

We chose to evolve our API instead of versioning it, using deprecation links.

Erik Groeneveld: Generators

published May 22, 2015

Erik Groeneveld talks about generators, at PyGrunn.

See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.

I started my own company, Seecr. This talk is about PEP 342 (Coroutines via Enhanced Generators) and PEP 380 (Syntax for Delegating to a Subgenerator) and beyond. Most of this session is detailed on http://weightless.io/compose

I will talk about five simple things. Once you start working with it, it can get quite complicated, but it is really great.

Basis

Basis: list comprehensions with round instead of square brackets. That gives you a lazy generator instead of a completely populated list. You call .next() on the result to get the next item. Very useful for creating lazy and efficient programs.
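
For example:

    # A generator expression: nothing is computed until you ask for it.
    squares = (n * n for n in range(10))
    print(next(squares))  # 0  (next(g) here; g.next() on Python 2)
    print(next(squares))  # 1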

Generator functions: using yield instead of return automatically turns a function into a generator.
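
For example:

    # The yield turns this function into a generator function.
    def countdown(n):
        while n > 0:
            yield n
            n -= 1

    for i in countdown(3):
        print(i)  # 3, 2, 1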

Generalized generators, PEP 342

Accepting input in a generator: use .send(value) to send a new value into a generator. The generator then goes both ways, and you basically have a coroutine.
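
A small illustration:

    # A generator that also accepts input via send(): a coroutine.
    def running_total():
        total = 0
        while True:
            value = yield total  # hand out the total, wait for a new value
            total += value

    gen = running_total()
    next(gen)            # advance to the first yield
    print(gen.send(5))   # 5
    print(gen.send(10))  # 15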

Decomposition, PEP 380

Decomposing a function into two functions is easy. With generators it was not possible; now it is. Add the @compose decorator to the main generator and call yield sub() on the sub-generator. On Python 3.3 and higher it is nicer: yield from sub(). But you need two new concepts to write real programs with it.
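
A minimal illustration in Python 3.3+ syntax (not code from the talk):

    # Decomposing one generator into two with "yield from".
    def sub():
        yield 1
        yield 2

    def main():
        yield 0
        yield from sub()  # delegate to the sub-generator
        yield 3

    print(list(main()))   # [0, 1, 2, 3]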

Push back, beyond

Implicitly, at the end of a normal function there is a return None if there is no explicit return. For generator functions you implicitly get raise StopIteration. With Python 3.3 and higher you can explicitly write return 42, which translates into raise StopIteration(42), to return a value.
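
A minimal Python 3.3+ illustration:

    # A return inside a generator raises StopIteration(42); "yield from"
    # delivers that value as the value of the expression.
    def answer():
        yield "working..."
        return 42

    def main():
        result = yield from answer()
        yield result

    print(list(main()))  # ['working...', 42]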

But you can also do raise StopIteration(42, 56, 18). We use this to push data back into the stream: the main generator will get 42, and the next main yields will give you 56 and then 18.

None protocol, beyond

If you yield nothing, you yield None, so yield None is the same. With yield None you get the generator ready to receive data, putting it at the first yield statement, so you do not need to call .next() on it first. You alternate reading and writing stages: you send a few values, and then you tell the generator that you are ready to receive data again by passing it None.
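
Roughly, the idea in plain Python (a sketch of the convention, not the Weightless implementation):

    # The generator yields None when it is ready to receive, and yields
    # real values when it has something to write back.
    def echo_upper():
        while True:
            data = yield None    # "I am ready to receive"
            yield data.upper()   # write a result back

    gen = echo_upper()
    gen.send(None)            # prime: run to the first yield None
    print(gen.send("hello"))  # HELLO
    gen.send(None)            # back to the receiving state
    print(gen.send("world"))  # WORLD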

Applications

Weightless is an I/O framework. It ties generators to sockets. See http://weightless.io/weightless and http://weightless.io/compose and source code at https://github.com/seecr/weightless-core

Bob Voorneveld: Implementing the Gmail API in our CRM system

published May 22, 2015

Bob Voorneveld talks about implementing the Gmail API in a CRM system, at PyGrunn.

See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.

I am working for Spindle. We are primarily working on a CRM system. Most Spindle programmers used to work at Voys. We are growing bigger and are still hiring, so come and join us.

Why we started building yet another CRM: communication should be the central point. We are building 'HelloLily'. Lily should be funny, nerdy, smart. HelloLily focuses on accounts, contacts, cases, deals, email, phone calls (future).

Focusing on the email part in this talk. Last year I said: instead of the really old IMAP, let's switch to the Gmail API.

Implementation currently: Heroku, Python 2.7, Django 1.7, Django Rest, Django Pipeline, Celery, IronMQ, PostgreSQL, ElasticSearch, Angular 1.3, Logentries.

Celery: 1 scheduler; every five minutes it syncs every account. Two functions to sync email: first sync and incremental sync. Many functions for sending, moving, deleting and drafting email. This is asynchronous, to keep responses quick.
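
A sketch of what that might look like in Celery (the task and module names are hypothetical):

    # Hypothetical sketch: a beat schedule entry that starts a sync
    # every five minutes, and a task that would fan out per account.
    from datetime import timedelta

    from celery import shared_task

    # This part would live in the (Django) settings:
    CELERYBEAT_SCHEDULE = {
        "sync-all-email-accounts": {
            "task": "email.tasks.synchronize_email_scheduler",
            "schedule": timedelta(minutes=5),
        },
    }

    @shared_task
    def synchronize_email_scheduler():
        # Fan out one asynchronous task per email account here (first
        # sync or incremental sync), so web responses stay quick.
        pass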

With IMAP we previously had email sync problems: authentication (we should not store your password), no easy way to keep track of changes to a mailbox, no partial download of only one attachment, and IMAP implementations differ. Also, searching in the database was not very efficient, even with indexes. We had PostgreSQL problems: models spread over many tables, searching was slow because of that, the search time increased with every email, and partial matching was difficult.

So we wanted a search index, and to use the Gmail API.

Gmail API: an easy API, installable with pip, keeps track of messages (like: since the last sync there are 5 new mails, these three have been deleted and one was edited), partial download. Since February there is tracking of the type of change. Downside: it is limited to Gmail / Google Apps for Business.
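
A sketch of what an incremental sync roughly boils down to with the google-api-python-client (building the authorised service object is left out, and the field handling is simplified):

    # Ask the Gmail API what changed since the last known history id.
    # "service" is an authorised Gmail API service object.
    def incremental_sync(service, last_history_id):
        response = service.users().history().list(
            userId="me", startHistoryId=last_history_id).execute()
        for record in response.get("history", []):
            for added in record.get("messagesAdded", []):
                message_id = added["message"]["id"]
                # Fetch only what we need for this one message.
                service.users().messages().get(
                    userId="me", id=message_id, format="metadata").execute()
        return response.get("historyId", last_history_id)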

PostgreSQL is used together with Elasticsearch. Emails are mapped to documents in ES; models are pushed to ES with a post_save signal in Django. Fast response times, averaging 50 milliseconds.
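
A sketch of such a signal handler (the app label, model, index and field names are hypothetical):

    # Push a saved email message to Elasticsearch from a Django
    # post_save signal. "email.EmailMessage", the "emails" index and
    # the fields are made-up names.
    from django.db.models.signals import post_save
    from django.dispatch import receiver
    from elasticsearch import Elasticsearch

    es = Elasticsearch()  # defaults to localhost:9200

    @receiver(post_save, sender="email.EmailMessage")
    def index_email(sender, instance, **kwargs):
        es.index(
            index="emails",
            doc_type="email_message",
            id=instance.pk,
            body={"subject": instance.subject, "body": instance.body},
        )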

Problems that we encountered, and fixes: encoding and decoding of messages (do not trust that the claimed encoding is correct); sending and forwarding email with attachments; losing labels entirely, instead of just for one message, due to a coding error; high memory usage (scaled up for now); sending big messages, which needed to be sent in chunks.

Some colleagues are now saying: it does not work properly, let's switch back to IMAP. But we are getting there.

Still beta, testing it out ourselves. See source code on https://github.com/HelloLily

Me on Twitter: ijspaleisje

Niels Hageman: Reliable distributed task scheduling

published May 22, 2015

Niels Hageman talks about Reliable distributed task scheduling, at PyGrunn.

See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.

I am a member of the operations team at Paylogic, responsible for deployments and support.

Handling tasks that are too lengthy for a request/response loop.

Option 1: Celery plus RabbitMQ as backend. Widely used, relatively easy. But RabbitMQ proved unreliable for us: our two queues in two data centres would go out of sync, and then you needed to reset the queue, which is manual work and loses data. This is a known problem. Also, queues could get clogged: data going in but not out.

Option 2: Celery plus MySQL as backend. So no separate queue, just our already-running database. But it was not production-ready, not maintained, and buggy.

Option 3: Gearman (with MySQL). The Python bindings were again buggy, it could run only one daemon at a time, and by default there is no result store.

Option 4: do it yourself. Generally not a great idea, but it does offer a "perfect fit". We built a minimal prototype, which grew into Taskman.

MySQL as backend: may be fine, but it is not a natural fit. "Thou shalt not use thy database as a task queue." Polling: there is no built-in event system, though there are hacks that pretend there is. Lock contention between tasks is a bit hard. Some options: enable autocommit, so you do not need a separate commit; use a plain SELECT instead of SELECT FOR UPDATE; use fuzzy delays. Data growth: the queue can grow over time, but you can remove old data.

Task server: a daemon built with Python and supervisor. Its loop: claim a task from the database, spawn a runner, sleep, repeat.
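
A stripped-down sketch of such a loop (table, column and command names are invented; this is not the actual Taskman code):

    # Poll the database, claim one pending task with an optimistic
    # UPDATE (no SELECT FOR UPDATE), spawn a runner process, sleep.
    import subprocess
    import time

    import MySQLdb

    def claim_task(conn):
        cursor = conn.cursor()
        cursor.execute("SELECT id FROM tasks WHERE state = 'pre-run' LIMIT 1")
        row = cursor.fetchone()
        if row is None:
            return None
        task_id = row[0]
        # Only one server can flip the state from pre-run to running.
        claimed = cursor.execute(
            "UPDATE tasks SET state = 'running' "
            "WHERE id = %s AND state = 'pre-run'", (task_id,))
        return task_id if claimed else None

    def main():
        conn = MySQLdb.connect(host="localhost", user="taskman",
                               passwd="secret", db="taskman")
        conn.autocommit(True)  # no separate commit needed
        while True:
            task_id = claim_task(conn)
            if task_id is not None:
                # Hand the task to a separate runner process.
                subprocess.Popen(["taskman-runner", str(task_id)])
            time.sleep(5)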

The task runner is a separate process. It sets up the Python environment in which the task runs, runs the task, and does the post-mortem: get the results and report back.

Task record (database row), simplified: function, positional and keyword arguments, environment, version of the environment, state (pre-run, running, ended/finished, ended/failed), return_value.
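
For example (all values made up):

    # A simplified task record as a Python dict.
    task = {
        "function": "reports.generate_settlement_report",
        "args": [2015],
        "kwargs": {"format": "pdf"},
        "environment": "paylogic",
        "environment_version": "1.4.2",
        "state": "pre-run",    # pre-run, running, ended/finished, ended/failed
        "return_value": None,  # a JSON string once the task has ended
    }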

The task server is an independent application. It does not know about the application that actually runs the tasks. Applications need to register with the server through a plugin, with methods get_name, get_version, get_environment_variables and get_python_path. The result of a task must be a JSON string.
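
A sketch of such a plugin (the method names come from the talk; the return values are made-up examples):

    # The task server only knows about these hooks, not about the
    # application itself.
    class MyAppPlugin(object):

        def get_name(self):
            return "myapp"

        def get_version(self):
            return "1.4.2"

        def get_environment_variables(self):
            return {"DJANGO_SETTINGS_MODULE": "myapp.settings"}

        def get_python_path(self):
            # Exact format is not described in these notes; a path to
            # the application's code is assumed here.
            return "/srv/myapp/current"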

A task can report progress by accessing its own instance, so it can say '20% done'. Tasks can be aborted. The task start time can be constrained, e.g. if the task has not started within ten minutes, delete it, because it is no longer useful. There is exception handling.

Taskman is optimized for long-running tasks, not for minor tasks. It is designed for reliability: tasks will not get lost, they will get executed, and executed only once. It is less suited for a blizzard of lots of small tasks, and more for heavy database processing. There is no management interface yet; if you must, you can currently use phpMyAdmin...

I really want it to be open source, but we have not gotten around to that yet. We first want to add packaging and documentation on how to set it up.