Martijn Faassen: Morepath under the hood
Martijn Faassen gives the first keynote at PyGrunn, about Morepath under the hood.
Python and web developer since 1998. I did Zope, and for a little while it was as popular as Python itself.
What is this about? Implementation details, concepts, creativity, software development.
Morepath is a web microframework. The planet Zope exploded and Morepath came out. It has a unique approach to routing and link generation with Traject. Easy and powerful generic code with Reg. Extensible and overridable with Dectate.
In the early nineties you had simple file system traversal to publish a file on the web. Zope 2, in 1998, had traversal through an object tree, conceptually similar to filesystem traversal. Drawback: all objects need to have code to support web stuff. Creativity: filesystem traversal is translated to an object tree. Similar: JavaScript client frameworks that mimic what used to be done on the server.
Zope 3 got traversal with components: adapt an object to an interface that knows how to publish to html, or to json. So the base object can be web agnostic again.
Pyramid simplified traversal, with __getitem__. So the object needs to be web aware again. Might not be an issue.
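A minimal sketch of what __getitem__-based traversal looks like (the class and resource names are illustrative, not Pyramid's actual API): the framework walks the URL path and indexes into each resource in turn.

```python
# Hypothetical resource tree; the framework resolves /docs/intro by
# calling __getitem__ for each path segment. A KeyError typically
# becomes a 404 response.
class Folder:
    def __init__(self, children):
        self.children = children

    def __getitem__(self, name):
        return self.children[name]

root = Folder({"docs": Folder({"intro": "Intro page"})})

def traverse(root, path):
    resource = root
    for segment in path.strip("/").split("/"):
        resource = resource[segment]
    return resource

print(traverse(root, "/docs/intro"))  # "Intro page"
```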
Routing: map a route to a view function. As developer you need to handle a 404 yourself, instead of letting the framework do this.
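In a route-based setup that difference looks roughly like this (a hypothetical view function, not any specific framework's API):

```python
# The route /documents/{name} maps to this view; the developer has to
# produce the 404 response for missing objects themselves.
documents = {"intro": "Intro page"}

def document_view(name):
    doc = documents.get(name)
    if doc is None:
        return 404, "Not Found"
    return 200, doc

print(document_view("intro"))    # (200, 'Intro page')
print(document_view("missing"))  # (404, 'Not Found')
```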
Frameworks can fight about these approaches. But Morepath has it all: it is a synthesis.
I experimented with a nicer developer API than Zope was offering to get a view for traversal. So I created experimental packages like iface and crom. I threw them together in Reg. It was just a rewrite of the Zope Component Architecture with a simpler API.
Class dispatch: foo.bar() has self as its first argument. Reg builds on the idea of functools.singledispatch to provide multiple dispatch. But then I generalised it even more to predicate dispatch, as Pyramid had.
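For comparison, this is what the standard library's single dispatch looks like; Reg generalises the same idea to dispatch on more than one argument and, later, on predicates (the example below only shows stdlib singledispatch, not Reg's own API):

```python
from functools import singledispatch

@singledispatch
def view(obj):
    # fallback when no more specific implementation is registered
    raise NotImplementedError("no view registered for %r" % type(obj))

@view.register(int)
def _(obj):
    return "int view: %d" % obj

@view.register(str)
def _(obj):
    return "str view: %s" % obj

print(view(42))       # dispatches on the type of the first argument
print(view("hello"))
```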
Don't be afraid to break stuff when you refactor things.
Dectate is a meta framework for code configuration. Old history involving Zope, Grok, martian, venusian, but now Dectate. With this you can extend or override configuration in your app, for example when you need to change something for one website.
Detours are good for learning.
Splitting things off into a library helps for focus, testing, documentation.
Morepath uses all these superpowers to form a micro framework.
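Put together, a minimal Morepath app looks roughly like this (based on the project's quickstart; details may differ per version):

```python
import morepath

class App(morepath.App):
    pass

@App.path(path="/hello/{name}")
class Hello:
    def __init__(self, name):
        self.name = name

@App.view(model=Hello)
def hello_view(self, request):
    return "Hello " + self.name

if __name__ == "__main__":
    # routing, link generation and views are all configured above
    morepath.run(App())
```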
Twitter: @faassen
Bart Wesselink - Processing large quantities of online payments
Bart Wesselink talks about processing large quantities of online payments, at PyGrunn.
See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.
I am here in a personal capacity, not on behalf of a company, although I am a finance guy at SFX Entertainment, and formerly at PayLogic.
- Online payments: digital certainty of cash over cash. It is real-time confirmation that you will get your money and can deliver a product.
- Large quantities: high volume in a short span of time.
- Unit of measurement is transactions per second.
- Every second of downtime is revenue lost. This can be as high as 3000 euro per second for some companies.
Standard online payment environment has four corners:
- consumer
- merchant
- issuer/consumer bank
- acquiring/merchant bank
The scheme or payment method can be VISA, which is the intermediary between the two banks.
This model has several 'single points of failure'. What we can make redundant is the PSP/gateway between the merchant and his bank. Service Level Agreements are useless here: you will never get back anything close to the money you lose. So redundancy is key.
Part of the solution: offer the consumer different ways to pay: credit card, iDEAL, etcetera. And when the consumer decides to pay via VISA, you want to have a few options: if one payment provider has problems, another can fill the gap.
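A hypothetical failover sketch (the provider objects and PaymentError are made up for illustration): try payment routes in order of preference and fall back when one is failing.

```python
class PaymentError(Exception):
    pass

def charge_with_failover(providers, payment):
    """Try each payment service provider in turn until one succeeds."""
    for provider in providers:
        try:
            return provider.charge(payment)
        except PaymentError:
            # log and monitor this; the next route fills the gap
            continue
    raise PaymentError("all payment routes failed")
```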
Monitor how well each route is performing. For iDEAL (a payment system in the Netherlands) you can now get a message when there is a known problem with a bank.
Card number plus expiry date is enough for most credit card payments, but there is also 3DSecure: extra security for VISA. It means there is more that you need to monitor.
From the first six credit card digits you can learn a lot. There are databases for this, showing card brand, country, card level, etcetera.
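Conceptually such a lookup is just a prefix query (the entry below is a made-up example using a well-known VISA test prefix; real BIN databases are large and updated regularly by commercial providers):

```python
# Hypothetical in-memory BIN database keyed on the first six digits.
BIN_DATABASE = {
    "411111": {"brand": "VISA", "country": "US", "level": "classic"},
}

def lookup_bin(card_number):
    return BIN_DATABASE.get(card_number[:6])

print(lookup_bin("4111111111111111"))  # VISA test card number
```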
Remember: stay compliant to https://www.pcisecuritystandards.org
There are horror stories, like a local payment method that could only handle 1.5 transactions per second.
Lessons learned. Big names do not mean big performance, even for banks. You see sloppy implementations. Do lots of logging and monitoring.
Adam Powell and Denis Dallinga - Recommendation systems @ Catawiki
Adam Powell and Denis Dallinga talk about recommendation systems at Catawiki, at PyGrunn.
See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.
Catawiki is an online auction platform, for all kinds of things, including Napoleon's hair. Some projects are interesting from a programming perspective.
Auction listing page optimisations. Which options do we recommend to users? We base this on similar users. We can use a Jaccard normalised model. The Co-Occurrence model gives different recommendations.
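A minimal sketch of the Jaccard-normalised idea: the similarity of two users (or two lots) is the overlap of the sets they bid on, divided by the union; plain co-occurrence counts skip that normalisation and therefore favour popular items.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    if not (a or b):
        return 0.0
    return len(a & b) / len(a | b)

# bids of two users, as sets of lot categories (made-up data)
print(jaccard({"stamps", "coins", "art"}, {"coins", "art", "watches"}))  # 0.5
```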
Bids go into the data warehouse, then to Spark, then to a personalisation service built with the Ruby 'grape' framework. We A/B test new ways of doing recommendations. We need to balance popularity against novelty, freshness (don't keep showing the same ones), and diversity.
Problem: new users about whom we don't yet know much. We use the Python library Theano [see the machine learning talk, Maurits].
We have 35 thousand auction lots per week, and 350 thousand bidders, which leads to a lot of data. Recommendations for all users can be recalculated every five minutes. Using Snowplow for recommendations to all users.
Recommend a category for a new lot that someone enters: run the gensim Python module for natural language processing over all categories, and put the result in Elasticsearch.
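A rough sketch of the gensim part, assuming a short text description per category (the data and the TF-IDF choice are assumptions; the real pipeline also indexes the result in Elasticsearch):

```python
from gensim import corpora, models, similarities

category_texts = [
    "antique pocket watches and clocks".split(),
    "rare stamps and postal history".split(),
    "classic cars and automobilia".split(),
]

dictionary = corpora.Dictionary(category_texts)
corpus = [dictionary.doc2bow(text) for text in category_texts]
tfidf = models.TfidfModel(corpus)
index = similarities.MatrixSimilarity(tfidf[corpus])

new_lot = "old pocket watches".split()
scores = index[tfidf[dictionary.doc2bow(new_lot)]]
print(scores)  # similarity of the new lot to each category
```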
Peter Odding and Bart Kroon - Understanding PyPy and using it in production
Peter Odding and Bart Kroon talk about understanding PyPy and using it in production, at PyGrunn.
PyPy is a JIT (Just In Time) compiler for Python. Also known as: Python in Python. The standard Python interpreter is CPython, which is Python written in C. There are other options, of which PyPy is the most mature.
PyPy is a Python implementation, compliant with CPython 2.7.10 and 3.2.5 at the moment. It is fast; this was not the case earlier, but speed keeps getting better. Contrary to popular belief, PyPy can actually reduce memory usage. It supports multicore programming and has a stackless feature for massively concurrent programming: microthreads, greenlets.
PyPy is written in RPython. RPython is a strict subset of Python, statically typed, translated to C and compiled to produce an interpreter. It provides a framework for PyPy and others.
Run it: pypy your_python_file.py.
But when you use C extensions, it is not so easy. Some may work. What then? The PyPy folks would have you use cffi: C Foreign Function Interface, if your module needs C code.
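The canonical example from the cffi documentation gives an idea of the ABI-level approach:

```python
from cffi import FFI

ffi = FFI()
ffi.cdef("int printf(const char *format, ...);")  # declare the C function
C = ffi.dlopen(None)                 # load the standard C library (POSIX)
arg = ffi.new("char[]", b"world")    # allocate a C string
C.printf(b"hi there, %s.\n", arg)    # call C's printf
```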
Software Transactional Memory: Python without the Global Interpreter Lock. Actually slower on a single thread, but with two threads you already have a performance increase. Side effects make transactions 'inevitable' (they can no longer be rolled back), so watch out with concurrent logging and file I/O in general: such side effects will result in the other threads being rolled back to try again. Interesting to follow, also if you are using other languages.
How PayLogic came to use PyPy. We sell tickets, which can lead to a lot of visitors in a very short time, so you effectively get a DDoS from your own customers. So we started using a CDN for the HTML, with only small JSON requests going to the servers. For the JSON we still needed lots of servers, and state synchronisation was still a problem. We did not use any C modules, and the biggest part was Tornado. So we just changed to PyPy.
It almost worked. What went wrong?
- Garbage collection works quite differently in PyPy: it periodically stops execution to mark reachable objects. Objects can stay alive long after they are no longer used, and we ran out of file descriptors in seconds. A cache solved this (see the sketch after this list).
- The UUID4 implementation in PyPy was wrong, resulting in ids that were not random and far from unique.
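The file descriptor problem from the garbage collection point above boils down to code like this; a sketch of the difference:

```python
def read_config_leaky(path):
    # On CPython, reference counting closes the file almost immediately.
    # On PyPy, the file object may stay alive until the next GC cycle,
    # so under load the process can run out of file descriptors.
    return open(path).read()

def read_config_safe(path):
    # Closing deterministically does not depend on the garbage collector.
    with open(path) as f:
        return f.read()
```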
Our results:
- Quadrupled performance, already in 2013. Now around eight times faster; every upgrade brings a bit more performance improvement.
- Real savings on hosting costs: fewer servers needed.
- Our queue works for at least two million visitors now.
Other things you can do: run JavaScript, Lisp.
Guido van Rossum said: "If you want your code to run faster, you should probably just use PyPy."
Ben Meijering - Hello, Machine Learning!
Ben Meijering says hello to Machine Learning, at PyGrunn.
See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.
I am using a computer vision task as example because it looks nice, but this works just as well for other data.
Machine learning is: teaching computers to learn how to perform tasks.
Task: determine the digit that is in a picture.
Think of tasks in terms of probability. What is the chance that a picture as a whole shows the number zero? Regression model: squash the data into a value between zero and one so you can read it as a probability. We take a weighted sum of all pixel values.
Linear regression model. Inputs: all the pixels. Outputs: the probability that the digit is a zero, a one, a two, and so on. We apply softmax (instead of sigmoid) to the output nodes, because the classifications are mutually exclusive. This is a constraint we choose for this example task.
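Softmax itself is a small formula; a quick NumPy sketch of what it does (NumPy is only used here for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))         # probabilities that sum to 1, one per class
```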
Machine learning libraries:
- Theano, nice for getting started in machine learning
- TensorFlow, good for rapid prototyping, used a lot by Google
- Keras, a unified API to both Theano and TensorFlow. Good, because both libraries have their own strengths and Keras makes it easy to switch.
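With Keras, the single-layer softmax model described above fits in a few lines (a sketch; the layer sizes assume 28x28 pixel digit images and ten classes, and the optimizer and loss choices are illustrative):

```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# 28*28 = 784 pixel inputs, 10 output probabilities (one per digit)
model.add(Dense(10, activation="softmax", input_shape=(784,)))
model.compile(optimizer="sgd", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```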
Code is compiled to GPU instructions, which can be 30 times faster than CPU.
Goal of training: learn the best possible weights. Inputs plus weights plus model (transform) give the outputs. During training we know what the correct answer is. Errors should be propagated back through the model, through the network of nodes, so the model can update its weights.
I do the plotting with the pandas library: show how well the model is performing after each iteration over the training data. Our first model starts out bad, and improves only a little bit. A second model, with an extra layer of nodes, performs much better at first and still improves afterwards.
So: keep adding layers for more accuracy? No, because you run out of memory. We can use a convolutional network, inspired by the visual cortex. In this model we use fewer connections: the nodes in the second layer are only connected to three nodes, instead of to all of them. And we use the same three weights everywhere.
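The convolutional variant in Keras looks roughly like this (the layer sizes are illustrative, not the speaker's exact architecture):

```python
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

model = Sequential()
# 8 small 3x3 filters, each sharing its weights across the whole image
model.add(Conv2D(8, (3, 3), activation="relu", input_shape=(28, 28, 1)))
model.add(Flatten())
model.add(Dense(10, activation="softmax"))
model.compile(optimizer="sgd", loss="categorical_crossentropy",
              metrics=["accuracy"])
```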
Machine learning is a very powerful tool, in solving all sorts of tasks. And it is a creative endeavour: how do you compose your models?
Overfitting: the model knows the training data too well, performing well on that data and poorly on new data. How to deal with this? Split the data into data for training and data for testing. Or use dropout: obscure part of the image when training.
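Splitting off test data can be as simple as this (scikit-learn's train_test_split is an assumption here; any split works):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 784)            # stand-in for the digit images
y = np.random.randint(0, 10, size=1000)  # stand-in for the labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```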
Slides are at https://github.com/bitwise-ben/pygrunn
My website: http://lambda-ds.com