Weblog
Rix Groeneboom (Parasoft): Mijn Overheid: Performance testing in practice
Summary of talk at the PyGrunn conference.
Rix Groeneboom (Parasoft) talks about Mijn Overheid: Performance testing in practice, at the PyGrunn conference in Groningen, The Netherlands. Organized by Paylogic and Goldmund Wyldebeast & Wunderliebe.
('Mijn Overheid' is Dutch for 'My Government'.) I have a new hobby: collecting screen shots. One is a screen shot of 23 March 2010: disruption at DigiD, central login application of the government, for example for submitting your tax form. Another one: Servers of Cito offline during exam. Yet another one: at the DUO website (government agency that gives loans to students) you could see data from other students. And: changes in a test environment were available on other production systems, so people suddenly discovered they were married or had children.
At the Mijn Overheid website you can log in with your DigiD and see all kinds of info about yourself, like home address, home ownership, license plates, speeding tickets. If you agree, the government can send you messages in the system instead of via paper mail. The government wanted a way to know for sure that you have read such a message. The system is called the 'berichtenbox', message box.
We wanted to be able to simulate lots of citizens ('burgers') logging in and using the website, to see what that does to the load on the servers: what is the 'temperature' of the system? Systems had been tested in isolation, but there are also dependencies between them and those make it trickier.
With the government we specified performance requirements. We profiled the server implementation (a LAMP setup). Some search queries were very slow as they needed info from several servers; searching for 'gemeente' (municipality) would return almost every item as a search result. We profiled the combination of the MO (Mijn Overheid) website with the GEB (database with info on citizens).
We had only 2 DigiD accounts available to test the load on the system. That is not quite enough. So we faked/stubbed the DigiD server. This way we could let the test MO website talk with our fake DigiD server with lots more load, making the testing also simpler.
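To make the stubbing idea concrete, here is a minimal sketch of a fake login server in Python. It is not the stub that was used in the project: the payload, the URL scheme (user name in the path) and the port are all assumptions made for illustration, and the real DigiD protocol looks different.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_login_response(username):
    """Build the reply of the stub: every login succeeds."""
    # Hypothetical payload, invented for this sketch.
    return {"status": "ok", "username": username, "session": "stub-" + username}


class FakeDigiDHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Treat the request path as the user name, e.g. GET /testuser42
        username = self.path.strip("/") or "anonymous"
        body = json.dumps(fake_login_response(username)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8099):
    """Run the stub until interrupted (the port number is arbitrary)."""
    HTTPServer(("127.0.0.1", port), FakeDigiDHandler).serve_forever()
```

Because the stub accepts any user name, the number of test accounts is no longer limited by the real login service.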
We created a platform for SOA and chain testing. Written in Java, Jython and Eclipse. Agile and continuous integration with a command line and web interface. It could run on Windows, Linux, Solaris, with a Linux VMWare image available. External integration with version control, defect tracking and testdata generation.
For the load testing we used Python and MySQL. We implemented intelligence, like simple 'wait' steps, generating valid BSN (social security) numbers. Every request was logged in MySQL; this database also had a shadow administration of user accounts. Extra checks to see if someone had been able to see information for a different account.
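Generating valid BSN numbers comes down to the '11-proef' (eleven test): the nine digits are weighted 9, 8, 7, 6, 5, 4, 3, 2 and -1, and the weighted sum must be divisible by 11. The sketch below is my own illustration of that check, not the code from the talk; the function names are made up.

```python
import random

# Weights of the eleven test for a 9-digit BSN; note the -1 for the last digit.
WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def is_valid_bsn(bsn):
    """Return True if the 9-digit string passes the eleven test."""
    if len(bsn) != 9 or not bsn.isdigit():
        return False
    total = sum(weight * int(digit) for weight, digit in zip(WEIGHTS, bsn))
    return total % 11 == 0


def generate_bsn(rng=random):
    """Pick 8 random digits and compute a matching 9th digit."""
    while True:
        digits = [rng.randint(0, 9) for _ in range(8)]
        check = sum(w * d for w, d in zip(WEIGHTS, digits)) % 11
        # A remainder of 10 cannot be a single digit, so try again.
        # Also skip the all-zero number.
        if check < 10 and any(digits):
            return "".join(str(d) for d in digits) + str(check)
```

For example, 111222333 passes the test and 123456789 does not.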
The system was very fast on low load. But at 1000 users at the same time, requests could take 20 seconds. That would be quite an extreme load, but good for testing. We found that the database query took 90 percent of the time. Also loading the PHP modules at some point took 8 seconds. A little smarter programming fixed that (being smarter so you don't always need require_once in the interpreter). Some optimizations in NFS, Apache and the OS.
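The kind of measurement described here (response times as the number of simultaneous users grows) can be sketched with a thread pool. This is only an illustration under assumptions: fake_request stands in for a real HTTP request to the test website, and the function names are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(user_id):
    """Stand-in for one page request; replace with a real HTTP call."""
    time.sleep(0.01)
    return user_id


def timed_request(user_id):
    """Return how long one request took, in seconds."""
    start = time.perf_counter()
    fake_request(user_id)
    return time.perf_counter() - start


def run_load(concurrent_users):
    """Fire one request per simulated user at once; return sorted durations."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        return sorted(pool.map(timed_request, range(concurrent_users)))


def summarize(durations):
    """Summarize a sorted list of durations."""
    return {
        "users": len(durations),
        "median": durations[len(durations) // 2],
        "slowest": durations[-1],
    }
```

Running summarize(run_load(n)) for increasing n shows at which load the slowest requests start to drift away from the median.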
We talked about process improvements. The load testing sometimes could not be done on the acceptance server because that was being used to test functionality. We figured out where the actual bottlenecks were, which could be on a different system: it's nice if the load on our server is low, but if the user is still waiting for a response from a different system it does not buy him much.
Summary: We set a performance norm. We found the weakest link in the various systems. The result was a large number of improvements.
PyGrunn
Links to all my summaries of talks at the PyGrunn conference.
On 20 May 2011 Paylogic and Goldmund Wyldebeast & Wunderliebe hosted the second PyGrunn conference in Groningen, The Netherlands. The conference was about the Python programming language and related technologies. There were two and sometimes three tracks of talks. Here are links to the summaries I made. See my brother's weblog for his (partially overlapping) summaries.
- Mijn Overheid: Performance testing in practice, by Rix Groeneboom
- Lightweight Python deployment servers, by Luit van Drongelen
- Mobile Architectures, by Gideon de Kok
- Making large, untested code bases testable, by Henk Doornbos
- Redis in Practice, by Pieter Noordhuis
- ØMQ (Zero MQ), by Òscar Vilaplana
- The ten commandments of security, by Jobert Abma
- State of webdev in Python, keynote by Armin Ronacher
PUN meeting 19 May 2011
Meeting of the Python Users group Netherlands. Hosted by Lunatech in Rotterdam.
Peter Hilton: Welcome to Lunatech
Sorry, I have not written any python myself. ;-) Technically focused company. We do almost everything in Java. We use an open source stack. So not very different in mindset, we just don't do a lot with python. Focused on back end stuff. Commercial IT services. Ten years ago we used Perl. I don't know why we switched to Java as I was not there at the time. I also don't know why we stay with Java. Actually, Scala is a more likely successor for us than python. We make money, which allows us to put this beer on the table. We hire hard-core geek programmers, creative rocket scientists, people who get things done, tech wizards with ambition, obsessive-compulsive types, managers in suits (not!!!). Python paradox: potentially you have more good candidates than in Java as there are so many bad Java programmers. We have a coding question for job interviews. We had to make it simpler or too many people would fail. It was not a question of: pass this test and you are hired, but pass this test and we can continue with the job interview.
Kit Blake (Infrae): VisitorVu: Real-Time Visitor Tracking for Your Website
http://brusselnieuws.be. Three different news sources coming in. They wanted to know real-time what was going on on the website. Direct hits on the web site, searches, external links, internal links. Popularity. Tag cloud. We are considering offering this as SaaS; we are probing if there is interest in the market. So maybe you or your clients? See http://visitor.vu/ and the code.
Steve Alexander: Interactive code previews
I ran a large software team at Canonical. Mostly I was hiring Python programmers. I was always looking for existing code that interviewees had already written and that was out there being used. Then I quit my job and took acting classes, did coaching, etcetera. How do people communicate, that's what I am busy with.
Seizing the task. Fix a bug, optimize, write a blog post, all are tasks. Rubber duck programming: you buy a rubber duck and explain to it what you are going to do. By talking and hearing yourself back (or maybe writing and reading it) you focus and it becomes easier to see if it still makes sense. Geir Baekholt at Jarn: "When doing a task, we talk it through with another person." Are you a good listener? Presume a lot: they have expertise (else why are they working here), they can find things out for themselves, are creative. Destroy ambivalence: not maybe/possibly/whatever, but yes or no. If an issue is in a bug tracker for too long it sucks the life blood out of a company, product, community. Motivational interviewing: don't try to make someone enthusiastic or offer 'helpful' solutions that the other (the idiot) has probably not thought about, but ask him about his own motivation and how he could make himself more motivated. I started as a very technical manager, which made people look up to me too much as I could maybe do their job better. If you hire someone, believe in his skills and trust him; try to find smarter people than you.
You see a lot of small software companies in The Netherlands; that is cool. Every company seems to have its own culture. It is important to attract good people: I want to work with people I want to work with.
Steve Alexander presenting at PUN [Photo by Jasper Spaans]
Matthijs Kadijk: Some nice things you can do with google data APIs
I am an independent software developer. Just moved to the Creative Factory in the Maassilo in Rotterdam. Most of the Google services have APIs that you can talk to from within your program: RESTful web services, an XML version based on Atom feeds, JSON available too using the alt=json parameter, OAuth authentication. Some use cases: schedule an SMS (text) message, add an appointment to a calendar, import data from Office docs without having Office, perform OCR, create a PDF invoice. Google code has a python client lib gdata, available via PyPI: http://pypi.python.org/pypi/gdata. The python implementation does lag behind the other implementations a bit, so maybe some of us can help? The services can respond slowly, so watch your timeouts. Read the source code of the library to find non-documented features. Let the user do a manual login first, otherwise you can get a captcha. Use stored access tokens to prevent blocking after too many login requests. And you may want to consult an expert when you run into problems.
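As an illustration of the alt=json parameter and the advice about timeouts, here is a small sketch using only the standard library (so not the gdata client). The feed URL in the usage note and the sample data are made up; the "$t" key is how the Google Data JSON output wraps text content.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
from urllib.request import urlopen


def with_json_output(feed_url):
    """Return feed_url with alt=json added to its query string."""
    parts = urlparse(feed_url)
    query = dict(parse_qsl(parts.query))
    query["alt"] = "json"
    return urlunparse(parts._replace(query=urlencode(query)))


def fetch(feed_url, timeout=10):
    """Fetch a feed as JSON text; the explicit timeout guards against slow replies."""
    with urlopen(with_json_output(feed_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


def entry_titles(json_text):
    """Extract the entry titles from a Google Data style JSON feed."""
    feed = json.loads(json_text)["feed"]
    return [entry["title"]["$t"] for entry in feed.get("entry", [])]


# Invented sample in the shape of such a feed.
SAMPLE = ('{"feed": {"entry": [{"title": {"$t": "Dentist"}},'
          ' {"title": {"$t": "PUN meeting"}}]}}')
```

Here entry_titles(SAMPLE) gives the two titles, and with_json_output("https://example.com/feeds/default?max-results=5") appends alt=json to the existing query.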
Nicolas (Lunatech): Play! framework and Python, what's the deal?
The Play framework makes it easier to build Web applications with Java and Scala. Could be Jython with the right modules. Lunatech has contributed to this. It is inspired by Django and Ruby on Rails. The scripting aspect is done in Python. We try to be language agnostic. You don't need to restart the server when you change your Java code; this is done on the fly. There is a community around it that develops extra modules. See http://www.playframework.org/
Sylvain Viollon (Infrae): Demo of infrae.testbrowser
With infrae.testbrowser you can test web applications that run on WSGI. You can ask the test browser to look for content with xpath expressions or css selection. So you can more easily make sure you don't get an error when there are two inputs with the same name. It can also talk to Selenium and execute the tests in a real browser.
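To show the idea of testing a WSGI application without a running server, here is a sketch with only the standard library. This is explicitly not the infrae.testbrowser API (see its documentation for that); the application and the helper names are invented, and the HTML is kept well-formed so that ElementTree can parse it.

```python
import xml.etree.ElementTree as ET


def form_app(environ, start_response):
    """Tiny WSGI application that serves one (well-formed) HTML form."""
    body = (b"<html><body><form>"
            b"<input name='title'/><input name='title'/><input name='body'/>"
            b"</form></body></html>")
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Content-Length", str(len(body)))])
    return [body]


def get(app, path="/"):
    """Call a WSGI app directly, without a network socket."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path,
               "SERVER_NAME": "localhost", "SERVER_PORT": "80",
               "wsgi.url_scheme": "http"}
    body = b"".join(app(environ, start_response))
    return captured["status"], ET.fromstring(body)


def inputs_named(tree, name):
    """Find all input elements with the given name."""
    # ElementTree supports a small XPath subset, enough for this check.
    return tree.findall(".//input[@name='%s']" % name)
```

With status, tree = get(form_app) you can then check that inputs_named(tree, "title") really finds both inputs with that name.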
Jan-Jaap Driessen: Hackathon
We did a hackathon/sprint in Rotterdam two weeks ago. About ten people showed up. Was a good day. I can set up a new meeting but others can do that too. I am planning one in a week about deployment. Keep an eye on the mailing list of PUN. Let's learn from one another and exchange ideas and code together.
Beer and pizza at the PUN [Photo by Jasper Spaans]
Summaries Python Users Netherlands meeting in Utrecht
Meeting of the Python Users Netherlands group. This evening was organised by Nelen & Schuurmans in Utrecht on 16 February 2011.
My brother Reinout (who was the main organiser) also has summaries.
Christo Butcher (Fox IT): Trac with python
Sorry, I came in late and missed the first part of this talk and was eating during the second part. :-)
Maurits van Rees (Zest Software): Theming with xdv, Paste and fanstatic
For a few days now I have been using xdv, Paste and fanstatic to put a new theme around my old Plone 2.5 website. The code is available on github and can be instructive as an example of these technologies: https://github.com/mauritsvanrees/maurits-site-xdv
See the slides.
Jan-Wijbrand Kolman (The Health Agency): Publish your changes
I do a lot of releases, especially for Grok and the Zope Toolkit. So I look a lot at the Python Package Index (PyPI) to see which changes have been made in packages.
Quote: "Zope Community and Friends: I love you so much for putting changelogs in your PyPI info." (jshell)
Compare e.g. Sphinx and grokcore.component on PyPI. Sphinx hardly says anything on its PyPI page. grokcore.component is much smaller but publishes a lot more info right there on PyPI, especially its changelog. Add that to your packages and give them a meaningful package description. It is probably the first page someone sees for most packages, so you had better leave a good first impression.
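A common way to get the changelog onto the PyPI page is to join the readme and the changelog into the long_description in setup.py. A minimal sketch, where the file names and the package name are assumptions:

```python
def build_long_description(readme_text, changes_text):
    """Join README and changelog into one text for the PyPI page."""
    return readme_text.strip() + "\n\n" + changes_text.strip() + "\n"


# In setup.py this could be used as follows (file and package names
# are made up for this example):
#
#   from setuptools import setup
#
#   setup(
#       name="example.package",
#       version="1.0",
#       description="One meaningful line about the package",
#       long_description=build_long_description(
#           open("README.txt").read(), open("CHANGES.txt").read()),
#   )
```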
Jasper Spaans (Fox IT): pypy rocks, even in the real world
(Note: pypy is not PyPI.) Pypy puts a python in your pythons. It is a python interpreter implemented in RPython, restricted python. You can do memory management also in RPython, so you don't need to handle, understand and write C code if you need your own specific memory management. It should be a drop-in replacement for normal python (CPython) in most cases. It offers a possibility for a JIT optimizer, which can make your code faster.
We tried pypy a few weeks ago. We encountered some SyntaxErrors, as version 1.4.1 only supports python 2.5. Class decorators are not supported yet (so we decorated manually). The with-statement needs to be imported from __future__. Memory usage exploded (fixed by changing to the hybrid garbage collection). SQLAlchemy was too hairy; for that part we put in a bit of standard python to handle the SQLAlchemy integration, together with protobuf.
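Decorating a class manually, as needed on a Python 2.5 level interpreter, looks like this. The decorator and class are invented examples, not the code from the talk.

```python
REGISTRY = {}


def register(cls):
    """Example class decorator: remember the class in a registry."""
    REGISTRY[cls.__name__] = cls
    return cls


# With class decorator syntax (Python 2.6 and later) you would write:
#
#   @register
#   class Parser(object): ...
#
# Without that syntax you apply the decorator by hand after the class body.
# (For the with-statement on 2.5 you put
# 'from __future__ import with_statement' at the top of the module.)
class Parser(object):
    pass


Parser = register(Parser)
```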
Several of our performance tests showed improvements of 7 to 13 percent when using pypy. The memory usage was 20 percent higher though.
Reinout van Rees (Nelen & Schuurmans): Geographic information websites for water management
Our customers are mostly government agencies working in water management. They have a lot of information about water. We make that info available in a format that is easier for them, on a website. Customers can click through maps in our application to get info. What technology do we use? Lots of python!
- Mapnik: uses the rendering engine from Openstreetmap. You have points, lines and grids. It understands the WMS standard for web mapping. It gives a .png file as output.
- Gdal: used behind the scenes mostly, for grids.
- Matplotlib: a graph library. It can do everything. Others are easier, but with limitations. Several customers have too many peculiar wishes that are difficult to do with other packages, if it is possible at all. (Remark from the audience: use protovis.)
- Pyproj. You have various measurements for where on the globe you are: rijksdriehoek (in the Netherlands), degrees, Google mercator. Pyproj translates between these.
- On the client side we use jquery, openlayers, and the blueprint css framework.
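To give an idea of what translating between coordinate systems means, here is one of these translations written out by hand: degrees (longitude, latitude) to the spherical 'Google mercator' projection and back. In practice you leave this to pyproj; the conversion to and from rijksdriehoek is a lot more involved and not shown. The function names are my own.

```python
import math

# Sphere radius in metres used by the Google (web) mercator projection.
EARTH_RADIUS = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Convert degrees to web mercator metres."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Convert web mercator metres back to degrees."""
    lon = math.degrees(x / EARTH_RADIUS)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS)) - math.pi / 2)
    return lon, lat
```

A point near Utrecht, roughly lonlat_to_web_mercator(5.12, 52.09), survives the round trip through web_mercator_to_lonlat unchanged apart from floating point noise.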
We have a layered software structure:
- core: splitting everything up.
- lizard-ui: defines the html page structure.
- lizard-map: this is the big python/django application that we have, the core of the map. Map visualisation; basic graph and search handling; basic popups; plugin mechanism for adapters (using extra attributes in html5) with which you can search, get a layer and show the corresponding html in a popup.
Organisation:
- We have collective code ownership. Everyone can touch all the code. For most of the code everyone has a basic understanding of what is going on. We want code that conforms to the pep8 and pyflakes standards. For automatically running the tests we use Jenkins (formerly known as Hudson). We borrow each other's brains: sit with each other to check ideas that you have for sanity.
- Documentation is generated by Sphinx.
Our clients are mostly governmental and they like it when they use open source, so we have open sourced our code, also because it uses a lot of open source itself. This also attracts me to the company.
See http://doc.lizardsystem.nl/ and http://reinout.vanrees.org/weblog/
We want you! Business is booming! See http://www.nelen-schuurmans.nl/
Coen Nengerman (Nelen & Schuurmans): ArcGIS
I studied hydrology; I am more a python user, not a developer. ArcGIS is a professional tool to manage your geographical information. It makes lots of info and tools available in a nice UI. Catalog, tools, python scripts using e.g. scipy and numpy, including an integrated python command line. It can find maps online as well, like buienradar or maps of the flooding in Pakistan.
Jan-Jaap Driessen (The Health Agency): fanstatic
Fanstatic is a python package for resource management: css and javascript files. I am using zope and grok. Getting resources from zope can be slow, because lots of checks are done, for example authentication and authorization, which is not always needed. Usually you include too many resources on all pages, because you do not know for sure which of them you need for a specific page. Fanstatic fixes this for you. The fanstatic middleware will figure out which resources you really need and serve them to you.
You could download a jquery package from somewhere and put it in your own code, but we want to create python packages for this so you can easy_install them (js.jquery, js.yui). There is no real management of dependencies in the javascript world currently like there is for python, so for now we create python packages that define these dependencies.
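The dependency idea can be sketched in plain Python: given which resource depends on which, work out everything a page needs, with dependencies first. This only illustrates the concept and is not how fanstatic is implemented; the resource names are examples and cycles are not handled.

```python
# Which resource needs which other resources (example data).
DEPENDS = {
    "jquery.js": [],
    "jquery-ui.js": ["jquery.js"],
    "app.js": ["jquery-ui.js", "jquery.js"],
}


def resources_in_order(needed, depends=DEPENDS):
    """Return the needed resources plus their dependencies, dependencies first."""
    ordered = []

    def visit(name):
        if name in ordered:
            return
        for dependency in depends[name]:
            visit(dependency)
        ordered.append(name)

    for name in needed:
        visit(name)
    return ordered
```

A page that only says it needs app.js then gets jquery.js, jquery-ui.js and app.js, in that order, and a page that needs nothing gets nothing.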
Fanstatic minimizes http requests by doing rollup, bundling, using a cdn (content delivery network that hosts common javascript resources) if available, and versioning a resource so it can be cached for ten years in a browser without getting stale: when you change the resource a version with a new id is requested in the html. It takes hints from Steve Souders at http://developer.yahoo.com/performance/rules.html
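The versioning trick can be illustrated with a hash of the file content in the URL: as long as the content stays the same the URL stays the same and the browser can cache it for a very long time, and as soon as the content changes the html points to a new URL. The URL layout below is made up and fanstatic's own scheme may well differ.

```python
import hashlib


def versioned_url(path, content, prefix="/static"):
    """Embed a short hash of the (bytes) content in the URL of a resource."""
    version = hashlib.sha256(content).hexdigest()[:12]
    return "%s/v-%s/%s" % (prefix, version, path)
```

So versioned_url("jquery.js", old_bytes) and versioned_url("jquery.js", new_bytes) differ whenever the bytes differ, which is what keeps a cached copy from going stale.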
On the roadmap (some already on branches): bundling, cdn, compile for minification (google closure compiler), lazy loading (google loader, requirejs), commonjs.
Nico de Groot (Tilburg School of Theology): web2py
web2py is easy to learn, stable, secure, light weight, faster than zope ;-), can use different kinds of databases, can run on Google App Engine, some mercurial integration.
At the top you have applications consisting of plugins. In the middle there is gluon (old name of web2py) and general libraries like simplejson, and at the bottom the rocket server.
Jean-Paul Ladage (Zest Software): Prettig Personeel
Prettig Personeel is an online human resource management system created by Zest Software using Plone. You can generate contracts, get reminders that contracts are ending or for birthdays or for people who have been ill for a long time.
When you view an employee in this web application there are lots of boxes with info in which you can view, add and edit contracts, etc. We developed jquery.pyproxy to handle this. With that, you can use jquery within python (works in Plone and Django). You can use all manipulation and effects of the jQuery API. The result is returned as an xml response for an AJAX call so it gets processed on the client side in the browser.
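The principle of 'using jquery within python' can be sketched as follows: on the server you record jQuery-like calls, and you send that recording back so the browser can replay it. This is a hypothetical illustration and not the jquery.pyproxy API; the class names are invented, and it serializes to JSON for brevity instead of the response format the package uses.

```python
import json


class CallRecorder(object):
    """Records jQuery-style calls so the client can replay them."""

    def __init__(self):
        self.calls = []

    def __call__(self, selector):
        return _Selection(self, selector)

    def serialize(self):
        return json.dumps(self.calls)


class _Selection(object):
    def __init__(self, recorder, selector):
        self._recorder = recorder
        self._selector = selector

    def __getattr__(self, method):
        # Any unknown attribute is treated as a jQuery method name.
        def record(*args):
            self._recorder.calls.append(
                {"selector": self._selector, "method": method,
                 "args": list(args)})
            return self  # allow chaining, like in jQuery

        return record
```

In a view you could then write jq = CallRecorder(), followed by jq("#contracts").html("<p>Saved</p>").fadeIn(), and return jq.serialize() as the body of the AJAX response.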
It's on pypi and github:
Quickly change nginx configs
Shell script to change nginx configs to ignore logins and cookies.
See these pages for info on why you may want this: http://plone.org/products/plone/security/advisories/cve-2011-0720 and http://plone.org/documentation/kb/disable-logins-for-a-plone-site
If you want to change lots of nginx config files to temporarily switch off login (authentication) and cookies, you can use this bash script at your own risk:
#! /bin/bash
# Note: /bin/sh would be better, but at least when that points to
# /bin/dash it complains about some of my usage of 'test'.
cat < $CONFFILE
fi
fi
fi
done
echo "Do not forget to reload or restart nginx after changes."