Weblog
Armin Ronacher - A year with MongoDB
Armin Ronacher talks about MongoDB, at PyGrunn.
See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.
I do computers, currently at Fireteam. We do the internet for pointy-shooty games.
I started out hating MongoDB a year ago. Then it started making sense after a while, but I ended up not liking it much. But: MongoDB is a pretty okay data store, as Jared Heftly says. We are just really good at finding corner cases.
MongoDB is like a nuclear reactor, but if you use it well, it is safe. I said that in October. Currently I am less enthusiastic.
We had a game. Christmas came around. Server load went up a lot.
Why did we pick MongoDB initially? It is schemaless, the database is sharded automatically, it is sessionless. But schemaless is just wrong, MongoDB's sharding is annoying, thinking in records is hard, and there is a reason people use sessions.
MongoDB has several parts: mongod, mongoc, mongos. So it has many moving parts.
First, the failures that were our own fault. We were on Amazon, which is not good for databases. Mongos and mongod were split but ran on the same server, which meant they were constantly waiting on each other; once we went to two cores it was fine. Still, EBS (Elastic Block Store) is not good for IO, so not good for databases. Try writing a lot of data for a minute, just with dd, and you will see what I mean.
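For those who want to try without dd: a rough Python sketch of the same experiment, writing sequential 1 MB chunks and reporting the throughput (the file name and duration are arbitrary, and this does not use direct IO like dd would):

import os
import time

chunk = b"\0" * (1024 * 1024)
written = 0
start = time.time()
with open("ebs-write-test.bin", "wb") as f:
    # keep writing 1 MB chunks for about ten seconds
    while time.time() - start < 10:
        f.write(chunk)
        written += len(chunk)
    f.flush()
    os.fsync(f.fileno())
elapsed = time.time() - start
print("%.1f MB/s" % (written / float(elapsed) / 1024 / 1024))
os.remove("ebs-write-test.bin")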
MongoDB has no transactions. You can work around this, but we really did need it. It is meant for Document-level operations, storing documents within documents, but that did not really work for us. Your mileage may vary.
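To make the difference concrete, a sketch with the pymongo 2.x API of that era (database, collection and field names are made up):

from pymongo import MongoClient

db = MongoClient().game

player_id, winner_id, loser_id = 1, 2, 3

# Updating a single document is atomic, even when it touches several fields:
db.players.update({"_id": player_id},
                  {"$inc": {"score": 10},
                   "$push": {"achievements": "first_win"}})

# But nothing spans the two updates below; if the process dies in between,
# the credits are simply gone (or duplicated).
db.players.update({"_id": winner_id}, {"$inc": {"credits": 100}})
db.players.update({"_id": loser_id}, {"$inc": {"credits": -100}})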
MongoDB is stateful. It assumes that the data is saved correctly. If you want to be sure, you need to ask it explicitly.
It crashes a lot. We did not update from 2.0 for a while because we would have hit lots of segfaults.
To break your cluster: add a new primary, remove the old primary, but do not shut the old primary down (this step is the bad one!); then a network partition happens and one side overrides the other side's config in the mongoc. That happened to us during Christmas.
Schema versus schemaless is like static typing versus dynamic typing. Ever since C# and TypeScript, static typing with an escape hatch to dynamic typing wins. I think almost everyone adds schemas to MongoDB. It is what we do anyway.
getLastError() is just disappointing: because you have to ask this all the time to know whether your writes actually arrived, everything is always slower.
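A sketch with the pymongo API of that era (database and collection names are made up; the default write concern changed between driver versions). With w=0 the driver fires and forgets; with w=1 it waits for an acknowledgement, which is what getLastError does behind the scenes and what costs an extra round trip per write:

from pymongo import MongoClient

db = MongoClient().game

db.scores.insert({"player": "armin", "points": 10}, w=0)   # no confirmation
db.scores.insert({"player": "armin", "points": 10}, w=1)   # acknowledged

# Or ask explicitly after a write:
print(db.command("getlasterror"))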
There is a lack of joins. This is called a 'feature'. I see people joining in their code by hand. The database should be much better at doing this than the average user. MongoDB does not have Map-Reduce, except a version that hardly counts.
When using the find or aggregate functions in the API to get records, you can basically get the equivalent of SQL injection: if a user manages to smuggle in a value whose key starts with a dollar sign, MongoDB interprets it as a query operator instead of plain data.
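A minimal sketch of the problem with pymongo (the database and field names are made up):

from pymongo import MongoClient

db = MongoClient().game

# Intended query: find one user by exact name.
db.users.find_one({"username": "armin"})

# If the value comes straight out of a decoded JSON request body, an attacker
# can send a dict instead of a string, and MongoDB applies the $ operator:
username = {"$gt": ""}                  # matches every user
db.users.find_one({"username": username})

# Defensive sketch: only accept plain strings from the outside world.
if not isinstance(username, str):
    raise ValueError("invalid username")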
Even MySQL supports MVCC, so transactions. MongoDB: no.
MongoDB can only use one index per query, so quite limited. Negations never use indexes; not too unreasonable, but very annoying. There is a query optimizer though.
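In practice that means a query filtering on two fields wants a compound index covering both. A sketch with pymongo 2.x (field names made up; explain() shows which index the optimizer picked):

from pymongo import MongoClient, ASCENDING

db = MongoClient().game

# One compound index instead of two single-field indexes:
db.scores.ensure_index([("player", ASCENDING), ("level", ASCENDING)])

print(db.scores.find({"player": "armin", "level": 3}).explain())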
Making MongoDB far less slow on OS X:
mongod --noprealloc --smallfiles --nojournal run
Do not use : or | in your collection names, or it will not work if you try to import it on Windows.
A third of the data is taken up by the keys, because the field names are stored again in every single document. That is just insane. A reason to use schemas.
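A quick way to see the overhead, using the bson package that ships with pymongo (the documents are made up):

from bson import BSON

verbose = {"player_nickname": "armin", "total_points_scored": 10}
short = {"n": "armin", "p": 10}

# The verbose document is noticeably bigger, purely because of its keys.
print(len(BSON.encode(verbose)), len(BSON.encode(short)))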
A MongoDB cluster needs to boot in a certain order.
MongoDB is a pretty good data dump thing. It is not a SQL database, but you probably want a SQL database, at least until RethinkDB is ready. Probably we would have had similar problems with RethinkDB though.
It is improving. There is a lot of backing from really big companies.
I don't want to do this again. I want to use Postgres. If I ever get data that is so large that Postgres cannot handle it, I have apparently done something successful and I will start doing something else. Postgres already has solved so many problems at the database level so you do not have to come up with solutions yourself at a higher level.
Without a doubt, MongoDB will get better and be the database of choice for some problems.
The project we use it for still runs on MongoDB, and that will probably stay that way.
Oleg Pidsadnyi - Behaviour driven design with PyTest
Oleg Pidsadnyi talks about behaviour driven design with PyTest, at PyGrunn.
See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.
Code of pytest-bdd: https://github.com/olegpidsadnyi/pytest-bdd
I will talk about behaviour driven development in Python. What is it? You define a scenario in a strict language, like: given this and that, expect this. It is readable for both programmers and business people.
There are several ways to do this. You can use Lettuce or Freshen, plus Splinter. But an imperative coding style does not work well here, I think. 'Given I have two books': how do you do that? context.books = [...]? context.book1, context.book2? pytest-bdd has the concept of expecting and returning values, with @pytest.fixture:
import pytest
from pytest_bdd import given

# Author, Book and create_test_article come from the application under test.

@pytest.fixture
def author():
    return Author()

@pytest.fixture
def book(author):
    return Book(author=author)

@given('I have two books')
def books(author):
    return [Book(author=author), Book(author=author)]

@given('I have an article')
def article(author):
    return create_test_article(author=author)
This is much more explicit.
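The given steps above only set up data; when and then steps drive and check the behaviour. A hedged sketch of what they could look like (the Article methods are invented for illustration):

from pytest_bdd import when, then

@when('I publish the article')
def publish_article(article):
    article.publish()

@then('the article should be published')
def article_is_published(article):
    assert article.is_published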
It can do browser tests. You can use a normal browser, like Firefox, or you can set it up headless with phantomjs.
pytest-bdd is inspired by the Robot Framework, but we had some different requirements.
Douwe van der Meij and Brandon Tilstra - MVC revisited with Diazo
Douwe van der Meij and Brandon Tilstra talk about MVC revisited with Diazo, at PyGrunn.
See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.
We work for Goldmund, Wyldebeast and Wunderliebe, the conference sponsor with the longest name.
We use Diazo for theming. Your application creates an html page. Your designer creates a standard html template including css. With Diazo you merge the two. It separates the content creation from the styling.
The technique is plain old XSL transformations. Nothing new.
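Under the hood, Diazo compiles the rules plus the designer's theme into an XSLT stylesheet, which is then applied to the page the application produced. Roughly this, as a Python sketch with lxml (the file names are made up):

from lxml import etree

# Apply a compiled Diazo theme (an XSLT stylesheet) to the application's HTML.
transform = etree.XSLT(etree.parse("compiled-theme.xsl"))
content = etree.parse("application-output.html", etree.HTMLParser())
print(etree.tostring(transform(content), pretty_print=True))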
Why should you use it? Designers and developers look at a certain project in a different way. The designer sees the beautiful outside of the car and the developer sees the gritty details under the hood. Developers usually want to stick to developing features in a minimal design, just some standard html preferably without any css.
If you look at the Model-View-Controller paradigm with Diazo in mind, it makes the View part easier: the developer handles the application part and the designer handles the styling part. The designer does not need to know Django or Plone templates.
Brandon is busy making Diazo available for other applications than just Plone. For Plone a tool is available: plone.app.theming. Since Plone 4.3 you can edit the theme inside Plone: a designer can do that with a WYSIWYG editor in Plone, and a developer can tweak the code with a code editor in Plone.
Brandon is working on thememapper, a standalone theme editor, written in Python. thememapper.core is the tool itself. thememapper.diazo is a diazo server to be used with thememapper.core.
Question: Aren't you trying to solve an organizational problem? Shouldn't the designer and developer be talking to each other?
Answer: We often get a design in Photoshop, hand it to a third-party front-end company that creates html and css from it, and we as developers create the Diazo rules.
Alessandro Molina - High Performance Web Applications with Python and TurboGears
Alessandro Molina talks about High Performance Web Applications with Python and TurboGears, at PyGrunn.
See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.
I am @__amol__ on Twitter. I am a member of the TurboGears team. I will talk about general rules which apply to any framework, some quick wins for TurboGears, some real cases, and my personal experiences and preferences; feel free to disagree.
People seem obsessed with the raw speed of web servers. But you are not going to serve a "Hello world" page. My personal stack has nginx plus mod_wsgi or nginx plus Circus-Chaussette-gevent.
Try to avoid making everything an asynchronous (AJAX) request. Browsers have limited concurrency. HTTP has overhead. You will actually slow things down if you have too many of them. Your page may start fast but complete slowly. Learn your framework and how you can use it the best way for your use case.
TurboGears is a framework for rapid development, encouraging flexibility. It was created in 2005, with 2.0 being a major rewrite in 2009 to embrace the WSGI standard. It is based on object dispatch. You can use regular expressions for url matching, but they can get messy, so write them only when you must. By default it uses an XML template engine with error detection, and declarative models with a transactional unit of work. It has built-in validation, authentication, authorization, caching, sessions, migration, etcetera.
Features versus speed: TurboGears is a full-stack framework. That makes it quite slow by default. You can switch things off that you do not need. The team invested effort to constantly speed it up since the 2.1 release. Keeping all the features around has its price, obviously. To cope with this, a minimal mode got introduced, that switches several things off that have the biggest influence on performance. You go from about 900 to 2100 requests per second.
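A sketch of what such a stripped-down application looks like, roughly following the minimal mode example from the TurboGears documentation (details may differ per version):

from tg import AppConfig, TGController, expose

class RootController(TGController):
    @expose()
    def index(self):
        return 'Hello, world!'

# minimal=True leaves out the template engines, transaction manager,
# authentication and the other full-stack pieces.
config = AppConfig(minimal=True, root_controller=RootController())
application = config.make_wsgi_app()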
Avoid serving static files in your framework. Let some other part of your stack handle this. This can take a lot of load from your application server.
Use caching. Caching means preorganizing your data the way you are going to use it. For example with a NoSQL database you can load the comments directly when accessing the page, so you don't need to load them separately. Frameworks usually provide various types of caching. Get to know them and use them.
Use HTML5 and Javascript. Invalidating your whole cache just to add a message "Welcome back, mister X" is not a good idea. Cache the result and use Javascript to do minor changes. If you are using varnish, nginx or any other frontend cache, consider using Javascript plus localstorage instead of cookies for trivial customizations, because cookies disrupt the cache.
Cache the result of rendering a template, with a cache key.
Entity caching: cache parts of your page, for example the html of one comment or notification.
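A hedged sketch of entity caching with Beaker, the caching library TurboGears builds on (the namespace, expiry time and content are made up):

from beaker.cache import CacheManager
from beaker.util import parse_cache_config_options

cache = CacheManager(**parse_cache_config_options({'cache.type': 'memory'}))

# Keep the rendered HTML of one page's comments around for ten minutes,
# instead of querying and rendering them on every request.
@cache.cache('page_comments', expire=600)
def rendered_comments(page_id):
    # imagine an expensive database query plus template rendering here
    return '<ul><li>First!</li></ul>'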
Proactively update the page in your cache: when you edit the page, update the cache before your visitors ask for it.
If you are struggling too much with improving performance, you are probably doing something your application is not meant to do. Also, football fans are really eager for updates.
Offload work. Update the core cache to provide the author with an immediate feedback. Let some other process, program or thread handle the related background changes. For example, use something like Celery.
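For example, a sketch with Celery (the broker URL and task body are made up):

from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def update_related_caches(page_id):
    # recompute comment counts, notification feeds, search index entries, ...
    pass

# In the request handler: update the page's own cache entry right away so the
# author gets immediate feedback, then hand the rest to a worker.
update_related_caches.delay(page_id=42)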
New Relic's App Speed Index (http://newrelic.com/ASI) reports an average of 5.0 seconds of response time for an accepted experience. If the response time is less than 200 milliseconds, it is perceived as 'right now'.
Sorry, there is no silver bullet for speeding up your application.
Kenneth Reitz - Python for humans
Kenneth Reitz talks about Python for humans, at PyGrunn.
See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.
Follow me on twitter: @kennethreitz_. I work at Heroku. I am a member of the Python Software Foundation. I created the requests library, which makes it easier for humans to retrieve a web page in Python.
I want everything I build to be open source. I build for open source: pretend it is open source, even when it will never be released to the public because it is too client specific. Add documentation, please, also to your internal tools.
Open your Python prompt and see the Zen of Python:
>>> import this
Beautiful is better than ugly: you don't need lots of curly braces in Python. Explicit is better than implicit.
Most importantly for today: there should be one - and preferably only one - obvious way to do it.
Welcome to paradise. You have found Python, a language with a beautiful zen philosophy and all is going to be fine. Lies!
Look at some Ruby code to download the contents of an https web page. Perhaps a few too many lines, but it is pretty straightforward. Now you switch to Python. Which library? urllib, urllib2, something else? Okay, you use urllib2 and then you have to add a password manager and lots more lines that are unclear. You will leave and never come back.
This is a serious problem. I think HTTP should be as simple as a print statement. It is used so often! Our world is connected over HTTP right now so we need this to be simpler.
Python needs more Pragmatic Packages. Deal with things sensibly and realistically, in a way that is based on practical rather than theoretical considerations. Make it practical for humans, for the common case.
The requests library is HTTP for humans. A small set of methods with consistent parameters. Do this for more modules! Fit the ninety percent use case. Features, efficiency, performance, etcetera are important, but you should ignore them for such a practical package. Go for the common case. Write the README the way you think it is supposed to work.
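The canonical example: fetching a password-protected page over https is one call, instead of the urllib2 password-manager dance above (URL and credentials are made up):

import requests

r = requests.get('https://api.example.com/user', auth=('user', 'secret'))
print(r.status_code)
print(r.headers['content-type'])
print(r.text)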
"Cool story, bro, but why should I do with this?" It's worth your time and everyone else's time as a developer.
Some things that could be improved:
- File and system operations. We have the modules sys, shutil, os, os.path, io. Which should you use for a task? It is really difficult to run external commands and this blocks dev+ops folks from using Python.
- For installing Python there are various ways. Use the Python that came with the system or compile your own? Python 2 or 3?
- XML hell. etree annoys people. lxml is awesome, but can be difficult to install.
- Packaging and dependencies. Pip or easy_install? How about an easy_uninstall? Distribute or setuptools? [Holger Krekel will talk about this topic later today in the keynote.]
- Dates, datetimes. Which module: datetime, date, time, calendar, dateutil, version 1.5? Timezones, they are ridiculous.
- Unicode. We'll just skip that one.
- Testing. Lots of different ways to create tests. Doctests and unit tests. Various test runner libraries, like nose and tox.
- Installing dependencies. Python-mysql, if you remember the exact name. (My solution: just use Postgres.) Python Imaging Library needs several system packages before you can install it. mod_wsgi: if you install this, which Python version are you using, the one from the system?
Hitchhiker's guide to Python: http://python-guide.org The goal of this project is to write down the tribal knowledge, documenting best practices. A guide book for newcomers. A reference manual for seasoned pros. It tries to document the one - and preferably only one - obvious way to do it. Please contribute documentation here. This lets us practice what we preach.
Let's fix our APIs and improve our documentation.
requests might get into core at some point.
requests started as a wrapper around urllib2. It was a nightmare. It now uses urllib3 to handle the lower level parts.
If I would have said 'yes' to all feature proposals for requests, it would have been too much. People can write a library around requests. Saying 'no' made it possible to create and maintain a good architecture.