Armin Ronacher - A year with MongoDB
Armin Ronacher talks about MongoDB, at PyGrunn.
See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.
I do computers, currently at Fireteam. We do the internet for pointy-shooty games.
I started out hating MongoDB a year ago. Then it started making sense after a while, but I ended up not liking it much. But: MongoDB is a pretty okay data store, as Jared Heftly says. We are just really good at finding corner cases.
MongoDB is like a nuclear reactor: if you use it well, it is safe. I said that in October. Now I am less enthusiastic.
We had a game. Christmas came around. Server load went up a lot.
Why did we pick MongoDB initially? It is schemaless, the database is sharded automatically, it is sessionless. But schemaless is just wrong, MongoDB's sharding is annoying, thinking in records is hard, and there is a reason people use sessions.
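MongoDB has several parts: mongod (the database server that holds the data), mongoc (the config servers that hold the cluster metadata) and mongos (the router that clients talk to). That is a lot of moving parts.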
First, we failed ourselves. We were on Amazon, which is not good for databases. mongos and mongod ran as separate processes but on the same server, which meant that they were constantly waiting on each other. Once we went to two cores it was fine. Still, EBS (Elastic Block Store) is not good for IO, and so not good for databases. Try writing a lot of data for a minute, just with dd, and you will see what I mean.
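A rough Python equivalent of that dd experiment, as a minimal sketch: it writes zeros to a file for about a minute, forces them to disk with os.fsync, and reports MB/s. The function name write_throughput and its chunk_mb and max_mb parameters are my own choices for illustration, not anything from the talk.

```python
import os
import time


def write_throughput(path, seconds=60.0, chunk_mb=1, max_mb=None):
    """Write zeros to `path` for about `seconds`, fsync, and return MB/s.

    `max_mb` optionally caps the amount written (handy on small disks).
    """
    mb = 1024 * 1024
    chunk = b"\0" * (chunk_mb * mb)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while time.perf_counter() - start < seconds:
            if max_mb is not None and written >= max_mb * mb:
                break
            f.write(chunk)
            written += len(chunk)
        f.flush()
        # Force the data to the device before stopping the clock; otherwise
        # we would mostly be timing the OS page cache.
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return written / mb / elapsed
```

Point it at a file on the volume you want to test, for example write_throughput("/mnt/data/ddtest.bin") with a hypothetical mount point, and compare an EBS volume against local disk.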
MongoDB has no transactions. You can work around this, but we really did need it. It is meant for Document-level operations, storing documents within documents, but that did not really work for us. Your mileage may vary.
MongoDB is stateful. It assumes that the data is saved correctly. If you want to be sure, you need to ask it explicitly.
It crashes a lot. We stayed on 2.0 for a while because upgrading would have meant hitting lots of segfaults.
How to break your cluster: add a new primary, remove the old primary, don't shut down the old primary (this step is bad!), then get a network partition in which one side overrides the other's config in the mongoc. That happened to us during Christmas.
Schema versus schemaless is like static typing versus dynamic typing. Ever since C# and TypeScript, static typing with an escape hatch to dynamic typing wins. I think almost everyone adds schemas to MongoDB. It is what we do anyway.
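Adding a schema on the application side can be as small as the following sketch: a dict mapping field names to Python types, checked before a document is written. PLAYER_SCHEMA and its fields are hypothetical, and many projects use a validation library instead; the allow_extra flag plays the role of the escape hatch to dynamic typing.

```python
from typing import Any, Mapping

# Hypothetical schema: field name -> required Python type.
PLAYER_SCHEMA = {"name": str, "level": int}


def validate(doc: Mapping[str, Any], schema: Mapping[str, type], allow_extra: bool = True) -> None:
    """Raise ValueError if `doc` lacks a schema field or has the wrong type.

    allow_extra=True is the escape hatch: fields not in the schema pass unchecked.
    """
    for field, expected in schema.items():
        if field not in doc:
            raise ValueError(f"missing field {field!r}")
        value = doc[field]
        # bool is a subclass of int in Python, so reject it for int fields explicitly.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(f"field {field!r} must be {expected.__name__}")
    if not allow_extra:
        extra = set(doc) - set(schema)
        if extra:
            raise ValueError(f"unexpected fields: {sorted(extra)}")
```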
getLastError() is just disappointing. You have to call it after every write you want confirmed, so everything is slower.
There is a lack of joins. This is called a 'feature'. I see people doing joins by hand in their own code. The database should be much better at this than the average user. MongoDB has no real Map-Reduce either; there is a version, but it hardly counts.
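This is roughly what joining by hand in application code looks like: two separate queries, then an in-memory lookup. In this sketch plain lists of dicts stand in for the query results, and the users/orders collections and the user_id field are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def join_by_hand(orders: Iterable[Dict[str, Any]],
                 users: Iterable[Dict[str, Any]],
                 key: str = "user_id") -> List[Dict[str, Any]]:
    """Left-join orders to users on orders[key] == users['_id'], in application code."""
    # Index one side first; without this it becomes a nested loop over both lists.
    users_by_id = {u["_id"]: u for u in users}
    joined = []
    for order in orders:
        # Orders whose user is missing get user=None, like a SQL LEFT JOIN.
        joined.append({**order, "user": users_by_id.get(order.get(key))})
    return joined
```

Building the dict first keeps this to one pass over each side; in a SQL database the query planner would make that kind of decision for you.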
When you use the find or aggregate functions of the API to fetch records, you can get something very much like SQL injection: if a user manages to put a dollar sign at the beginning of a string, MongoDB treats it as an operator instead of as data.
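A common way this happens is when a request body is parsed as JSON and a value the code expects to be a string turns out to be an object with a $-prefixed key, which MongoDB reads as a query operator. The sketch below shows such a payload (using $ne, "not equal") and a hypothetical reject_operators helper that refuses any $-prefixed key before the data reaches find() or aggregate(). Checking that each field has the expected type is an equally valid defence.

```python
import json
from typing import Any


def reject_operators(value: Any) -> Any:
    """Recursively raise ValueError if untrusted input contains a '$'-prefixed key."""
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(key, str) and key.startswith("$"):
                raise ValueError(f"operator key {key!r} not allowed in user input")
            reject_operators(item)
    elif isinstance(value, list):
        for item in value:
            reject_operators(item)
    return value


# A login form posted as JSON. Used directly as {"username": ..., "password": ...},
# the password condition would mean "any password that is not empty".
payload = json.loads('{"username": "admin", "password": {"$ne": ""}}')
```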
Even MySQL supports MVCC, and with it transactions. MongoDB does not.
MongoDB can only use one index per query, which is quite limiting. Negations never use indexes; not entirely unreasonable, but very annoying. There is a query optimizer, though.
Making MongoDB far less slow on OS X:
mongod --noprealloc --smallfiles --nojournal run
Do not use : or | in your collection names, or importing the data on Windows will fail.
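Presumably this bites because dump and restore tools such as mongodump write one file per collection, so the collection name becomes a file name. A small check like the hypothetical helper below can catch that early; besides : and |, it also flags the other printable characters Windows forbids in file names (< > " / \ ? *), which goes beyond what the talk mentions.

```python
# Printable characters Windows does not allow in file names; the talk names ':' and '|'.
WINDOWS_FORBIDDEN = frozenset('<>:"/\\|?*')


def windows_unsafe_chars(collection_name: str) -> set:
    """Return the characters in `collection_name` that Windows forbids in file names."""
    return set(collection_name) & WINDOWS_FORBIDDEN
```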
A third of our data is keys, because every document stores its own field names. That is just insane, and another reason to use schemas.
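To see where the space goes, here is a sketch that computes the encoded size of a flat document following the BSON layout (a length prefix, then per field a type byte, the NUL-terminated field name and the value, then a closing NUL byte) and counts how many of those bytes are field names. It only handles str, int, float and bool values, the field names in the example are hypothetical, and the talk does not say how the one-third figure was measured.

```python
from typing import Any, Mapping, Tuple


def _value_size(value: Any) -> int:
    """Encoded size of a BSON value (only a few scalar types are handled here)."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return 1
    if isinstance(value, int):
        return 4 if -2**31 <= value < 2**31 else 8  # int32 or int64
    if isinstance(value, float):
        return 8  # IEEE 754 double
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1  # int32 length, bytes, NUL
    raise TypeError(f"unsupported type {type(value).__name__}")


def bson_size_and_key_bytes(doc: Mapping[str, Any]) -> Tuple[int, int]:
    """Return (total encoded size, bytes spent on field names) for a flat document."""
    total = 4 + 1  # int32 document length + trailing NUL
    key_bytes = 0
    for key, value in doc.items():
        k = len(key.encode("utf-8")) + 1  # field names are NUL-terminated cstrings
        key_bytes += k
        total += 1 + k + _value_size(value)  # type byte + name + value
    return total, key_bytes


long_doc = {"player_name": "armin", "current_level": 12, "total_score": 9001}
short_doc = {"n": "armin", "l": 12, "s": 9001}
```

For long_doc this gives 64 bytes, 38 of them field names; with one-letter names (short_doc) it drops to 32 bytes, 6 of them names. Shortening keys is a common workaround, at the cost of readability.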
A MongoDB cluster needs to boot in a certain order.
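One way to make that order explicit is to write the dependencies down and let graphlib (Python 3.9+) compute a start order. The dependency table is an assumption based on the usual sharded setup (config servers and shards before the mongos routers that depend on them), not a list from the talk.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Assumed dependencies: each component maps to what must already be up.
DEPENDS_ON: Dict[str, Set[str]] = {
    "config servers": set(),
    "shard mongods": set(),
    "mongos routers": {"config servers", "shard mongods"},
    "application": {"mongos routers"},
}


def boot_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one valid start order; raises graphlib.CycleError on circular deps."""
    return list(TopologicalSorter(deps).static_order())
```

Each component should only be started once everything it depends on is reachable.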
MongoDB is a pretty good data dump thing. It is not a SQL database, but you probably want a SQL database, at least until RethinkDB is ready. Probably we would have had similar problems with RethinkDB though.
It is improving. There is a lot of backing from really big companies.
I don't want to do this again. I want to use Postgres. If I ever get data so large that Postgres cannot handle it, I have apparently done something successful and I will start doing something else. Postgres has already solved so many problems at the database level that you do not have to come up with your own solutions at a higher level.
Without a doubt, MongoDB will get better and be the database of choice for some problems.
The project we use it for does still run on MongoDB and that will probably remain that way.
