Jonathan Barnoud - Looking at molecules using Python

published May 19, 2017

Jonathan Barnoud talks about looking at molecules using Python, at PyGrunn.

See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.

In this presentation, I will demonstrate how Python can be used throughout the workflow of molecular dynamics simulations. We will see how Python can be used to set up simulations, and how we can visualize simulations in a Jupyter Notebook with NGLView. We will also look at the MDAnalysis library for writing analysis tools, and at datreant for organizing the data.

I work at the University of Groningen. I look at fat (lipids) and proteins at the level of molecules and atoms. We can simulate them using molecular dynamics. Force is equal to mass times acceleration (F = m*a), so from the forces acting on each atom we can compute how it moves. To start, we need initial positions and initial velocities.
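To make "F = m*a plus initial positions and velocities" concrete, here is a minimal sketch of one common integration scheme, velocity Verlet, for a single particle on a one-dimensional spring. The spring force, the unitless numbers, and the function names are assumptions for illustration; real simulation engines handle many atoms and complex force fields, and may use other integrators.

```python
# Minimal velocity Verlet integrator for one particle on a 1D spring.
# Assumptions (not from the talk): harmonic force F = -k*x, unitless values.


def force(x: float, k: float = 1.0) -> float:
    """Harmonic restoring force F = -k * x."""
    return -k * x


def velocity_verlet(x0: float, v0: float, mass: float, dt: float, n_steps: int) -> list:
    """Integrate F = m*a from initial position x0 and initial velocity v0.

    Returns the list of positions: a tiny one-particle 'trajectory'.
    """
    x, v = x0, v0
    a = force(x) / mass  # a = F / m
    trajectory = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt ** 2  # advance the position
        a_new = force(x) / mass             # force at the new position
        v = v + 0.5 * (a + a_new) * dt      # advance the velocity with the averaged acceleration
        a = a_new
        trajectory.append(x)
    return trajectory


traj = velocity_verlet(x0=1.0, v0=0.0, mass=1.0, dt=0.01, n_steps=1000)
print(len(traj), traj[-1])
```

With k = 1 and m = 1 the exact solution is x(t) = cos(t), so after 1000 steps of 0.01 the last position should be close to cos(10).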

My workflow: prepare the system, run a simulation, then visualise and analyse it in a Jupyter notebook. This may take several iterations of that cycle before I can write a report.

Preparing a simulation means providing the topology, the initial coordinates, and the simulation parameters. I use some bash and Python scripts to prepare those text files. They go into the simulation engine, which produces a trajectory as output: how all those molecules move over time.
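As an example of preparing such a text file with Python, the sketch below turns a dictionary into "key = value" lines. The keys follow the style of a GROMACS .mdp parameter file; which engine is used, and the function names here, are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, Union

Value = Union[str, int, float]


def format_parameters(params: Dict[str, Value]) -> str:
    """Render simulation parameters as 'key = value' lines, one per parameter."""
    return "".join(f"{key} = {value}\n" for key, value in params.items())


def write_parameters(path: Path, params: Dict[str, Value]) -> None:
    """Write the rendered parameters to a text file for the simulation engine."""
    path.write_text(format_parameters(params))


# Example values in the style of a GROMACS .mdp file (assumed, not from the talk).
PARAMS: Dict[str, Value] = {
    "integrator": "md",  # GROMACS keyword for its leap-frog integrator
    "dt": 0.002,         # time step in picoseconds
    "nsteps": 50000,     # 50000 steps * 0.002 ps = 100 ps of simulated time
}

if __name__ == "__main__":
    print(format_parameters(PARAMS), end="")
```

Keeping the parameters in a dictionary makes it easy to generate a series of input files that differ in a single value, such as the number of steps.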

There are lots of simulation engines; they need different file formats as input and produce different output formats. So I use Python to create a library that abstracts these differences away.
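One common way to abstract over file formats is a registry that maps each file extension to a parser, so the rest of the code only calls a single entry point. The sketch below shows that pattern with a parser for the simple XYZ coordinate format; the registry, the function names, and their signatures are hypothetical and are not taken from any library mentioned in the talk.

```python
from typing import Callable, Dict, List, Tuple

Coordinates = List[Tuple[str, float, float, float]]

# Hypothetical registry: file extension -> parser function.
READERS: Dict[str, Callable[[str], Coordinates]] = {}


def register(extension: str):
    """Decorator that registers a parser for one file extension."""
    def decorator(func: Callable[[str], Coordinates]) -> Callable[[str], Coordinates]:
        READERS[extension] = func
        return func
    return decorator


@register("xyz")
def parse_xyz(text: str) -> Coordinates:
    """Parse XYZ: an atom count, a comment line, then one 'name x y z' line per atom."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    for line in lines[2:2 + n_atoms]:
        name, x, y, z = line.split()[:4]
        atoms.append((name, float(x), float(y), float(z)))
    return atoms


def read_coordinates(filename: str, text: str) -> Coordinates:
    """Pick the parser from the file extension, so callers never deal with formats."""
    extension = filename.rsplit(".", 1)[-1].lower()
    try:
        reader = READERS[extension]
    except KeyError:
        raise ValueError(f"no reader registered for .{extension} files") from None
    return reader(text)


if __name__ == "__main__":
    sample = "2\nwater fragment\nO 0.0 0.0 0.0\nH 0.96 0.0 0.0\n"
    print(read_coordinates("water.xyz", sample))
```

Supporting another engine's format then means registering one more parser; code that calls read_coordinates does not change.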

One such library is MDAnalysis. The main object is a Universe, which combines a topology and a trajectory. The universe is full of atoms, and each atom has attributes attached to it, like name, position, and mass. Everything is stored in arrays. You can select atoms, for example with universe.select_atoms('not resname SOL'). Sample code that prints the position of the first atom for the first ten frames:

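import MDAnalysis as mda
universe = mda.Universe('topology.gro', 'trajectory.xtc')  # hypothetical topology and trajectory files
for time_step in universe.trajectory[:10]:  # iterate over the first 10 frames
    print(universe.atoms[0].position)  # position of the first atom in the current frame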
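Storing everything in arrays is what makes selections and analyses fast. The sketch below imitates, with plain NumPy on a made-up five-atom system, what a selection like 'not resname SOL' does conceptually: build a boolean mask from a per-atom attribute array and use it to index the other arrays. This is not MDAnalysis code; with MDAnalysis itself, the comparable call would be universe.select_atoms('not resname SOL').center_of_mass(), assuming the topology provides masses.

```python
import numpy as np

# Made-up toy system: per-atom attributes stored as parallel arrays,
# similar in spirit to how a universe exposes resnames, masses and positions.
resnames = np.array(["POPC", "POPC", "SOL", "SOL", "SOL"])
masses = np.array([12.011, 14.007, 15.999, 1.008, 1.008])
positions = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [5.0, 5.0, 5.0],
    [5.5, 5.0, 5.0],
    [5.0, 5.5, 5.0],
])

# Conceptually like select_atoms('not resname SOL'): a boolean mask over atoms.
not_solvent = resnames != "SOL"

selected_positions = positions[not_solvent]
selected_masses = masses[not_solvent]

# Mass-weighted centre of the selection, computed on whole arrays without a Python loop.
center_of_mass = (
    (selected_positions * selected_masses[:, np.newaxis]).sum(axis=0)
    / selected_masses.sum()
)
print(center_of_mass)
```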

NGLView can display a universe from MDAnalysis (or structures from other libraries) in a Jupyter notebook; it uses the NGL JavaScript library to render the molecules.

Now you may end up with lots of simulation data in lots of directories and files, and your filesystem becomes a mess! So we use datreant. (A treant is a talking tree in Dungeons and Dragons.) It helps you find out which directory holds the output of which simulation, and lets you access the data in it.
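The idea behind this kind of tool, tagging simulation directories with metadata so they can be found again later, can be sketched in plain Python. The file name metadata.json and the functions tag_simulation and discover below are hypothetical; datreant has its own API and on-disk layout, which this sketch does not reproduce.

```python
import json
import tempfile
from pathlib import Path
from typing import List

# Hypothetical convention: every simulation directory holds a small
# 'metadata.json' file describing that run.
METADATA_NAME = "metadata.json"


def tag_simulation(directory: Path, **metadata) -> None:
    """Create the directory if needed and store its metadata as JSON."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / METADATA_NAME).write_text(json.dumps(metadata))


def discover(root: Path, **criteria) -> List[Path]:
    """Return every simulation directory under root whose metadata matches all criteria."""
    matches = []
    for meta_file in sorted(root.rglob(METADATA_NAME)):
        metadata = json.loads(meta_file.read_text())
        if all(metadata.get(key) == value for key, value in criteria.items()):
            matches.append(meta_file.parent)
    return matches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        tag_simulation(root / "membrane" / "run_300K", temperature=300, lipid="POPC")
        tag_simulation(root / "membrane" / "run_310K", temperature=310, lipid="POPC")
        for path in discover(root, temperature=310):
            print(path.relative_to(root))
```

Because the metadata lives next to the data, directories can be moved or reorganised and still be found by what they contain rather than by where they are.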

To conclude:

  • Python is awesome.
  • Jupyter is awesome too. [See also the talk about a billion stars earlier today.]
  • The Python science stack is awesome as well.
  • Each field develops awesome tools based on the above.