Luuk van der Velden - Best practices for the lone coder syndrome
Luuk van der Velden talks about best practices for the lone coder syndrome, at PyGrunn.
See the PyGrunn website for more info about this one-day Python conference in Groningen, The Netherlands.
I do a PhD at the Center for Neuroscience, University of Amsterdam. I switched from Matlab to Python a few years ago. I am a passionate and critical programmer.
Programming is not a substantial part of most science curricula, apart from obvious fields like computer science. Experiments in many sciences generate ever more data, so the demand for computing power and data analysis keeps growing.
A PhD student, whom we take as an example of a lone coder, is responsible for his or her own project and does the work alone: experiments, analysis. Collaborations do happen, but they are asymmetric. I can talk to others, but they usually do not program together with me. Or they pass me some Matlab code that I then have to translate into Python.
A PhD will take about four years, so your code needs to keep running for all that time, maybe longer. Development is continuous.
Cutting corners when working on your own is attractive. You are the only one who uses your code, and it works, so why bother improving it for corner cases? High standards demand discipline. So you end up with duplicated code, unreadable code, no documentation, unstructured functionality with no eye for reuse, and code rot.
There is a scripting pitfall. Scripting languages like Python are a flexible tool for linking different data-producing systems, processing data, and creating summaries and figures. Pitfalls of typical scripts are: data hiding, hiding of complexity, division of functionality (housekeeping and processing), lack of scalability, and no handles for code reuse.
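To make the housekeeping/processing split concrete, here is a minimal sketch (not from the talk). The function names `load_trace` and `baseline_correct` are hypothetical, and the standard library stands in for NumPy so the example is self-contained. The housekeeping function knows only about the file format; the processing function takes plain data, so it can be reused and tested without touching any files.

```python
import csv
import io
from statistics import mean
from typing import Iterable, List


# Housekeeping (hypothetical helper): knows about files and formats, nothing about the science.
def load_trace(stream: Iterable[str]) -> List[float]:
    """Read a single-column CSV of samples into a list of floats."""
    reader = csv.reader(stream)
    return [float(row[0]) for row in reader if row]  # skip blank lines


# Processing (hypothetical helper): a pure function of its inputs, easy to test and reuse.
def baseline_correct(samples: List[float], n_baseline: int) -> List[float]:
    """Subtract the mean of the first n_baseline samples from every sample."""
    if not 0 < n_baseline <= len(samples):
        raise ValueError("n_baseline must be between 1 and len(samples)")
    baseline = mean(samples[:n_baseline])
    return [s - baseline for s in samples]


if __name__ == "__main__":
    # An in-memory file stands in for a real data file here.
    raw = io.StringIO("1.0\n1.0\n3.0\n5.0\n")
    print(baseline_correct(load_trace(raw), n_baseline=2))  # [0.0, 0.0, 2.0, 4.0]
```

Because `baseline_correct` never opens a file, the same function works whether the samples come from a CSV, a database, or an acquisition system.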
A script for scientific analysis should concisely define what you want.
Prototyping is essential for researching a solution. It is used continuously. Consolidation is very different from prototyping. Some things are better left as a prototype.
You should have a hard core of well-tested software. Your scripts should use this core, instead of starting from a copy of an old script. 'Soft' code sits between the hard core and the script, as an interface.
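One way to picture the three layers is the following sketch. It is an interpretation, not code from the talk: `min_max_scale`, `scale_session`, and the session layout are hypothetical, and the test is written in the plain-`assert` style that pytest collects. The hard core is generic and tested, the soft code adapts it to one experiment's data layout, and the script only states what you want.

```python
from typing import Dict, List


# Hard core (hypothetical): generic, well-tested functions that know nothing about your experiment.
def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant sequence")
    return [(v - lo) / (hi - lo) for v in values]


def test_min_max_scale() -> None:
    # pytest-style test that guards the core against regressions.
    assert min_max_scale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]


# Soft code (hypothetical): experiment-specific glue between the core and the script.
def scale_session(recordings: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Apply min_max_scale to every named recording of one session."""
    return {name: min_max_scale(trace) for name, trace in recordings.items()}


# Script: states concisely what you want, without reimplementing anything.
if __name__ == "__main__":
    test_min_max_scale()
    session = {"cell_1": [2.0, 4.0, 6.0], "cell_2": [10.0, 0.0, 5.0]}
    print(scale_session(session))
    # {'cell_1': [0.0, 0.5, 1.0], 'cell_2': [1.0, 0.0, 0.5]}
```

When the next experiment needs a different data layout, only the soft layer changes; the tested core and its tests stay put.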
As a scientist you were not trained as a programmer, so you should get that training, and we as Python programmers should help provide it. Presently the emphasis is on getting work done, not on programming. Matlab is the default language. It was originally a stripped-down version for teaching students, but everyone kept using it. Closed-source software goes against the scientific ethos.
Python offers a full-featured scientific computing stack and scales with your skills: you can write imperative, functional, object-oriented, or metaprogramming code. Python is free, so you can use the latest version without paying for an upgrade, as you must with Matlab.
We can organize courses and workshops, for example Software Carpentry.