Q&A with Adam Tornhill - or - Why I view Code as a Crime Scene
by Adam Tornhill, March 2015
The interview was conducted by my project editor, Fahmida Y. Rashid. I chose to publish it here as well for a reason: over the past years I've delivered several presentations about this topic, and one of the most common questions I get is how I came up with the idea. So, here it is - this is why I view Code as a Crime Scene!
Q&A with Adam Tornhill
How did you come up with the metaphor of the source code being a crime scene?
Well, I was in the middle of my psychology studies when I joined a course in forensics. At the same time, I was working full-time as a software developer fighting some scary large-scale legacy systems on a regular basis. The main challenge there is always to know which parts of the codebase really matter. Which parts of the code become productivity bottlenecks? Which parts are hard to maintain? Where will the bugs be?
As I got into forensics, I realized that crime investigators face open-ended, large-scale problems similar to ours. And modern forensic psychologists attack these problems with methods that are useful to us software developers too. I decided to explore this connection and find out how we can apply it to code.
What are some of the forensics concepts we will learn about in this book?
The eye-opener to me, and the technique we’ll use as a metaphor to reason about code, is geographical offender profiling. A geographical offender profile uses the spatial movement of criminals to identify their home bases. It works by calculating a probability surface and projecting it onto a real-world geography. So, I thought, what if we could do the same for software?
In our case the offender is code. So we learn techniques to identify patterns in the evolution of your code, that is, in how you’ve worked with it so far. That gives you the power to predict its future, to find the code that’s hard to evolve and prone to defects – our offenders!
It’s not only about complex code – complexity only matters if we need to deal with it. That’s why it’s important to identify the overlap between code that is complicated and code that we have to work with often. It’s a simple technique that works surprisingly well in practice. Of course we’ll also support it with findings from empirical software research – what you learn is not just opinion but is based on practices that have been shown to work on real-world projects.
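To make the overlap idea concrete, here is a minimal sketch in Python. It is not the book's implementation: the `FileStats` type, the use of lines of code as a stand-in for complexity, the multiplicative score, and the sample numbers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    revisions: int      # how often the file changed (from version control)
    lines_of_code: int  # crude stand-in for complexity (an assumption)


def rank_hotspots(files, top=3):
    """Rank files by the overlap of change frequency and complexity."""
    if not files:
        return []
    max_revs = max(f.revisions for f in files) or 1
    max_loc = max(f.lines_of_code for f in files) or 1

    def score(f):
        # Normalise both dimensions to 0..1 and multiply, so a file only
        # scores high when it is BOTH large and frequently changed.
        return (f.revisions / max_revs) * (f.lines_of_code / max_loc)

    return sorted(files, key=score, reverse=True)[:top]


# Hypothetical numbers, for illustration only.
sample = [
    FileStats("core/engine.py", revisions=120, lines_of_code=2400),
    FileStats("legacy/report.py", revisions=3, lines_of_code=5000),
    FileStats("util/strings.py", revisions=90, lines_of_code=150),
]
hotspots = rank_hotspots(sample)
```

In this made-up sample, the large but rarely touched `legacy/report.py` ranks below `core/engine.py`, which is both sizeable and changed often. That is the point of the overlap: complexity that nobody has to work with costs little.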
Large-scale software development is also a social activity. That means it’s prone to the same social biases that we fall for in everyday life. So here we’ll look into some forensic cases gone wrong, learn from their mistakes, and apply our new knowledge to reason about teamwork, organizations, and software architectures.
I don’t have a background in psychology. Will I be able to follow along?
I’ve made sure to explain the concepts we meet. Psychology matters to us since our primary tool as developers isn’t the computer – it’s our brain – and psychology is about how we function. It’s about how we learn, solve problems, reason, and work with others. All these areas relate to our everyday development activities.
Tell me more about Code Maat.
The analysis techniques are based on version-control data. As such, you’ll learn to mine data from your source code repositories and find interesting patterns in the evolution of your code. Code Maat is just a tool to automate the boring parts of that process.
In fact, I open-sourced Code Maat as a quick-start to put the techniques you learn about in the book into practice. We’ll also use the source code of Code Maat for some case studies. The only reason for that is that it feels better to rip my own design decisions to shreds than to criticize the work of others where I don’t share the original context.
That said, we’ll investigate several other codebases as well so that we get a feel for how the different techniques complement each other. Out of all that, the tool itself is the least important part.
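To give a feel for what mining version-control data can look like without any particular tool, here is a small Python sketch that counts how many commits touched each file. It assumes the input is the text printed by `git log --name-only --pretty=format:` (file paths, one per line, with blank lines between commits). The function name `count_revisions` and the sample log are hypothetical, and this is not how Code Maat itself is implemented.

```python
from collections import Counter


def count_revisions(log_text):
    """Count how many commits touched each file.

    Assumes blocks of file paths, one path per line, separated by
    blank lines (one block per commit).
    """
    counts = Counter()
    for line in log_text.splitlines():
        path = line.strip()
        if path:  # skip the blank separator lines between commits
            counts[path] += 1
    return counts


# Hypothetical log excerpt covering three commits.
sample_log = """\
src/parser.py
src/lexer.py

src/parser.py
README.md

src/parser.py
"""

revisions = count_revisions(sample_log)
most_changed = revisions.most_common(1)
```

A change-frequency count like this is only a starting point; the sketch ignores renames, authors, and dates, all of which a real analysis would likely need.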
But wait, you are saying I don’t need to use Code Maat to work with this book. What other tools can I use instead?
I’m pretty sure that these techniques will become mainstream in a few years – the information we can mine from our source code repositories is just too useful to be ignored. When that happens, you’ll have several tools to choose from (both commercial and free).
But until that happens, I’d recommend that you tailor the tools to your specific needs. The algorithms aren’t that hard to implement and we cover them all in the book. In addition, it’s easy to build more elaborate tools on top of Code Maat. Code Maat generates CSV output that’s straightforward to post-process and visualize in any way you choose.
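As an illustration of how little it takes to post-process CSV output, here is a short Python sketch that picks the most frequently changed entries from a results file. The column names `entity` and `n-revs`, the sample rows, and the helper `top_entities` are assumptions made for this example; check the header of the file your tool actually produces.

```python
import csv
import io


def top_entities(csv_text, column, limit=2):
    """Return the rows with the largest integer value in `column`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: int(row[column]), reverse=True)
    return rows[:limit]


# Hypothetical output; the column names `entity` and `n-revs` are assumed.
sample_csv = """entity,n-revs
src/app.clj,42
test/app_test.clj,17
README.md,5
"""

for row in top_entities(sample_csv, "n-revs"):
    print(f"{row['entity']}: {row['n-revs']} revisions")
```

From here the rows can be fed to whatever charting or reporting library you prefer.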
Finally, there are other good options. I know that Michael Feathers, who wrote the foreword to the book, has open-sourced the tool he uses to analyze Ruby code repositories. There’s also the Moose project, which provides an open platform to build your own custom analyses.