A few months ago I read a book that told me a surprising thing about bad code: most bad code (code with too many smells) is born in the first commit of a code file (i.e. when the file is created). I couldn’t believe this, because I thought most bad code was born gradually over several commits. The book cited a source for the claim, and I immediately bookmarked it as “read later”. Unfortunately, I don’t remember which book it was, but the source article was: Michele Tufano et al., When and Why Your Code Starts to Smell Bad. This blog post is a summary of that article.
What makes the article credible is that it is a scientific article with real statistics. It is not a blog post about “how I believe things are” or a tweet that says “Most of the bad code is born at the first commit”. The authors went through the commit histories of 200 open-source projects and analyzed them with statistical tests. They summarize the result as follows:
“Most of the smell instances are introduced when files are created.” (Michele Tufano et al.)
This means that most bad code isn’t born gradually: it is born at the first commit! We should put in more effort when we create our code files and write good code from the very beginning.
The article mentions that in almost all cases the median number of commits a smell needs to affect a code component is zero. Unfortunately, it doesn’t boil this down to a single easy-to-quote number. And even if most bad code is born in the first commit, it is still common for smells to appear only after several maintenance activities.
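To make the “median is zero” statement concrete, here is a minimal sketch with made-up numbers. The list below is hypothetical and is not data from the paper; it only illustrates how a median of zero can coexist with a sizeable share of files that turn smelly later.

```python
from statistics import median

# Hypothetical data (not from the paper): for each smelly file, the number of
# commits made after the file was created before the smell first appeared.
# 0 means the file was already smelly in its first commit.
commits_until_smelly = [0, 0, 0, 0, 0, 0, 3, 7, 12, 25]


def share_smelly_at_creation(counts):
    """Return the fraction of files that were smelly from the first commit."""
    return sum(1 for c in counts if c == 0) / len(counts)


median_commits = median(commits_until_smelly)
share = share_smelly_at_creation(commits_until_smelly)

print(f"median commits until smelly: {median_commits}")
print(f"share smelly at creation: {share:.0%}")
```

In this invented sample the median is zero even though four files out of ten became smelly only after later commits, which is why a zero median does not rule out smells introduced during maintenance.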
Another interesting thing Tufano et al. found was that when code smells aren’t introduced at the first commit, the code usually becomes smelly quickly rather than in small steps:
“When a smell is going to appear, its symptoms occur very fast, and not gradually.” (Michele Tufano et al.)
This finding highlights that bad code is usually born early in the life of a code file, not gradually as we have believed.
Why Are Smells Introduced?
Tufano et al. also studied another interesting question: why are smells introduced? This is not easy to find out in our day-to-day work. One of the results wasn’t surprising:
“Smells are generally introduced in the last month before issuing a deadline.” (Michele Tufano et al.)
This means that most bad code is born near release deadlines, which makes sense. In practice, we don’t write good code in a hurry.
Another interesting thing they found was that most bad code is introduced by code owners (the owner is the person who has written 75% or more of the code). This was a bit surprising to me, as I thought that programmers less familiar with the code (“newcomers” in the article) would write more smelly code than those who know it better. The stats were quite clear about this: 85-97% of smelly code is introduced by the code owner. Maybe owners are blind to the flaws in their own code and thus write bad code more easily than newcomers.
“Newcomers are not necessarily responsible for introducing bad smells, while developers with high workloads and release pressure are more prone to introducing smell instances.” (Michele Tufano et al.)
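The 75% ownership rule mentioned above can be sketched in a few lines. This is only an illustration under my own assumptions: the share is measured here by number of commits to a file, and the `find_owner` function, the author names, and the sample histories are all hypothetical rather than taken from the paper.

```python
from collections import Counter

# Threshold quoted in this post; measuring it by commit count is an assumption.
OWNER_THRESHOLD = 0.75


def find_owner(commit_authors, threshold=OWNER_THRESHOLD):
    """Return the author with at least `threshold` of the commits, else None."""
    if not commit_authors:
        return None
    author, count = Counter(commit_authors).most_common(1)[0]
    return author if count / len(commit_authors) >= threshold else None


# Hypothetical commit histories for two files (one author name per commit).
mostly_alice = ["alice"] * 8 + ["bob"] * 2
shared_file = ["alice", "bob", "carol", "alice"]

print(find_owner(mostly_alice))
print(find_owner(shared_file))
```

With these sample histories the first file has an owner (alice made 8 of 10 commits), while the second has none because nobody reaches the threshold.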
Tufano et al. found that most smell instances are introduced when files are created (i.e. in the first commit of the code file). That was surprising to me because I thought (like many of us) that code smells are born gradually through maintenance.
“We will fix this later” is something we have all said and heard many times. How often does that “later” come? In my experience it comes rarely, close to never (let me know if you know of any research about this). Maybe Tufano et al. have shown the same thing in their paper: we write bad code at the very beginning because we think we will fix it later.
Stop telling yourself that you will fix your code later. That “later” never comes. Fix your code now! That way we write good code from the beginning and save time. Good code (with unit tests) is always cheaper than “I will fix this later” code (without unit tests).
- Michele Tufano et al., When and Why Your Code Starts to Smell Bad, published in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering