With the rising awareness of the importance of QA for software development, traditional methodologies are being challenged and new testing methods are being adopted by teams all around the world. We are shifting left, automating, using functional and non-functional testing more often, even adopting continuous testing.
Today, I’m going to introduce you to mutation testing. Understanding the topic requires a lot of theory, so in this article we will start with the basics. We will also look at a practical example so that you can better understand what happens when we use this technique.
You need a minimum understanding of programming, or at least a little bit of imagination to get the idea, but I promise this is also accessible to less tech-savvy readers!
To start talking about mutation testing we need some context, so let’s discuss unit testing and its importance in software development first.
Unit testing has always been a controversial subject. Teams work in very different scenarios and hold very different opinions about it, from ‘we don’t have time for that’ and ‘we will unit test when we finish coding’ all the way to ‘that adds no value’. For those who need a little reminder of what unit tests are, here it is:
These are automated tests that ensure that a single part of an application (such as a method or function) fulfils its purpose. Unit tests are typically written by developers and are the first tests run to ensure that an individual unit of code works and is fit for use.
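As a minimal sketch of what such a test looks like, here is a hypothetical `DiscountCalculator` (not part of this article's project) with two checks around it. To keep the sketch runnable without dependencies, the checks are plain static methods; in a JUnit 4 project each one would be a method annotated with `@Test` and would use `assertEquals`.

```java
// Hypothetical unit under test, invented for illustration only.
class DiscountCalculator {
    // Returns the price after applying a percentage discount (0-100).
    static double applyDiscount(double price, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        return price - (price * percent / 100.0);
    }
}

// Each static method below plays the role of one unit test.
class DiscountCalculatorTest {
    static void appliesTenPercentDiscount() {
        double result = DiscountCalculator.applyDiscount(200.0, 10);
        check(Math.abs(result - 180.0) < 1e-9, "expected 180.0 but got " + result);
    }

    static void rejectsNegativePercent() {
        try {
            DiscountCalculator.applyDiscount(200.0, -5);
        } catch (IllegalArgumentException expected) {
            return; // the unit behaved as specified
        }
        throw new AssertionError("expected IllegalArgumentException");
    }

    // Tiny stand-in for an assertion library.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        appliesTenPercentDiscount();
        rejectsNegativePercent();
        System.out.println("all unit tests passed");
    }
}
```

Each check exercises one small unit in isolation and fails loudly if the result is not the expected one.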
In this context, let’s also remember the test pyramid which is pretty self-explanatory:
When we talk about unit testing, coverage is also an important concept. Right now, in most projects, we measure the success of unit tests by line coverage. This is no more and no less than the percentage of lines of code that our tests execute, with the goal often set at 70-80%. We will discuss later on why this is or isn’t enough.
At this point, we should have a very basic understanding of the unit testing concept, and we should all agree that unit tests are needed (or if we don’t, let’s at least pretend we do).
So what’s mutation testing? Well, in essence, it’s just testing your unit tests. However, it’s a lot more than that. Let’s see how it works:
Mutation testing consists of making small changes, or mutations, to our code and then running all our unit tests against those mutated copies. Sounds hard, right? Well, it actually isn’t! There are a lot of libraries and plugins built for this purpose, and most of them are easy to use.
To understand what mutation testing does, imagine having a lot of different versions of your code, and in those versions, true becomes false, numbers are increased or decreased randomly, or substituted by zero, or entire lines of code are removed.
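To make this concrete, here is a hand-written sketch of one small method next to three mutated versions of it. The `AgeRule` class and its methods are hypothetical and exist only for illustration; a mutation testing tool generates variations like these automatically rather than asking you to write them.

```java
// Hypothetical method, used only to illustrate what mutants look like.
class AgeRule {
    // Original: adults are 18 or older.
    static boolean isAdult(int age) {
        return age >= 18;
    }

    // Mutant: the condition is negated, so true becomes false and vice versa.
    static boolean negatedMutant(int age) {
        return !(age >= 18);
    }

    // Mutant: the constant is replaced by zero.
    static boolean zeroConstantMutant(int age) {
        return age >= 0;
    }

    // Mutant: the constant is incremented by one.
    static boolean incrementedConstantMutant(int age) {
        return age >= 19;
    }

    public static void main(String[] args) {
        // Compare the original with each mutant on two inputs around the boundary.
        for (int age : new int[] {17, 18}) {
            System.out.println("age " + age
                    + ": original=" + isAdult(age)
                    + ", negated=" + negatedMutant(age)
                    + ", zero=" + zeroConstantMutant(age)
                    + ", incremented=" + incrementedConstantMutant(age));
        }
    }
}
```

A test that checks the result for 18 can tell the original apart from the negated and incremented mutants, while a test for 17 exposes the negated and zero-constant mutants. Only with both inputs, and with assertions on the results, are all three mutants detected.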
Would our unit tests be able to find those mutations? What would it mean if they don’t? And what if they do? In an ideal scenario, all our mutations would be discovered by unit tests. In mutation testing slang (yes, it already has a slang), this would mean that those mutants are killed. Sounds fun, right? Now we are not just testing, but also killing mutants!
What happens if our tests don’t detect the mutants? There are two possible explanations in this case. First, the mutants that we are using for our code are not a good fit (there are a lot of mutants; some are pretty solid, others are experimental and don’t fit that well). Second, our tests can be improved. Yes, I know you were expecting me to say it means our tests are bad, but no test is bad if it’s really trying to test something; it just needs some improvement.
Once we have mutated our code and seen how many mutants we killed and how many survived, we obtain our mutation coverage. This differs from line coverage in that it does not measure the lines of code executed, but the number of mutants killed compared to the total number of mutants we created.
Now we have two things: line coverage and mutation coverage. Which one is more important? Is it better to have more line coverage even if we don’t have good mutation coverage, or is it the other way around? Well, you can guess the answer: we need both!
Enough theory! Let’s look at some code, some examples of HTML coverage reports and how they can be misleading, and, of course, some mutation testing in action. For this example, we will be using Java, JUnit 4, JaCoCo and Pitest.
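The two metrics are simple ratios, and a short sketch makes the difference explicit. The `CoverageMath` class and the numbers in `main` are hypothetical and are not taken from any report in this article.

```java
// Hypothetical helper showing how the two percentages are computed.
class CoverageMath {
    // Shared ratio: part / total, expressed as a percentage.
    static double percentage(int part, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return 100.0 * part / total;
    }

    // Line coverage: lines executed by the tests over all lines.
    static double lineCoverage(int linesExecuted, int totalLines) {
        return percentage(linesExecuted, totalLines);
    }

    // Mutation coverage: mutants killed by the tests over all mutants created.
    static double mutationCoverage(int mutantsKilled, int totalMutants) {
        return percentage(mutantsKilled, totalMutants);
    }

    public static void main(String[] args) {
        // Made-up numbers: every line is executed, but a quarter of the mutants survive.
        System.out.println("line coverage:     " + lineCoverage(20, 20) + "%");
        System.out.println("mutation coverage: " + mutationCoverage(30, 40) + "%");
    }
}
```

The first ratio only says that the tests ran the code; the second says how often the tests noticed when that code was changed.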
Let’s kill some mutants
We have a very simple piece of code that simulates a few validations on a user.
This is a simple class and method. We pass a user, it checks the email and checks the phone and either returns the user or changes its phone to a predefined number and then returns the user anyway. In real life, this would be much more complex, but for the sake of the experiment let’s go with this.
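For readers who want something to follow along with, here is a rough sketch of what a class like this could look like. It is a reconstruction from the description only: the names `User`, `UserValidator`, `validate`, `isValidEmail` and `isValidPhone`, the `int` phone type and the email rule are all assumptions, and the line numbers will not match the reports discussed in this article.

```java
// Assumed shape of the user object: an email and a numeric phone.
class User {
    private final String email;
    private int phone;

    User(String email, int phone) {
        this.email = email;
        this.phone = phone;
    }

    String getEmail() { return email; }
    int getPhone() { return phone; }
    void setPhone(int phone) { this.phone = phone; }
}

// Assumed shape of the validation class described in the text.
class UserValidator {
    // The predefined fallback number mentioned in the article.
    static final int DEFAULT_PHONE = 910000000;

    // Returns the user unchanged when both checks pass; otherwise
    // replaces the phone with the default and returns the user anyway.
    User validate(User user) {
        if (isValidEmail(user.getEmail()) && isValidPhone(user.getPhone())) {
            return user;
        }
        user.setPhone(DEFAULT_PHONE);
        return user;
    }

    // Assumption: a deliberately naive email check.
    private boolean isValidEmail(String email) {
        return email != null && email.contains("@");
    }

    // Assumption: a valid phone is a positive number with exactly nine digits.
    private boolean isValidPhone(int phone) {
        return phone >= 100_000_000 && phone <= 999_999_999;
    }

    public static void main(String[] args) {
        UserValidator validator = new UserValidator();
        System.out.println(validator.validate(new User("ana@example.com", 612345678)).getPhone());
        System.out.println(validator.validate(new User("ana@example.com", 123)).getPhone());
    }
}
```

The interesting part for mutation testing is the fallback branch: a constant and a setter call, both of which a tool can alter or remove.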
Now, without showing what tests I have, here’s the coverage I have obtained for this class:
This is a JaCoCo code coverage HTML report. Yes, 100% line coverage! Sounds good, right? This is the dream for any QA engineer’s project or for any manager. Check how good and well covered our code is – or… is it really?
Let’s take a look at another report, this time from Pitest. This will show us both line coverage and mutation coverage:
What’s happening here: 100% line coverage but only 4% mutation coverage? Let’s take a look at our tests.
Hold on, what’s going on? No assertions, just two tests? And yet we had 100% line coverage! Well, of course: these two ‘tests’ cover the entirety of our method with one positive case and one negative case, and there we go, 100%. However, is this good testing? Of course it isn’t. Are we likely to find this in any real-life project though?
Actually… yes, we are, and there are a lot of reasons why this can happen: code can’t be released with less than 70-80% coverage, developers aren’t familiar with unit testing, they have no time to test so they rush it, they want the manual functional tester to do the unit testing, and a long list of others.
So, how serious is this? How bad? Can we fix it? Of course, we can! Let’s take a more in-depth look at the Pitest report and find where those evil mutants are and how to defeat them.
We can see our code above. The green background shows our line coverage and the red elements show the surviving mutants. In the red circle, we can see how many of them are left. Let’s check what happens when we introduce a wrong phone number (we are just checking that it has nine digits).
In line 12, a few things happen. We have several mutations: incrementing the number by 1, negating it, replacing it with 0, 1 or -1, or removing the setPhone() method call. In total, we see eight mutants in just one line of code… and all of them are alive!
Do you want to try and fix this? We know from the code that if we pass in an incorrect phone number, it will be set to a predefined one (910000000). How about fixing our negative-case test so that it asserts that we end up with that number?
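The sketch below shows why one assertion makes such a difference. It runs an assertion-free test and an asserting test against an assumed version of the validation and against a hand-written mutant in which the phone assignment has been removed. `SimpleUser`, `KillTheMutant` and the nine-digit rule are hypothetical stand-ins, and the tests are plain methods so the example runs without JUnit.

```java
import java.util.function.UnaryOperator;

// Compact stand-in for the article's user: only the phone matters here.
class SimpleUser {
    int phone;

    SimpleUser(int phone) {
        this.phone = phone;
    }
}

class KillTheMutant {
    static final int DEFAULT_PHONE = 910000000;

    // Assumption: a valid phone is a positive number with exactly nine digits.
    static boolean hasNineDigits(int phone) {
        return phone >= 100_000_000 && phone <= 999_999_999;
    }

    // Assumed original behaviour: an invalid phone is replaced by the default.
    static SimpleUser original(SimpleUser user) {
        if (!hasNineDigits(user.phone)) {
            user.phone = DEFAULT_PHONE;
        }
        return user;
    }

    // Hand-written mutant: the assignment (the setPhone() call) has been removed.
    static SimpleUser mutant(SimpleUser user) {
        if (!hasNineDigits(user.phone)) {
            // assignment removed by the mutation
        }
        return user;
    }

    // Negative-case test without an assertion: it only executes the code.
    static boolean weakTestPasses(UnaryOperator<SimpleUser> validate) {
        validate.apply(new SimpleUser(123));
        return true; // nothing is checked, so this can never fail
    }

    // The same test with one assertion on the resulting phone.
    static boolean strongTestPasses(UnaryOperator<SimpleUser> validate) {
        SimpleUser result = validate.apply(new SimpleUser(123));
        return result.phone == DEFAULT_PHONE;
    }

    public static void main(String[] args) {
        System.out.println("weak test, original:   " + weakTestPasses(KillTheMutant::original));
        System.out.println("weak test, mutant:     " + weakTestPasses(KillTheMutant::mutant));
        System.out.println("strong test, original: " + strongTestPasses(KillTheMutant::original));
        System.out.println("strong test, mutant:   " + strongTestPasses(KillTheMutant::mutant));
    }
}
```

The weak test passes against both versions, so the mutant survives. The strong test passes against the original and fails against the mutant, which is exactly what killing a mutant means. In JUnit 4, the check would typically be written as `assertEquals(910000000, user.getPhone())`, assuming the user exposes a `getPhone()` getter.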
Let’s see what happens after adding a simple assertion to this test:
We have gone from 4% to 54% mutation coverage! Let’s take a look inside and see how many mutants we killed in line 12: