We already know how to control code quality by writing automated tests. We also know how to prevent that quality from regressing, by using a tool to measure the code covered by tests and fail the build automatically when coverage drops below a given threshold (and that strategy seems to be working for us).
Wouldn't it be nice to also be able to verify the quality of the tests themselves?
I'm proposing the following strategy for this:
- Integrate PIT/Descartes in your Maven build
- PIT/Descartes generates a Mutation Score metric. The idea is to monitor this metric and ensure it keeps going in the right direction and never goes down, similar to watching the Clover TPC metric and ensuring it always goes up.
- Thus the idea would be, for each Maven module, to set up a Mutation Score threshold (you'd run the tool once to get the current value and use that as the initial threshold) and have the PIT/Descartes Maven plugin fail the build if the computed mutation score falls below this threshold. In effect this would signal that the latest changes have introduced tests of lower quality than the existing ones (on average), and that the new tests need to be improved to match the level of the others.
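As a concrete sketch, here's roughly what such a per-module setup could look like in a `pom.xml`. Note the assumptions: plain pitest-maven already exposes a `mutationThreshold` parameter that fails the build below a given score, and Descartes is plugged in as a mutation engine per its documentation; whether the threshold works correctly with the Descartes engine (and across modules) is exactly what the enhancement requests below are about. The version numbers are placeholders.

```xml
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <!-- Placeholder version; use whatever is current -->
  <version>1.4.0</version>
  <configuration>
    <!-- Use the Descartes (extreme mutation) engine instead of default Gregor -->
    <mutationEngine>descartes</mutationEngine>
    <!-- Initial threshold: the score measured on the current code base.
         The build would fail if the computed mutation score drops below it. -->
    <mutationThreshold>65</mutationThreshold>
  </configuration>
  <dependencies>
    <dependency>
      <groupId>eu.stamp-project</groupId>
      <artifactId>descartes</artifactId>
      <!-- Placeholder version -->
      <version>1.2.4</version>
    </dependency>
  </dependencies>
</plugin>
```

You'd then run something like `mvn org.pitest:pitest-maven:mutationCoverage` once per module to discover the initial score to put in `mutationThreshold`.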
In order for this strategy to be implementable, we first need PIT/Descartes to implement the following enhancement requests:
- A threshold check that can fail the Maven build
- Support for multi-module projects
- Improved efficiency. Ideally it would be fast enough for developers to run it as part of the build locally before pushing their commits. Failing that, the PIT/Descartes Maven plugin could at least be executed on the CI, but I believe that even for this to be possible the execution speed needs to be improved substantially.
I'm eagerly waiting for these issues to be fixed so that I can try this strategy on the XWiki project and verify that it works in practice. There are some reasons it might not, such as the process being too painful, or it not being easy enough to identify the test problems and fix them.
WDYT? Do you see this as possibly working?