When I discovered the existence of mutation testing, it was a revelation: it showed me an almost effortless and absolutely clever way to "test my tests". Recently I have been doing a lot of coding in Python, and when I looked for a mutation testing framework for that language I found many results, but no comparison. So I did my own 🙂
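For anyone new to the idea: a mutation testing tool makes small changes ("mutants") to your code and re-runs your test suite against each one; if the tests still pass, the mutant "survived" and your tests have a gap. A minimal sketch of what a tool does under the hood (function and test names are made up for illustration):

```python
# Original code under test
def apply_discount(price, percent):
    return price * (1 - percent / 100)

# A typical mutant: the tool flips "-" to "+"
def apply_discount_mutant(price, percent):
    return price * (1 + percent / 100)

# This test "kills" the mutant: it passes against the original
# function (result 90.0) but fails against the mutated one (110.0).
def test_apply_discount():
    assert apply_discount(100, 10) == 90
```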
The following table is a very subjective and superficial evaluation, without extensive digging into each tool's configuration options. I just took the most straightforward way to run each tool on a current company project.
Module | Actively maintained? | Ease of use | Raw output | Evaluation |
---|---|---|---|---|
mutmut | yes | OK: run, then generate HTML (see commands below) | "Killed 640 out of 954 mutants", plus HTML file | 314 surviving mutants, of which 280 were in a config.py file and can be ignored. |
MutPy | no | needed a local fix to run | all: 146, killed: 0 (0.0%), survived: 144 (98.6%), incompetent: 2 (1.4%), timeout: 0 (0.0%) | The run is very fast, so I suspect the unit tests are for some reason not really executed (even though they are listed on startup). |
mutatest | no | simple | SURVIVED: 5, DETECTED: 16, TOTAL RUNS: 21, RUN DATETIME: 2021-10-10 13:46:20.181892 | It seems to run only a random subset of mutations. |
Cosmic Ray | yes | complex: create config file, init, baseline, execute, generate HTML (see commands below) | HTML file | 199 non-killed mutants. No per-file summary/expand, only a long list of findings. |
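For reference, the invocations for the two maintained tools look roughly like this. This is a sketch based on each tool's documented workflow; exact arguments differ between versions (Cosmic Ray in particular has changed its CLI over time), so check the current docs before copying:

```
# mutmut: run mutations, then generate an HTML report
mutmut run
mutmut html

# Cosmic Ray: config file, init session, baseline, execute, report
cosmic-ray new-config config.toml
cosmic-ray init config.toml session.sqlite
cosmic-ray baseline config.toml
cosmic-ray exec config.toml session.sqlite
cr-html session.sqlite > report.html
```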
From the above, the maintained mutmut and Cosmic Ray found the most surviving mutants in my example project, while the other two tools found far fewer. MutPy left me doubting whether it actually worked as intended, and both MutPy and mutatest are no longer maintained. In the future I will look more closely into mutmut and Cosmic Ray and see whether I can tweak them towards actually using the results, as the initial ones are still a bit "too raw" to act upon directly (which also lies in the nature of my test project).