October 2021 – Jakob Breu

Comparison of Python mutation testing modules

When I discovered mutation testing, it was a revelation. It showed me a clever and almost effortless way to "test my tests". Recently I have been writing a lot of Python, and when I looked for a mutation testing framework for that language I found many candidates, but no comparison. So I did my own 🙂
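
For readers new to the idea, here is a minimal hand-rolled sketch of what "testing my tests" means. It does not use any of the tools compared below; the function `is_adult`, its mutant, and the two test helpers are hypothetical names invented for this illustration. A mutation testing tool generates such mutants automatically and reports which ones the test suite fails to notice.

```python
def is_adult(age):
    """Original code under test (hypothetical example)."""
    return age >= 18


def is_adult_mutant(age):
    """Hand-written mutant: '>=' has been replaced by '>'."""
    return age > 18


def weak_test(fn):
    """Passes if fn behaves correctly far away from the boundary."""
    return fn(30) is True and fn(5) is False


def strong_test(fn):
    """Same as weak_test, but also checks the boundary value 18."""
    return weak_test(fn) and fn(18) is True


def mutant_survives(test, mutant):
    """A mutant survives when the test still passes on the mutated code."""
    return test(mutant)


if __name__ == "__main__":
    # Both tests pass on the original code.
    print("weak test passes on original:", weak_test(is_adult))
    print("strong test passes on original:", strong_test(is_adult))
    # The weak test cannot tell the mutant from the original ...
    print("mutant survives weak test:", mutant_survives(weak_test, is_adult_mutant))
    # ... while the boundary check kills it.
    print("mutant survives strong test:", mutant_survives(strong_test, is_adult_mutant))
```

A surviving mutant, as with the weak test here, points to behaviour that the tests do not actually pin down.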

The following table is a very subjective and superficial evaluation; I did not dig into each tool's configuration options. I just took the most straightforward way to run each tool on a current company project.

Module	Actively maintained?	Ease of use	Raw output	Evaluation
mutmut	yes	ok: run, then generate HTML	"Killed 640 out of 954 mutants", plus an HTML file	314 surviving mutants, of which 280 were in a config.py file and can be ignored.
MutPy	no	needed a local fix to run	all: 146; killed: 0 (0.0%); survived: 144 (98.6%); incompetent: 2 (1.4%); timeout: 0 (0.0%)	The run is very fast, so I suspect that the unit tests are for some reason not really executed (even though they are listed on startup).
mutatest	no	simple	SURVIVED: 5; DETECTED: 16; TOTAL RUNS: 21; RUN DATETIME: 2021-10-10 13:46:20.181892	It seems to run only a random subset of mutations.
Cosmic Ray	yes	complex: create config file, init, baseline, execute, generate HTML	HTML file	199 surviving mutants. No per-file summary or expandable view, only a long list of findings.

From the above, the maintained tools mutmut and Cosmic Ray found the most surviving mutants in my example project, while the other two found far fewer. MutPy left me doubting whether it actually worked as intended. Neither MutPy nor mutatest is maintained anymore. In the future I will look more closely at mutmut and Cosmic Ray and see if I can tweak them so that the results become actionable, as the initial ones are still a bit "too raw" to act upon directly (which is partly due to the nature of my test project).
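
To make the raw counts in the table a little easier to compare, they can be turned into kill rates. The following is only a small sketch using the numbers reported above. The helper name `kill_rate` is my own, and treating mutatest's "DETECTED" as "killed" is an assumption. Cosmic Ray is left out because the table only records its 199 surviving mutants, not a total.

```python
def kill_rate(killed, total):
    """Return the percentage of mutants that the test suite killed."""
    return 100.0 * killed / total


# (killed, total) per tool, copied from the raw output in the table.
# Assumption: mutatest's "DETECTED" count is treated as "killed".
results = {
    "mutmut": (640, 954),
    "MutPy": (0, 146),
    "mutatest": (16, 21),
}

for name, (killed, total) in results.items():
    print(f"{name}: {killed}/{total} killed ({kill_rate(killed, total):.1f}%)")

# mutmut survivors that remain after ignoring the 280 in config.py.
remaining_mutmut_survivors = (954 - 640) - 280
print("mutmut survivors outside config.py:", remaining_mutmut_survivors)
```

These percentages are not directly comparable across tools, since each tool generated a different number and kind of mutants, and mutatest apparently ran only a sample.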