Comparison of Python mutation testing modules

When I discovered the existence of Mutation Testing, it was a revelation. It showed me an almost effortless and really clever way to “test my tests”. Recently I have been doing a lot of coding in Python, and when I looked for a mutation testing framework for that language I found many results, but no comparison. So I did my own 🙂
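To make “test my tests” a bit more concrete, here is a minimal hand-written sketch of the idea. The function `is_adult`, its mutant, and the two tests are hypothetical names made up for illustration; a real tool generates the mutants automatically and runs your existing test suite against each one.

```python
def is_adult(age: int) -> bool:
    """Original code under test (hypothetical example)."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """A mutant: the comparison `>=` was changed to `>`."""
    return age > 18


def weak_test(fn) -> bool:
    """Only checks values far from the boundary, so it passes for both versions."""
    return fn(30) is True and fn(5) is False


def strong_test(fn) -> bool:
    """Checks the boundary value 18, so it fails for the mutant."""
    return fn(18) is True and fn(17) is False


if __name__ == "__main__":
    # A mutant "survives" when the test still passes on the mutated code.
    print("weak test, mutant survives:", weak_test(is_adult_mutant))
    # A mutant is "killed" when the test fails on the mutated code.
    print("strong test, mutant killed:", not strong_test(is_adult_mutant))
```

A surviving mutant is a hint that the tests do not pin down that piece of behaviour, which is exactly what the tools below report.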

The following table is a very subjective and superficial evaluation; I did not dig deeply into each tool’s configuration options. I just took the most straightforward way to run each tool on a current company project.

Module: mutmut
Actively maintained? yes
Ease of use: run & generate html
Raw output: “Killed 640 out of 954 mutants”, plus an HTML file
Evaluation: 314 non-killed mutants, of which 280 were in a config.py file and can be ignored.

Module: MutPy
Actively maintained? no
Ease of use: needed local fix
Raw output:
all: 146
killed: 0 (0.0%)
survived: 144 (98.6%)
incompetent: 2 (1.4%)
timeout: 0 (0.0%)
Evaluation: The run is very fast, so I have the feeling that the unit tests are for some reason not really executed (even though they are listed on startup).

Module: mutatest
Actively maintained? no
Ease of use: simple
Raw output:
SURVIVED: 5
RUN DATETIME: 2021-10-10 13:46:20.181892
Evaluation: It seems to run only a random set of mutations.

Module: Cosmic Ray
Actively maintained? yes
Ease of use: complex (create config file, init, baseline, execute, generate html)
Raw output: HTML file
Evaluation: 199 non-killed mutants. No summary/expand for each file, only a long list of findings.
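To put the raw numbers above into perspective, here is a small sketch that turns them into percentages. The helper `percent` is my own illustrative function, not part of any of the tools; the figures are the ones reported in the table (640 killed out of 954, and MutPy's 146 mutants).

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage (illustrative helper, not a tool API)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# First table entry: "Killed 640 out of 954 mutants"
total_mutants = 954
killed = 640
survived = total_mutants - killed          # non-killed mutants
config_survivors = 280                     # survivors located in config.py
relevant_survivors = survived - config_survivors

# MutPy summary from the table
mutpy_total, mutpy_killed, mutpy_survived, mutpy_incompetent = 146, 0, 144, 2

if __name__ == "__main__":
    print(f"killed: {percent(killed, total_mutants):.1f}% of {total_mutants}")
    print(f"survivors outside config.py: {relevant_survivors}")
    print(f"MutPy survived: {percent(mutpy_survived, mutpy_total):.1f}%")
```

Only the survivors outside config.py need a closer look, which is a far more manageable list than the full set of non-killed mutants.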

From the above, the two maintained tools, mutmut and Cosmic Ray, found the most surviving mutants in my example project, while the other two tools found far fewer. MutPy left me doubting whether it actually worked as intended. Neither MutPy nor mutatest is maintained anymore. In the future I will look closer into mutmut and Cosmic Ray and see if I can tweak them so that the results become actionable, as the initial ones are still a bit “too raw” to act upon directly (which is partly due to the nature of my test project, too).