Kategorien
Coding

Mutation testing with infection

A few months ago I read about mutation testing for the first time. At that time I started my current pet project with a focus on proper testing, and the idea hooked me immediately. However, back then my overall testing state was not at a level that such advanced mechanisms would make sense. But now, during my christmas vacation, the time was right 🙂

What is mutation testing? In my own words: with mutation testing one can check the quality of unit tests for code, by automatically and intentionally mutating the code and comparing the unit test results with the original ones. The mutated code is called a „mutant“ and if your test cases recognize most of them your tests may be considered sufficient (a lot ifs and whens apply here).

For my php-based pet project, infection was the natural choice; I couldnt find any other mutation testing framework which is similarly well maintained and documented.

The initial setup, however, was not fully smooth, I had to overcome some issues in my environment:

First, I had some trouble to run my unit tests locally (so far I only executed them on my server and in CI, but I realized I can shorten my turnaround times significantly when I also enable them locally). As I – shame on me – only have Windows on my desktop, I had to setup php (with extensions such as xdebug), mariadb and Papercut (SMTP service) locally. I wanted to do this already for a while, so now I digged into it. Cost me some hours to sort everything out.

The second issue was more tricky. Some history: Due to the way how PHP and PHPUnit work (some say its even a bug in PHP?), outputting http headers via the header() function is conflicting with PHPUnit. A helpful answer on stackoverflow lists the options to solve this, and I chose the second option long time ago. The first option is not working for me, the third option didnt seem feasible either as I actually need the header output for my functionality. So I chose to run PHPUnit with the „–stderr“ parameter, worked for me. With infection however, this resulted in issues. Outputting all PHPUnit results on stderr made infection, which parses PHPUnit’s output for errors, fail. I almost got mad in resolving this „deadlock“, but then stumbled over a github issue in which some compatibility topics were discussed. As far as I understand, their issue was another one, but I tried the given workaround, which seperates the initial PHPUnit run from the mutated runs:

vendor/bin/phpunit --coverage-xml=build/logs/coverage-xml --log-junit=build/logs/junit.xml
vendor/bin/infection --skip-initial-tests --threads=$(nproc) --coverage=build/logs --show-mutations --no-interaction 

This gave me the first output of infection:

The metrics at the bottom looked quite good, according to the infection documentation those values are good-ish. The hint in the second-to-last line is also important to acknowledge: Not all slipped-through mutants are in fact problems. Anyhow, it also shows 26 not covered mutants, i.e. code mutations which were not uncovered by my existing test cases.

In the infection.log file you can see which of those there are. Example (redacted):

2) ...\TaskController.php:137    [M] NewObject

--- Original
+++ New
@@ @@
         if (empty($taskId)) {
-            return new Response('HTTP/1.1 500 Internal Server Error ');
+            new Response('HTTP/1.1 500 Internal Server Error ');
+            return null;
         }     

Seems my test cases are not checking the return-types sufficiently or my code is not further using the returned objects here. Interesting! As a next step I will investigate how I can improve my unit tests based on the mutation test results.

Kategorien
Coding

State of Pipelines

Just a brief update on my current pet project’s Jenkins pipelines setup. The Jenkins BlueOcean graph probably already covers it mostly:

I have parallelized some „Fast Tests“ – all of these take less than a second to run currently due to the small codebase. If any of them fails it doesnt make sense to proceed. Currently these contain

  • phpmd: „mess detector“ using the clean code ruleset to identify unclean code.
  • pdepend: Some code analytics I learned while reading „Clean Architecture“ (review will come to the blog soon I hope).
  • phan: Static Code Analysis
  • phpunit: Unit testing

After those I run API tests with schemathesis. This takes one minute (configurable) and sends random payload to my REST API as per the generated swagger file.

Last but not least I run some acceptance tests, as described in Adventures with PHPUnit, geckodriver and selenium.

For a while I also parallelized the latter two test activities in a „Slow Tests“ block, but I learned that the test execution was very unreliable due to the load the test environments put on my server. So for now they run sequential.

Today I finished some major clean code refactoring and I am happy to say that I could rework most of the code and the tests in place ensured that functionally nothing broke without me checking the actual application manually once! This is a major achievement for me personally, as all my former pet projects based on heavy manual testing on basically every change.

Kategorien
Coding

Adventures with PHPUnit, geckodriver and selenium

In my current pet project, I try to not only focus on the features itself, but also in the quality of the software and the development environment. Since I programmed code myself, the professional software development world has evolved quite a lot, and new approaches and tools have emerged and are de facto standards nowadays. While I know them as an observer, I barely used them myself. That I want to change.

Historically, my project’s setup already included a selenium test file, using geckodriver and Facebook’s php-webdriver library. However, the test code itself was a plain php file, executing a series of webdriver instructions and returning failures when some string comparisons did not yield the desired results:

<?php

namespace Facebook\WebDriver;

use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;

$host = 'http://localhost:4444';

$driver = RemoteWebDriver::create($host);

$driver->get('https://example.com/foo.php');

$driver->findElement(WebDriverBy::id('title'))
       ->sendKeys('Acceptance test title'); 

...

if (strcmp($driver->findElement(WebDriverBy::id('title'))->getAttribute('value'), 'Acceptance test title')!=0) {
    echo "Just created task cannot be accessed!";
    $driver->quit();
    exit(1);
}

$driver->quit();

This code worked, and helped me to assure that my product didn’t break (to some degree) from a user perspective. However, it was ugly to maintain, didn’t produce any helpful test report and just didn’t feel right.

Failed Approach

After some research I found out about phpunit-selenium, a project combining the powers of phpunit, I guess the dominating unit testing framework in the php space with selenium. Interesting. However, from there I went down a slippery road to trial-and-error-land. I should have probably get cautious, when the only documentation I found was referring to phpunit 3.7 (its now at 9.2), and even the github project was only referring to phpunit 8.x versions. With some back-and-forth, I could identify a version which fitted together with phpunit in composer repos and sucessfully deployed:

{
    "require-dev": {
        "phpunit/phpunit": "9.2",
        "phpunit/phpunit-selenium": "dev-master"
    },
}

Apart from that, I had to install the selenium server, as so far I was directly connecting to the geckodriver. This led to the worst part of issues: For some reasons in 95% of cases, the test cases didnt execute properly, and it was somehow related to firefox crashes. I tried so many things:

  • Made sure both firefox and geckodriver had most recent versions (they had), same for selenium server.
  • Executed everything as non-root.
  • Tweaked permissions.
  • Increased log-levels of geckodriver, selenium server and even firefox (the latter didnt even create crash dumps).
  • I even started to investigate if my server’s ubuntu has some limiting settings in systemd cgroups, somehow limiting the number of parallel threads and memory. After all, its a browser we are launching here.

All of the above didnt really change anything. It was really frustrating, and the whole chain was quite long to debug in my spare time. After approx. 10 hours of frustrated tests, I gave up and made up my mind.

Solution

Luckily, I found another approach, based on the same tooling I had initially already working, but in a cleaner way. How does this look like? I simply combined phpunit and webdriver calls without any third party addition. Code says more than thousand words:

<?php declare(strict_types=1);

namespace Facebook\WebDriver;

use PHPUnit\Framework\TestCase;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Exception\UnknownErrorException;

final class AcceptanceTest extends TestCase
{
    protected static $driver;

    public static function setUpBeforeClass(): void 
    {
        $host = 'http://localhost:4445';

        $capabilities = DesiredCapabilities::firefox();
        $capabilities->setCapability('moz:firefoxOptions', ['args' => ['-headless']]);
        $capabilities->setPlatform(WebDriverPlatform::LINUX);

        self::$driver = RemoteWebDriver::create($host, $capabilities);
    }
    
    public static function tearDownAfterClass(): void
    {
        self::$driver->quit();
    }

    public function testTaskIsSuccessfullyCreatedFromContent()
    {
        self::$driver->findElement(WebDriverBy::id('title')) 
             ->sendKeys('Acceptance test title'); 

...

        $this->assertEquals('Acceptance test title', self::$driver->findElement(WebDriverBy::id('title'))->getAttribute('value'));
    }
}

This gives me the full power of phpunit, with fixtures, asserts, reports and a nice console output:

PHPUnit 9.2.0 by Sebastian Bergmann and contributors.

..                                             2 / 2 (100%)

Time: 00:14.702, Memory: 6.00 MB

OK (2 tests, 3 assertions)