Kategorien
Coding

Comparison of Python mutation testing modules

When I discovered the existence of Mutation Testing, it was a revelation. It showed me an almost effortless way to “test my tests” with an absolutely clever way. Recently I did a lot of coding in Python and when I looked for a mutation testing framework for that language I found many results, but no comparison. So I did my own 🙂

The following table is a very subjective and superficial evaluation, without investing extensive digging into each tool’s configuration options. I just took the most straightfoward way to run it on a current company project.

ModuleActively maintained?Ease of useRaw outputEvaluation
mutmutyesok

run & generate html
“Killed 640 out of 954 mutants”

HTML file
314 non killed mutants of which 280 were in a config.py file, so can be ignored.
MutPynoneeded local fix

run
all: 146
killed: 0 (0.0%)
survived: 144 (98.6%)
incompetent: 2 (1.4%)
timeout: 0 (0.0%)
The run is very fast, so I have the feeling that the unit tests for some reasons are not really executed (even though they are listed on startup)
mutatestnosimpleSURVIVED: 5
DETECTED: 16
TOTAL RUNS: 21
RUN DATETIME: 2021-10-10 13:46:20.181892
It seems to run only a random set of mutations
Cosmic Rayyescomplex

create config file, init, baseline, execute, generate html
HTML file
199 non-killed mutants

No summary/expand for each file, only long list of findings

From the above, the maintained mutmut and cosmic ray found the most surviving mutants in my example project, while both other tools found much less. MutPy left a dubious feeling if it actually worked as intended. Both MutPy and mutatest are not maintained anymore. In the future I will look closer into mutmut and cosmic ray and see if I can tweak them towards actually using the results, as the initial ones are still a bit “too raw” to directly act upon (which lies in the nature of my test project, too).

Kategorien
Coding

VS Code Extension: Save & Run

Visual Studio Code is my current IDE and I really enjoy it. Its quite simple and elegant in its vanilla version and the extension ecosystem is really great. In this short blogpost I want to promote a little extension which really helps my daily workflow a lot: Save & Run

What it does? Whenever a (code) file is saved, it launches a defined command.

What can it be used for? In my use case, I am using it to execute the unit tests for my project(s) whenever I save a code file. Of course I could execute unit tests and Static Code Analysis also via the GUI in VS Code or by a dedicated keyboard shortcut. But I am saving files quite often via Ctrl+S, so why not combine it together.

Lets walk through an example. After installing the extension, this is how the settings.json (either for the user or the workspace) needs to be adapted:

{
    ...
    "saveAndRun": {
        "commands": [
          {
            "match": "\\.py$",
            "cmd": ".\\local_testing.ps1",
            "useShortcut": false,
            "silent": false
          }
        ]
    },
    ...
}

As you can see, you can define multiple commands, each triggered when the “match” clause applies. In this case, I am working on a Python project, so I want to execute the command whenever I change (and save) a .py file. The “cmd” is then the shell command to execute, here its a Powershell script executing local tests:

python.exe -m pytest --cov . --cov-report=html --cov-report xml:cov.xml --quiet --cov-fail-under=75
python.exe -m flake8 --ignore=E501 .

Thats it. Whenever I hit Ctrl+S in the Terminal view the following output is given:

As you can see, the tests are really fast, so its not executing forever. Of course thats because the project and its test cases are simple. For bigger projects this may get more beefy. However, in my current project it helps me a lot to not accidentally miss when I break the code.

Kategorien
Book Culture

Thoughts on “How Google Tests Software”

During my recent years as Software Project Manager I learned to appreciate modern software testing beyond just a tool. In earlier blog posts I described my personal hands-on progress on the matter. Thanks to some teachers and mentors I gained knowledge in theory and practice and nowadays call myself a software testing enthusiast (which doesnt mean I am particularly good at it – I just enjoy the topic).

The Book “How Google Tests Software” from James Whittaker, Jason Arbon and Jeff Carollo is already some years old – a fact I didnt know about initially, but some aged concepts made me look into the publishing date. Also some references are not working anymore, including a link to Google’s own “Google Test Automation Conference”. However, its still a great summary on aspects of modern software testing, with a focus on organizational aspects. There is no mention of mutation testing, probabily-based testing and other testing approaches I would call “advanced”, instead there is a lot of focus on the specific roles in and around testing at Google.

In the forewords the authors’ boss Patrick Copeland introduces the term of “Engineering Productivity” and why he has chosen this as the department’s name. I find this term very appealing, especially in its connection to testing. In my domain, automotive, testing is often seen as something which slows projects down and its done in the very end after features have been completely implemented to “drive maturity”. For me, proper testing is a method to be faster after all, and doing it late is possible but is for from being efficient. The Engineering Productivity organization at Google gathers the lead on test activities, respective tooling around testing and the mentoring/training efforts to spread a testing mindset throughout the whole organization.

In the book, the saying “scarcity brings clarity” is coming up multiple times, and the authors say

The first piece of advice I give people when they ask for the keys to our success: Don’t hire too many testers.

How Google Tests Software, page 4

Googlers might be smart, but they are not plentiful. Every TEM we’ve ever hired from outside of Google makes the comment that their project is understaffed. Our response is a collective smile. We know and we’re not going to fix it. It’s through knowing your people and their skills well that a TEM can take a small team and make them perform like a larger team.

How Google Tests Software, page 188

As a young leader I dread the constant understaffing all around me. Such looks into highly regarded companies like Google make me believe its something inevitably in any fast-paced company. Being always short on good people compared to the tasks at hand must yield the most efficient use of people as possible. But how to balance this in a way which satisfies me is something I admit to not have found the key for yet.

In major parts of the book some key roles are described in detail with their relevance to testing: The (“normal”) Software Engineer (SWE), the Software Engineer in Test (SET) and the Test Engineer (TE). On page 7 the roles are summarized in a nutshell, while many many pages repeat and detail this again and again – a style I find very typical for American textbooks. In essence its clear and I assume my mentors must have either read the book years ago or used common root sources. The worksplit seems obvious to me now. SWEs are developing the code, but have to contribute automated testing on all levels to secure the functionality and quality of the system, even when no other test-focused roles are around. SETs are pushing the boundaries of test tooling, enabling new kinds or aspects of test automation wherever they can. TEs are driving testing especially in a “challenger” approach, to catch all the issues SWEs don’t see or dont want to see initially.

I already mentioned that the book doesnt go very deep into test technology. What I found very interesting is that Google separates their test activities simply by “small tests”, “medium tests” and “large tests”. According to the authors they chose to do this because other terms are overloaded and vague throughout the software industry do a degree that people are talking about different things and often didnt recognize.

On pages 40f some nowadays state-of-the-art concepts about how to include test automation into pipelines are brough up. E.g. having test code near to the functional code. Also the classification of small/medium/large tests is added with some concrete data: large tests have to run within 15 mins, small tests are required to execute in less than 100ms. This shows the enourmous compute power Google is able to leverage for their software development. Also there is a good note to quote:

Small tests lead to code quality. Medium and large tests lead to product quality.

How Google Tests Software, page 47

Only both in combination provide long-term value to a company. Having insufficient small tests strongly correlates to bad code quality which will slow down further development. Having insufficient medium and large tests (system tests) lead to many gaps in the functional and non-functional end-to-end testing, thus yielding bad product quality if there is not immense manual testing in place.

On page 48 two very basics rules for test cases are defined, which I find highly relatable: Tests have to be independent of each other so they can be executed in any order (and in parallel). Also tests must not have any persistent side effects.

Interestingly the book makes some references to a build and test execution system with sophisticated dependency graph analysis, without giving the tool’s name. Nowadays I know that they are referring to Bazel. The book must come from a time when it was not yet published.

On pages 54ff the authors describe a “test certified” program which they used to promote the testing inside Google, including some test certified levels for teams to achieve. In a very telling interview I learned a lot that even at Google the testing proponents had to overcome a lot of internal resistance.

The dream of a unified dashboard has haunted Googlers, much like it has at many other companies. Every year or so, a new effort tried to take root to build a centralized bug or project dashboard for all projects at Google.

How Google Tests Software, page 124

Its late and I want to finish this blog post, so I am closing with an inspiring quote from Joel Hynoski, answering why someone should pursue a career in test:

Test is the last frontier of engineering. We’ve solved a lot of the problems of how to develop software effectively, but we’ve still have a green field of opportunity to attack the really meaty problems of testing a product, from how to organize all the technical work that must get done to how we automate effectively, and responsively, and with agility without being too reactive. It’s the most interesting area of software engineering today, and the career opportunities are amazing. You’re not just banging on a piece of software any more, you’re testing the GPU acceleration of your HTML5 site, you’re making sure that you’re optimizing your CPU’s cores to get the best performance, and you’re ensuring that your sandbox is secure.

How Google Tests Software, page 206
Kategorien
Coding Culture

Technology Radar #24: Automotive SW perspective

I am following the Thoughtworks Technology Radar since approximately five years. Since I first became aware about it (IIRC a colleague forwarded a version to me), I enjoyed its volumes as a condensed summary of current trends in the software industry. There is some bias towards cloud technology and machine learning which is out of my current professional interest, however there are enough other concepts/tools/aspects every time which make me investigate and followup to my best possibilities either at work or in my private projects. In this blog post I will try to list those parts which are roughly new and relevent from an automotive software perspective.

As usual, there are some selected topics on the top level they are presenting. Already the first one – Platform Teams Drive Speed To Market – is highly relevant. Its no secret that the automotive industry is widely moving towards in-house platforms, recent example from the news are vw.os (Volkswagen) and MBOS (Mercedes-Benz). The particular recommendation is to handle such platform activities explicitly as products, with all the implications. Only a platform with its own set of features, product vision, dedicated team and lifecycle can actually serve its purpose. A platform handled as a side-gig of an enduser product will neither serve the platform users well nor will it provide sustainable value, plus there would be a high chance of not surviving the first generation.

The second highlighted aspect – Consolidated Convenience over Best in Class – is also very relevant in the automotive SW sector. All of us love the regular discussion which code review tool is the best, and which CI tool is the newest hot shot we must adopt. Integrated workflow tools which integrate requirements/tickets to code, integration, tools and deployment such as gitlab and github (with github actions) promise so much more convenience in inter-tool integration and generally less setup hassle compared to having multiple tools each with its own authentication foo. However, there is also much higher risk of vendor lock-in. This can result in financial dead-ends and of course sooner or later lead to loss of flexibility on adopting new technology in the workflow area.

Jumping over the third highlighted topic and going straight ahead to the fourth – Discerning the Context for Architectural Coupling – this covers the architectural state of the union address. Baseline of the message here is “it depends”. Discussions with buzzwords like “monoliths” and “microservices” are seldom helpful, the real discussion needs to happen on the detailled requirements, constraints and principles.

Going deeper into the four sectors of the radar – Techniques, Tools, Platforms, Languages&Frameworks – lets start with Techniques. The new entry API expand and contract is directly an “adopt” recommendation. It means rather than breaking changes with new API versions too often its more about adding new interfaces while keeping but deprecating old ones, giving consumers a chance to adopt without breaking their usage. I can see immediate benefit of such patterns especially on highly used interfaces such as the vehicle communication interfaces. If “deprecation would be a thing (at scale)” it could relief many breaking integration challenges in the automotive domain.

In the trial circle we find Hypothesis-driven legacy renovation:

Rather than working toward a standard backlog, the team takes ownership of a measurable technical outcome and collectively establishes a set of hypotheses about the problem. They then conduct iterative, time-boxed experiments to verify or disprove each hypothesis in order of priority. The resulting workflow is optimized for reducing uncertainty rather than following a plan toward a predictable outcome.

https://www.thoughtworks.com/radar/techniques/hypothesis-driven-legacy-renovation

Sounds quite interesting. I am, too, used to handling technical stories like user stories. The given approach may indeed yield more efficient and effective collaboration on technical improvements.

Another very relatable technique is Lightweight approach to RFCs. Its something which my personal workstyle already incorporated since long time. I strive to create very fast first drafts for any task and then seeking review comments from my peers. Hence, I can absolutely understand the point of this topic and clearly recommend this. Any organization in need of collaboration and frequent alignment on concepts can benefit from a very lean set of templates encouraging early feedback rather than a culture which expects “perfect first shots”.

Team cognitive load and its linked Inverse Conway Maneuver give also very interesting food for thought. I am well aware about some teams in my environment who struggle a lot with the complexity at hand, and the organzational environment adds to that complexity significantly. The Inverse Conway Maneuver – organizing the team organization around how the proper architecture of a system looks like rather than structuring the architecture around the semi-random existing organizational structure – makes a lot of sense. It chimes well with the notion of Feature Teams and team ownership.

Remote mob programming as an extension of local mob programming, which is by itself the advancement of pair programming is certainly a recommended practice in many situations. Recently I have tried to bring teams together in “remote debug sessions” for hard inter-team nuts (=bugs) with not so great success. There is a lot of hesitance of teams to just spend some focus time together, and time difference in international setups is only one external factor. While (local) pair programming is something I think every team manager should at least try with his team a few times to see if it works out, mob programming depends an order of magnitude more mature team dynamics.

Under “hold” we can find Peer review equals pull request. The baseline here is that pull requests are only one way to conduct peer review, and as such its a very valid claim and call for further action. Looking only at fragmented changes (which 99% of PRs usually are) and reviewing that alone doesnt give context how a codebase looks overall. Full code reviews or audits certainly give great insights for potential refactoring efforts and even re-designs, besides other learnings.

The scaled agile framework SAFe is a hold and a look in the history shows it was since the beginning. Thats interesting. I am no expert in SAFe, and only know it in theory.

Separate code and pipeline ownership is really triggering me while I write this. Thats because I recently gave a presentation about the exact same thing and why both shouldnt be separated. I am wondering if that thought appeared to me myself or if I read about it the radar earlier. Nevertheless, I think anything else than keeping both together cannot really be justified from a professional sw development context (neglecting some extreme theoretical circumstances). Having that said, its not yet an established truths, and organizational structures and other opinions enforce a separation more often than I would wish for.

The tools sector doesnt offer really relevant insights this time, its all about backend intra, web dev, ML and that kindof stuff.

In the platforms sector the first thing which appears intersting is Backstage. Its a platform developed by Spotify which in essence is a simple overview on projects going on within an organization (company). It can be used to advertise cool efforts in the developer community. Something which could be worth a try. However, as a separate tool it may lack synchronizity with the actual developer activities. Integrating such a frontend into gitlab could yield some interesting dynamics, because that could leverage existing information about activitiy of projects and use data directly from the individiual projects/repos in there.

In the languages & frameworks sector again nothing relatable.

In total the new tech radar volume give me some very valuable insights and food for more thought.

Kategorien
Coding

Dependency updates with Renovate

Recently I studied the Thoughtworks Technology Radar in its newest version and came upon the recommended activity of “Dependency Drift Fitness Function“, referring to the output of dependency analysis and update tools like Dependabot and Snyk. I knew Dependabot already since a while as handy tool to analyze a repo’s dependencies and create automatic pull request. The article motivated me to look into the topic for my own project. As usual I didnt want to use a SaaS solution for my personal pet projects, but use my own infrastructure. Thats where I found renovate, a handy tool for the same purpose with quite straightforward setup.

Nevertheless it took me a bit trial-and-erroring to make it work for my setup with self-hosted gitea. The renovate documentation – at least the parts I studied – didnt really describe clearly how to connect to a self-hosted tool until I found the –endpoint argument in the CLI help. In total its usage boiled down to two lines to get it running:

npm install renovate
GITHUB_COM_TOKEN=ghp_FOOBAR node .\node_modules\renovate\dist\renovate.js --platform gitea --token abcdef1234567890 --endpoint https://example.com/gitea/api/v1 --labels renovate project/git_repo

That resulted in a first pull request to setup a configuration file. I really liked the well written comment on this PR from the bot, giving details and even an outlook on next steps.

After merging the inital PR, the first “real” one was created upon executing the same script as above. As announced the PR was about lifting the version of the phpdotenv package from an old version 2.4+ to 5+. Really cool, I didnt earlier check this, it must be really old.

There was a little snag as this PR couldnt be merged by itself, as the version bump resulted in some necessary code adaptions. But that was due to syntax changes from the package itself, so nothing to blame on renovate.

As a little further step I added a renovate_runner repo with only the following Jenkinsfile to execute the script every midnight. I am looking forward to future PRs from this bot.

pipeline {
     agent any
     triggers {
         cron('@midnight')
     }
     stages {
         stage("Run renovate") {
             steps {
                 sh "npm install renovate"
                 sh "GITHUB_COM_TOKEN=ghp_FOOBAR node .\node_modules\renovate\dist\renovate.js --platform gitea --token abcdef1234567890 --endpoint https://example.com/gitea/api/v1 --labels renovate project/git_repo"
             }
         }        
     }
     post {
         always {
             echo "One way or another, I have finished"
             deleteDir()
         }
     }
 }
Kategorien
Book Coding Culture

Thoughts on Refactoring from Martin Fowler

In my series of book reviews on classics its time to take on another evergreen: Refactoring from Martin Fowler. It was already referenced a multiple times in the earlier books and now it was time.

The concept of refacoring is well-established nowadays, and I would say that only a company culture which incentivizes and motivates a spirit of constant refactoring can be a state-of-the-art software company. In my experience, this is easier said than done, especially when immense release pressure and tight resources lead to the well-known vicious circle of crunching and task forces, which never end. When one release was successfully squeezed out “somehow” and the next one is already knocking on the doors, how could anyone expect refactoring of code which actually “works” (it made it into the last release after all!!!). I had such discussions many times, and until today I dont feel strong enough in my arguments to convince senior managers in typical situations. Arguing with sustainable code and continuous improvement when the other side virtually puts the existence of the whole organization at risk is an uphill battle. But its worth fighting.

In essence when you refactor you are improving the design of the code after it has been written. […] With refactoring you can take a bad design, chaos even, and rework it into well-designed code. Each step is simple, even simplistic. You move a field from one class to another, pull some code out of a method to make into its own method, and push some code up or down a hierarchy. Yet the cumulative effect of these small changes can radically improve the design. It is the exact reverse of the normal notion of software decay.

Martin Fowler: Refactoring, page 9

I like this simple but strong definition, which underlines the cumulative effect of small improvements. Refactoring is not the same as the “grand redesign” or “from scratch” approaches which are often taken and loved, especially when the staff (developers, managers) is changed. Refactoring can achieve the same goals with either the same or a different staff.

After some introductory words Fowler goes ahead with a concrete example. He emphasizes the repetitive nature of changing something a little bit, then running an extensive unit test suite and then committing the changes to a repository. This approach is something I immediately started to exercise. For that I had to setup some of my testing scripts to work locally (they were purely in my CI before), but the effort was totally worth it. With my unit test coverage being 98% nowadays plus mutation testing in place, I dont have to worry a lot to break existing functionality while refactoring, and as a lost resort I have all the extensive API and acceptance testing in my CI before I can merge on master.

The true test for good code is how easy it can be changed.

Martin Fowler: Refactoring, page 77 (translated back to English)

This quote may be controversial among people with low exposure to professional software development (juniors, managers who gave up on programming a while ago), but for professionals in my environment its fortunately common sense. Which doesnt mean that its applied constistently, though.

If someone tells you that their code during refactoring didnt work for a few days you can be quite sure that they didnt apply refactoring as such.

Martin Fowler: Refactoring, page 79 (translated back to English)

Such reworks or redesign of larger parts of a codebase of course can and may happen, too, but I appreciate Fowler’s take on a clear separation. Especially in arguments with upper management, clear terminology can help. If refactoring can mean everything happening to a codebase except adding new functionality, its probably to vague. Restricting its meaning to pure tiny and small changes reduces risks a lot and may help to increase acceptance.

Refactoring Helps You Program Faster

Martin Fowler, Refactoring, page 82

This was and is also my personal strong belief, and I think there is enough evidence that this is a fact. I mentioned earlier sustainability, and if a well-design and maintained code base based on clean code principles and with constant refactoring exists, its the best way to get a graph like the red one below.

Source: https://frontendwebblog.wordpress.com/2020/07/07/notes-from-refactoring/ (who presumably has taken it from Martin Fowler: Refactoring, English version)

Note also the short part in the lower left corner, where poor design for a while may have more functionality than good design. Naturally, with hacky proof of concepts hitting production, you can push out some functionality “fast”, but at the cost of low efficiency later on. On pages 90f (German version) declares this the major argument in favor of refactoring – in the end its about the business value it provides, and that one can be immense.

On pages 86f Fowler discusses if one should reserve dedicated time for refactoring. I heard before of models like “every third sprint is a refactoring sprint” and of course situations where “management decided the next sprint is for refactoring to fully focus on features after that again”. Fowler argues that reserved refactoring slots shouldnt be a thing. I tend to agree with him, by stating that refactoring should be considered in day-to-day work efforts and not be a sepereate activity in contrast to “normal development work” (which it should be a part of). However, as the value of refactorings is hard to measure and aforementioned “implicit” integration into the developer workflow easily gets it ignored or watered down, some accompanying activities and events encouraging refactoring may provide benefit.

What Do I Tell My Manager? […]

Of course, many people say they are driven by quality but are more driven by schedule. In these cases I give my more controversial advice: Don’t tell!

Martin Fowler: Refactoring, page 89

Similar to Bob Martin, Fowler argues with refactoring being an essential part of professional software development, thus its not something a manager should even have the change to interfere with in particular. I agree – mature develops should just do it and get the time needed. Asking management for approval for such a core activitiy in the actual coding, that it would be weird to explicitly ask for it. Do you ask for permission to apply certain design patterns, too? Of course, when asked, management will respond and the answer in probably 80% not what you would do. Management will probably realize at some point of refactoring gets too extensive (if that is even possible), but in the end refactoring is for everyone’s best (see the graph above).

On pages 93f Fowler has some interesting input on the interaction of (feature) branches and refactorings which I didnt see before. In essence its simple: Both are working in opposite directions. If there are multiple branches living in parallel and in one of them refactoring is happening, it may make the other branches unmergable (and vice-versa). So if a company or project wants to encourage refactoring, it should avoid active branches wherever possible.

There are more arguments for refactoring and a myriad of hints and tricks how to apply it, when to apply it and how to carry it out in an optimal way. The second part of the book contains detailled descriptions of all types of refactorings. I have skipped through them, but I know I wouldnt remember them exactly anyways. However, with the gist of things in mind I did a major refactoring of my current pet project, and it was really fun and productive. Now my codebase is even cleaner (I did already a few rounds after reading Clean Code) and I am looking forward repeat it while I continue adding functionality.

It was a very good book and a certain recommend, if you didnt read it already.

Kategorien
Book Coding Culture

Thoughts on Clean Agile from Robert C. Martin

My third book review leads me to the third book in a row from Robert C. Martin. You may think I am a fanboy or such, but its just that as written before all those books and more I got provided from my employer, and somehow I liked the flow of Martin’s books, so I read them one after the other. But fear not, the next book will be from another author.

So “Clean Agile”: Already the title is a bit triggering. I remember a online talk from another big agile representative (forget the name though) who was very clearly discouraging to use “Agile” as a noun for anything, but always use it in its grammatically correct way as an adjective. Thats what I thought about when holding the book in my hand. However, I dont really care for the difference. Grammar is the one side, the other is that agile/agility has been a standing term in the sw industry for a while now, so one may as well use it in its noun form. Nevermind.

The first thing noteworthy is the history lesson about the waterfall process. According to Martin, the 1970 paper from Winston Royce which according to popular belief introduced waterfall and initiated its establishment, was actually arguing against this model. If that is true, a lot of people must have read only the first part of that paper and then went on implementing that part – because it chimed well with the notions of Scientific Management which were cool back then.

Martin is also very explicit about the small vs big team problem. In his opinion the organization of big teams is a solved problem since ages. What was open was the organization of small teams developing software. To scale from there seems to be the easy part for him. In theory, he may be right, but looking at my project which we scaled from 1 to 65 teams in years working on one product, the organism and interactions between teams and the overall organization cannot be reduced that easily. That would assume an organizational maturity on project level and on each single team that is hard to reach under realistic circumstances. I am not saying its not feasible, just that its not that simple. But Martin has shown to simplify a lot to make his points.

Another new information for me was the term “Iron Cross” of project management:

The reason that these techniques fail so spectacularly is that the managers who use them do not understand the fundamental physics of software projects. This physics constrains all projects to obey an unassailable trade-off called the Iron Cross of project management. Good, fast, cheap, done: Pick any three you like. You can’t have the fourth.

Robert C. Martin, Clean Agile, page 15

I agree with this assessment (of course in a theoretic setup you may be able to achieve all four, but in this world this is unlikely). On the following pages, Martin describes quite typical project development in the waterfall world up until the deatch march phase.

Another example of Martin’s provocations is his statements that “The loss of hope is a major goal of Agile.” Sounds crazy, right? What he aims at by using the metrics (like velocity) in a proper way, arbitrary and unrealistic deadlines, which are based on hopes, can be overcome, and gradually more realistic estimates can come out. This is also were he makes this statement:

Some folks think that Agile is about going fast. It’s not. It’s never been about going fast. Agile is about knowing, as early as possible, just how screwed we are.

Robert C. Martin, Clean Agile, page 27

Some pages later, Martin once again talks about the “grand redesign” – the typical “cure” when a software organisation has maneuvered itself into a dead end. He repeats his example from his earlier books, where a Tiger Team to redesign the software from scratch is caugth between having to catch up with the legacy software’s development while generating no income.

On page 50 Martin makes a memorable insight:

Humans make things better with time. Painters improve their paintings, songwriters improve their songs, and homeowners improve their homes. The same should be true for software. The older the software system is, the better it should be.

Robert C. Martin, Clean Agile, page 50

I really liked this statement. Software is in many cases connotated with the stance that old software/code bases are mostly getting worse, seldom better. Of course we all know good cases, mostly Open Source projects going on since decades (Linux Kernel, gcc, Latex, …), but in the proprietary software industry its often different. I cannot count how often old software really got worse over time – no matter if it was disimproved by rushed hacks or not changed at all. Martin is stating the obvious, but in that clarity I never saw that.

Many people have claimed that up-front planning is not part of Agile development. […] Of course the business needs a plan. Of course that plan must include schedule and cost. And, of course that plan should be as accurate and precise as practical. […]

In short, we cannot agree to deliver fixed scopes on hard dates. Either the scopes or the dates must be soft.

Robert C. Martin, Clean Agile, page 57f

I like this statement, because in few sentences Martin clarifies a typical debate around agile approaches. Some nitpickers – often with the aim to show the unsuitability of agile practices for tough industry business – claim that agile is anti plans or discourages plans.

Rising Velocity

If we see a positive slope, it likely does not mean that the team is actually going faster. Rather, it probably means that the project manager is putting pressure on the team to go faster. As that pressure builds, the team will unconsciously shift the value of their estimates to make it appear that they are going faster.

This is simple inflation. The points are a currency, and the team is devaluing them under external pressure. Come back to thatteam next year, and they’ll be getting millions of points done per iteration. The lesson here is that velocity is a measurement not an objective. It’s control theory 101: don’t put pressure on the thing you are measuring. […]

One way to avoid inflation is to constantly compare story estimates back to the original Golden Story, the standard against which other stories will be measured. Remember that Login was our original Golden Story, and it was estimated as 3. If a new story such as Fix Spelling Error in Menu Item has an estimate of 10, you know that some inflationary force is at work.

Robert C. Martin, Clean Agile, page 81

This is a very important observation and I can confirm it from practice. As a project manager I am monitoring team’s velocity charts. However, I am not doing this to enforce rising velocity charts, but to identify trends in underachievements (commitment vs achievements). Speaking about the average mature team, I think the best to aim for is a stable sprint velocity over longer duration of times, leading to a sustainable pace (Martin writes more on that on page 103). The last paragraph above gives a nice approach how to determine velocity inflation.

Working overtime is not a way to show your dedication to your employer. What it shows is that you are a bad planner, that you agree to deadlines to which you shouldn’t agree, that you make promises you shouldn’t make, that you are a manipulable laborer and not a professional.

This is not to say that all overtime is bad, nor that you should never work overtime. There are extenuating circumstances for which the only option is to work overtime. But they should be extremely rare. And you must be very aware that the cost of that overtime will likely be greater than the time you save on the schedule.

Robert C. Martin, Clean Agile, page 103

I really like this quote, it fits to my “natural instincts”. My work and contributions are most sustainable when I can work more or less normal hours. That means I can rest, stay sane and – more relevant to my employer – reflect on what is going on at work. As a team and project manager my day at work is usually crammed with meetings, mails and nagging people. Very rarely I have time to think deeply about what goes on. Hence, the best ideas I have after work, during weekends and vacations. I usually make sure to take notes about those ideas, so I can followup during working times.

On pages 106f Martin shares another interesting anecdote about the “developer hierarchy” at a printer company. The baseline is, that the printer software developers enjoyed a lot of praise for their direct contributions to the company’s main business. However, that led to elitism and closing up of their code. Other software developer working in indirect matters did receive lower trust and couldnt contribute that much. I think this is something to observe also around my work. The ones directly developing fancy features which make it on magazines and news pages feel much more appreciated than the ones working in the machine room, keeping the whole machinery running. I think the boundaries between those groups must be as permissive as possible.

The point is that Continuous Integration only works if you integrate continuously.

Robert C. Martin, Clean Agile, page 108

Simple but powerful words. I cannot believe how often CI is preached and promised, while it boils down to integration of a full team’s work once a sprint or even less.

The continuous build should never break. A broken build is a Stop the Presses event. I want sirens going off. I want a big red light spinning in the CEO’s office. A broken build is a Big Effing Deal. I want all the programmers to stop what they are doing and rally around the build to get it passing again. The mantra of the team must be The Build Never Breaks.

The Cost of Cheating

There have been teams who, under the pressure of a deadline, have allowed the continuous build to remain in a failed state. This is a suicidal move. What happens is that everyone gets tired of the continuous barrage of failure emails from the continuous build server, so they remove the failing tests with the promise that they’ll go back and fix them “later.”

Robert C. Martin, Clean Agile, page 109

Not much more to say, except that this is hard in traditional environments to achieve, and very easy to loose once gained. Just recently I got the chance to focus more, which I really appreciate.

The belief that quality and discipline increase speed is a courageous belief because it will constantly be challenged by powerful but naive folks who are in a hurry.

Robert C. Martin, Clean Agile, page 134

I love that quote because it is just true. And I admit that sometimes I am one of those naive folks. However I work to be it less every day.

Towards the end of the book, Martin lays out the new movement of Software Craftmanship. I frankly didnt hear about it before, but I like the ideas about it.

As aspiring Software Craftsmen we are raising the bar of professional software development by practicing it and helping others learn the craft. Through this work we have come to value:

Not only working software, but also well-crafted software

Not only responding to change, but also steadily adding value

Not only individuals and interactions, but also a community of professionals

Not only customer collaboration, but also productive partnerships

That is, in pursuit of the items on the left we have found the items on the right to be indispensable.

https://manifesto.softwarecraftsmanship.org/

Many will recall the format following the Agile Manifesto. However I see it a very valuable extension from the perspective of developers.

Let me end here. Probably long enough text for anyone ever to read here 🙂

Kategorien
Coding

Mutation testing with infection

A few months ago I read about mutation testing for the first time. At that time I started my current pet project with a focus on proper testing, and the idea hooked me immediately. However, back then my overall testing state was not at a level that such advanced mechanisms would make sense. But now, during my christmas vacation, the time was right 🙂

What is mutation testing? In my own words: with mutation testing one can check the quality of unit tests for code, by automatically and intentionally mutating the code and comparing the unit test results with the original ones. The mutated code is called a “mutant” and if your test cases recognize most of them your tests may be considered sufficient (a lot ifs and whens apply here).

For my php-based pet project, infection was the natural choice; I couldnt find any other mutation testing framework which is similarly well maintained and documented.

The initial setup, however, was not fully smooth, I had to overcome some issues in my environment:

First, I had some trouble to run my unit tests locally (so far I only executed them on my server and in CI, but I realized I can shorten my turnaround times significantly when I also enable them locally). As I – shame on me – only have Windows on my desktop, I had to setup php (with extensions such as xdebug), mariadb and Papercut (SMTP service) locally. I wanted to do this already for a while, so now I digged into it. Cost me some hours to sort everything out.

The second issue was more tricky. Some history: Due to the way how PHP and PHPUnit work (some say its even a bug in PHP?), outputting http headers via the header() function is conflicting with PHPUnit. A helpful answer on stackoverflow lists the options to solve this, and I chose the second option long time ago. The first option is not working for me, the third option didnt seem feasible either as I actually need the header output for my functionality. So I chose to run PHPUnit with the “–stderr” parameter, worked for me. With infection however, this resulted in issues. Outputting all PHPUnit results on stderr made infection, which parses PHPUnit’s output for errors, fail. I almost got mad in resolving this “deadlock”, but then stumbled over a github issue in which some compatibility topics were discussed. As far as I understand, their issue was another one, but I tried the given workaround, which seperates the initial PHPUnit run from the mutated runs:

vendor/bin/phpunit --coverage-xml=build/logs/coverage-xml --log-junit=build/logs/junit.xml
vendor/bin/infection --skip-initial-tests --threads=$(nproc) --coverage=build/logs --show-mutations --no-interaction 

This gave me the first output of infection:

The metrics at the bottom looked quite good, according to the infection documentation those values are good-ish. The hint in the second-to-last line is also important to acknowledge: Not all slipped-through mutants are in fact problems. Anyhow, it also shows 26 not covered mutants, i.e. code mutations which were not uncovered by my existing test cases.

In the infection.log file you can see which of those there are. Example (redacted):

2) ...\TaskController.php:137    [M] NewObject

--- Original
+++ New
@@ @@
         if (empty($taskId)) {
-            return new Response('HTTP/1.1 500 Internal Server Error ');
+            new Response('HTTP/1.1 500 Internal Server Error ');
+            return null;
         }     

Seems my test cases are not checking the return-types sufficiently or my code is not further using the returned objects here. Interesting! As a next step I will investigate how I can improve my unit tests based on the mutation test results.

Kategorien
Book

Thougts on Clean Architecture from Robert C. Martin

After recently reading Clean Code from the same auther, now I followup on Clean Architecture from Robert C. Martin. While Clean Code focusses mostly on the smallest entities of software development and in a simplified manner can be called bottom-up, Clean Architecture naturally comes from top-down. Clean Code is much easier to apply in examples such as my current pet project, Clean Architecture is something more abstract for me, as I do not really have a huge software project at hand to apply.

Down the darkest path comes the idea that strong and stable architecture comes from authority and rigidity. The architect’s mandate is total and totalitarian, the architecture becoming a dystopia for its developers and a constant source of frustration for all.

Down another path comes a strong smell of speculative generality. A route filled with hardcoded guesswork, countless parameters, tombs of dead code, and more accidental complexity than you can shake a maintenance budget at.

The path we are most interested is the cleanest one. It plays more to our strengths than our weaknesses. We create things and we discover things. We ask questions and we run experiments. A good architecture comes from understanding it more as a journey than a destination, more an ongoing process of enquiry than a frozen artefact.

Kevlin Henney in the Foreword

Kevlin’s words chime very well with my personal understanding of good software architect mindset. In my career, I have seen all of the above, and I am glad in my current project all architects have the latter. However, its not a closed book. Many managers, suppliers and developers still demand for what I would call static enterprise architecture, with different rationale behind. While management sees architecture as a definition of tools, libraries and complete set of boxes to ultimately save money by “clarity”, suppliers want something fixed to make an offer towards and have something to blame when problems arise.

I’ve seen it happen. I’veworked in projects where the design and architecture of the system made it easy to writeand easy to maintain. I’ve experienced projects that required a fraction of theanticipated human resources. I’ve worked on systems that had extremely low defectrates. I’ve seen the extraordinary effect that good software architecture can have on asystem, a project, and a team. I’ve been to the promised land.

Introduction

This is an important statement, easy to overlook and ignore. Most of us know reality well enough to have skepticism regarding the applicability of clean approaches to corporate software development. The constant stress by deadlines, tight budget, competing stakeholder’s requirements and technology fights often overrules the best intentions. But giving up is no option. Every new project, every new week its worth to keep trying to establish a bit better code, a bit better architecture. Our own future selfs, and our professional spirits will be thankful.

The goal of software architecture is to minimize the human resources required to build and maintain the required system.

What is design and architecture?

I am not sure if I would call the above “The” goal, but certainly its “a” central goal. Replacing “human resourcues” by “effort” for me would make this goal statement more generally applicable. Effort can include human resources, but also license cost, mental imbalance and turnover.

On pages 5ff, Martin gives a case study of a company sinking exponentially more engineering staff while creating exponentially less output. Interesting and highly relatable. I hope I find ways to calculate according metrics also for my private and professional projects.

And that’s the answer to the executive’s dilemma. The only way to reverse the decline in productivity and the increase in cost is to get the developers to stop thinking like the overconfident Hare and start taking responsibility for the mess that they’ve made.

The developers may think that the answer is to start over from scratch and redesign the whole system—but that’s just the Hare talking again. The same overconfidence that led to the mess is now telling them that they can build it better if only they can start the race over. The reality is less rosy:

Their overconfidence will drive the redesign into the same mess as the original project.

What is design and architecture?

Martin is focussing here on the software developers and provoking them to not take management influence as an excuse for their own failure to maintain sane software projects. He has done the same in Clean Code. Personally, I can see both sides. Of course management often overrules developers to gain what they call “speed” by effectively telling them to hack some dirty solution together. On the other hand I have seen enough developers either having a messy approach or giving up on management on the slightest push. So being somewhat in between I always try to balance and find tradeoffs. It sounds tempting to target only clean approaches and refuse deadlines completely, but its unrealistic and being too hard here my backfire. I know a developer who is so strict in his believes (and judging only by the technical content he is always right), that he is notorious to be ignored. Who wants to work with somebody who is not willing to compromise at all? (Me, yes, but many other managers take the short path and just go to the next developer)

Thus the final version of the SRP is:

A module should be responsible to one, and only one, actor.

SRP: The Single Responsibility Principle

Finally I got it. For a long time I really struggled to internalize the Single Responsibility Principle. All the other definitions or descriptions kindof made sense, but failed me to be applicable in a crisp way. With the above definition using the actor I got it 🙂 Apart from the SRP, Martin of course also introduces also the other SOLID principles.

While the SOLID principles apply to class interdependencies, he continues to establish three principles on component interdependencies. Naturally, they relate to the SOLID principles, so I wondered if there are ways to generalize a set of principles to apply them on any level of software, from systems via components to classes (and maybe even functions).

The Reuse/Release Equivalence Principle:

The granule of reuse is the granule of release

Component Cohesion

I really liked this principle, it sounds so obvious and clear, however I never thought about release cadence as architectural principle. Nice!

The Common Closure Principle and the Common Reuse Principle are very similar to the Single Responsibility Principle (see above) and the Interface Segregation Principle from SOLID, so both didn’t sound so new.

In the section on Component Coupling, Martin introduces two interesting metrics: Instability and Abstraction. After some reasoning, he defines the area around low instability and abstraction as the zone of pain and the area around high instability and abstraction as zone of uselessness. Classes on the “main sequence” in the diagonal from (0/1) to (1/0) instead are considered sane. Exceptions of course apply, e.g. for database schemes (stable and very concrete).

I checked around and really could find a tool which calculates and displays the metrics for my pet project code. I directly added it to my CI pipelines. See below the recent output. Seems I need to check some of my code.

The strategy […] is to leave as many options open as possible, for as long as possible.

What is architecture?

Only 135 pages in Martin asks the question “what is architecture?”, and his answers really surprised me, it was nothing I really expected. In the following pages he gives many examples (really many, probably too many) for this. For Martin, “details” such as external interfaces (IO, databases), frameworks, the web are not relevant for the core business rules. The latter are for him the core and to be untainted from all such details.

A good architect maximizes the number of decisions not made.

What is architecture?

I think again this is the provocative author, but in the context of his definition of “decision” and “details” this makes sense.

And then, thankfully, the Agile Software Development revolution arrived and put architects like me out of our misery. I’m a programmer. I like programming. And the best way I’ve found to have a positive impact on code is to write it.

The dinosaurs of Big Architecture—typically to be found wandering the primeval plains of Big Process—were wiped out by the asteroid of Extreme Programming. And it came as a blessed relief.

[…] The problem with leaving architecture to programmers is that programmers have to be able to think like architects. It turns out that not all of the stuff we learned during the Big Architecture era was of no value. The way that software is structured can have a profound impact on our ability to keep adapting and evolving it, even in the short term.

[…] Learn how you can incorporate design principles and Clean Architecture into your development processes […]. Perhaps, before you commit code, ask a colleague to review it with you. And look into the possibility of adding a code “quality gate” to your build pipeline as a last line of defense against unclean architecture. (And if you don’t have a build pipeline, maybe it’s time to create one?)

Most important of all is to talk about Clean Architecture. Talk about it with your team. Talk about it with the wider developer community. Quality is everybody’s business, and it’s important to reach a consensus about the difference between good and bad architecture.

Be mindful that most software developers are not very architecture-aware, just as I wasn’t 25 years ago. More experienced developers clued me into it. Once you’ve wrapped your head around Clean Architecture, take the time to wrap someone else’s head around it. Pay it forward.

[…] The real journey starts here.

Jason Gorman in the Afterword

I think thats perfect last words without any more need of commenting!

Kategorien
Coding

State of Pipelines

Just a brief update on my current pet project’s Jenkins pipelines setup. The Jenkins BlueOcean graph probably already covers it mostly:

I have parallelized some “Fast Tests” – all of these take less than a second to run currently due to the small codebase. If any of them fails it doesnt make sense to proceed. Currently these contain

  • phpmd: “mess detector” using the clean code ruleset to identify unclean code.
  • pdepend: Some code analytics I learned while reading “Clean Architecture” (review will come to the blog soon I hope).
  • phan: Static Code Analysis
  • phpunit: Unit testing

After those I run API tests with schemathesis. This takes one minute (configurable) and sends random payload to my REST API as per the generated swagger file.

Last but not least I run some acceptance tests, as described in Adventures with PHPUnit, geckodriver and selenium.

For a while I also parallelized the latter two test activities in a “Slow Tests” block, but I learned that the test execution was very unreliable due to the load the test environments put on my server. So for now they run sequential.

Today I finished some major clean code refactoring and I am happy to say that I could rework most of the code and the tests in place ensured that functionally nothing broke without me checking the actual application manually once! This is a major achievement for me personally, as all my former pet projects based on heavy manual testing on basically every change.