
Thoughts on “The Phoenix Project”

I had “The Phoenix Project” on my bookshelf for a long time. It was mentioned in our Christmas townhall, which finally triggered me to read it. And man, was it a read. In contrast to most other books on IT I have read before, it is a novel, and it is written in a very dense way. After a few tens of pages, I recommended it to my wife, and she devoured it even faster than I did. I strongly recommend this read to anyone in the IT/SW field, as it shows so many messy daily anti-patterns and ways to finally resolve them, in an emotionally engaging way. Even for people who already know the concepts it teaches, like DevOps, Lean, etc., this book is a strong recommendation.

In this article I want to collect the parts which are highlights to me. I do not attempt to provide a comprehensive summary of the plot, nor of the concepts presented. Others have done that much better; just google for a review. But then again, it is better to read the book and go through the rollercoaster yourself.

Disclaimer: I actually own the 5th anniversary edition of the book, which at the end also contains excerpts from “The DevOps Handbook”. Some of the citations below are actually from that part.

Change Management

Change Management is one of the recurring topics in the book.

To my surprise, Patty looks dejected. “Look, I’ve tried this before. I’ll tell you what will happen. The Change Advisory Board, or CAB, will get together once or twice. And within a couple of weeks, people will stop attending, saying they’re too busy. Or they’ll just make the changes without waiting for authorization because of deadline pressures. Either way, it’ll fizzle out within a month.”

page 44

This brings back my experience with attempts to establish pragmatic yet effective change management. My experience is that the more easily a Change Management process gets established, the less proactive it actually is. In other words, documenting changes that have already been made works better than managing changes before the work on them actually starts. Good change managers are hard to find: people who really want to maintain a holistic perspective on a change, from business and financial to technical aspects, and who can stand their ground against upper management, who will always (!) want to sidestep any Change Management.

Lack of cross-functional collaboration

I’ve seen this movie before. The plot is simple: First, you take an urgent date-driven project, where the shipment date cannot be delayed because of external commitments made to […] customers. Then you add a bunch of developers who use up all the time in the schedule, leaving no time for testing or operations deployment. And because no one is willing to slip the deployment date, everyone after Development has to take outrageous and unacceptable shortcuts to hit the date.

page 53

I also have seen this movie before. In an organization with decoupled engineering teams, there is always a blame game between those who come earlier in the chain and those who come later. A typical scene:

Chris replies hotly, “Don’t give me that bullshit about ‘throwing the pig over the wall.’ We invited your people to our architecture and planning meetings, but I can count on one hand the number of times you guys actually showed up. We routinely have had to wait days or even weeks to get anything we need from you guys!”
[…]
Wes rolls his eyes in frustration. “Yeah, it’s true that his people would invite us at the last minute. Seriously, who can clear their calendar on less than a day’s notice?”
[…]
I nod unhappily. This type of all-hands effort is just another part of life in IT, but it makes me angry when we need to make some heroic, diving catch because of someone else’s lack of planning.

pages 53ff, 55

When the people in the later process steps, like testers or IT operations, point at delayed deliveries from developers leading to a crunch at their end, the developers bring up that they did in fact involve them early. However, that involvement is often insufficient and fragmented, if it happens in any meaningful way at all.

“Allspaw taught us that Dev and Ops working together, along with QA and the business, are a super-tribe that can achieve amazing things. They also knew that until code is in production, no value is actually being generated, because it’s merely WIP stuck in the system. He kept reducing the batch size, enabling fast feature flow.

page 297

Myth—DevOps Means Eliminating IT Operations, or “NoOps”: Many misinterpret DevOps as the complete elimination of the IT Operations function. However, this is rarely the case. While the nature of IT Operations work may change, it remains as important as ever. IT Operations collaborates far earlier in the software life cycle with Development, who continues to work with IT Operations long after the code has been deployed into production.
Instead of IT Operations doing manual work that comes from work tickets, it enables developer productivity through APIs and self-serviced platforms that create environments, test and deploy code, monitor and display production telemetry, and so forth. By doing this, IT Operations become more like Development (as do QA and Infosec), engaged in product development, where the product is the platform that developers use to safely, quickly, and securely test, deploy, and run their IT services in production.

page 360 (actually from The DevOps Handbook)

Simultaneously, QA, IT Operations, and Infosec are always working on ways to reduce friction for the team, creating the work systems that enable developers to be more productive and get better outcomes. By adding the expertise of QA, IT Operations, and Infosec into delivery teams and automated self-service tools and platforms, teams are able to use that expertise in their daily work without being dependent on other teams.
This enables organizations to create a safe system of work, where small teams are able to quickly and independently develop, test, and deploy code and value quickly, safely, securely, and reliably to customers. This allows organizations to maximize developer productivity, enable organizational learning, create high employee satisfaction, and win in the marketplace.

page 355 (actually from The DevOps Handbook)

Organizing teams in a cross-functional fashion is to this day still an evergreen. It is rarely done consistently enough, and when everything goes down the drain, task forces are formed in which exactly that happens (bringing everyone together). See my Corporate SW Engineering Aphorisms.

As Randy Shoup, formerly a director of engineering at Google, observed, large organizations using DevOps “have thousands of developers, but their architecture and practices enable small teams to still be incredibly productive, as if they were a startup.”

page 378 (actually from The DevOps Handbook)

I think this is an underestimated aspect: cutting teams into product/feature verticals will only work at scale if your system and software architecture enable such a working mode.

Bottleneck Staff

A core engineer in the book is Brent. He is the go-to expert for everyone, knowing the IT systems inside out. However, this makes him the bottleneck for almost everything, from feature deployment to outage resolution.

Wes nods, “Yep. He’s the guy we need at those meetings to tell those goddamned developers how things work in the real world and what type of things keep breaking in production. The irony, of course, is that he can’t tell the developers, because he’s too busy repairing the things that are already broken.”
[…]
“Probably because someone like me was screaming at him, saying that I absolutely needed his help to get my most important task done. And it’s probably true: For way too many things, Brent seems to be the only one who knows how they actually work.”

page 56, page 115

“Maybe we create a resource pool of level 3 engineers to handle the escalations[…]. The level 3s would be responsible for resolving all incidents to closure, and would be the only people who can get access to Brent—on one condition. If they want to talk with Brent, they must first get Wes’ or my approval,” I say. “They’d be responsible for documenting what they learned, and Brent would never be allowed to work on the same problem twice. I’d review each of the issues weekly, and if I find out that Brent worked a problem twice, there will be hell to pay. For both the level 3s and Brent. […] Based on Wes’ story, we shouldn’t even let Brent touch the keyboard. He’s allowed to tell people what to type and shoulder-surf, but under no condition will we allow him to do something that we can’t document afterward. Is that clear?”
“That’s great,” Patty says. “At the end of each incident, we’ll have one more article in our knowledge base of how to fix a hairy problem and a growing pool of people who can execute the fix.”

page 116

Wes says, […] confirming my worst fears. “[CEO] Steve insisted that we bring in all the engineers, including Brent. He said he wanted a ‘sense of urgency’ and ‘hands on keyboards, not people sitting on the bench.’ Obviously, we didn’t do a good enough job coordinating everyone’s efforts, and…” Wes doesn’t finish his sentence.
Patty picks up where he left off, “We don’t know for sure, but at the very least, the inventory management systems are now completely down, too. […]”

page 178

He pauses and then says emphatically, “Eliyahu M. Goldratt, who created the Theory of Constraints, showed us how any improvements made anywhere besides the bottleneck are an illusion. Astonishing, but true! Any improvement made after the bottleneck is useless, because it will always remain starved, waiting for work from the bottleneck. And any improvements made before the bottleneck merely results in more inventory piling up at the bottleneck.”

page 90

I’ve also come across otherwise smart guys who are of the mistaken belief that if they hold on to a task, something only they know how to do, it’ll ensure job security. These people are knowledge Hoarders.

David Lutz, https://dlutzy.wordpress.com/2013/05/03/the-phoenix-project/

As a solution, Dr. Goldratt defined the “five focusing steps”:
– Identify the system’s constraint.
– Decide how to exploit the system’s constraint.
– Subordinate everything else to the above decisions.
– Elevate the system’s constraint.
– If in the previous steps a constraint has been broken, go back to step one, but do not allow inertia to cause a system constraint.

page 401 (actually from The DevOps Handbook)

All those money quotes highlight that a hero culture is detrimental to a mature organization. Pulling off heroic actions every once in a while may seem unavoidable, but it is never a sign of good management to depend on it. Use your heroes to ship kick-ass customer features ahead of your competition in an orderly process, but don’t require heroes for everyday tasks.

WIP is the silent killer

He gestures broadly with both arms outstretched, “In the 1980s, this plant was the beneficiary of three incredible scientifically-grounded management movements. You’ve probably heard of them: the Theory of Constraints, Lean production or the Toyota Production System, and Total Quality Management. Although each movement started in different places, they all agree on one thing: WIP is the silent killer. Therefore, one of the most critical mechanisms in the management of any plant is job and materials release. Without it, you can’t control WIP.”

page 89

Dominica DeGrandis, one of the leading experts on using kanbans in DevOps value streams, notes that “controlling queue size [WIP] is an extremely powerful management tool, as it is one of the few leading indicators of lead time—with most work items, we don’t know how long it will take until it’s actually completed.”

page 397 (actually from The DevOps Handbook)

Limiting Work in Progress has been one of my guiding principles for a decade. Since I have been in project and line management, it has proven to be the most effective way to get a handle on any messy situation. However, it is not easy at all to limit WIP. Rarely is it just about saying no often enough. More often, it is about combining efforts in clever ways, breaking complex tasks down, and aligning on exact requirements and expectations. But it all starts with a relentless assessment of the situation.

Stakeholder Management

Uncertain, I ask Steve, “Are we even allowed to say no? Every time I’ve asked you to prioritize or defer work on a project, you’ve bitten my head off. When everyone is conditioned to believe that no isn’t an acceptable answer, we all just became compliant order takers, blindly marching down a doomed path. I wonder if this is what happened to my predecessors, too.”

page 196

It doesn’t work without top management. If yours is continuously sidestepping any reasonably pragmatic process and ignoring requests for prioritization, it shows their lack of management skills, not yours (but they will make sure you feel the opposite).

Continuous Improvement

“Mike Rother says that it almost doesn’t matter what you improve, as long as you’re improving something. Why? Because if you are not improving, entropy guarantees that you are actually getting worse, which ensures that there is no path to zero errors, zero work-related accidents, and zero loss. […] Rother calls this the Improvement Kata […] He used the word kata, because he understood that repetition creates habits, and habits are what enable mastery. Whether you’re talking about sports training, learning a musical instrument, or training in the Special Forces, nothing is more to mastery than practice and drills. Studies have shown that practicing five minutes daily is better than practicing once a week for three hours. And if you want to create a genuine culture of improvement, you must create those habits.”

page 213

Like the legendary stories of the original Apple Mac OS and Netflix cloud delivery infrastructure, we deployed code that routinely created large-scale faults, thus randomly killing processes or entire servers. Of course, the result was all hell breaking loose for an entire week as our test, and occasionally, production infrastructure crashed like a house of cards. But, over the following weeks, as Development and IT Operations worked together to make our code and infrastructure more resilient to failures, we truly had IT services that were resilient, rugged, and durable.
John [Security] loved this, and started a new project called “Evil Chaos Monkey.” Instead of generating operational faults in production, it would constantly try to exploit security holes, fuzz our applications with storms of malformed packets, try to install backdoors, gain access to confidential data, and all sorts of other nefarious attacks.
Of course, Wes tried to stop this. He insisted that we schedule penetration tests into predefined time frames. However, I convinced him this is the fastest means to institutionalize Erik’s Third Way. We need to create a culture that reinforces the value of taking risks and learning from failure and the need for repetition and practice to create mastery. I don’t want posters about quality and security. I want improvement of our daily work showing up where it needs to be: in our daily work.
John’s team developed tools that stress-tested every test and production environment with a continual barrage of attacks. And like when we first released the chaos monkey, immediately over half their time was spent fixing security holes and hardening the code. After several weeks, the developers were deservedly proud of their work, successfully fending off everything that John’s team was able to throw at them.

page 329

Because we care about quality, we even inject faults into our production environment so we can learn how our system fails in a planned manner. We conduct planned exercises to practice large-scale failures, randomly kill processes and compute servers in production, and inject network latencies and other nefarious acts to ensure we grow ever more resilient. By doing this, we enable better resilience, as well as organizational learning and improvement.

page 376 (actually from The DevOps Handbook)

Using chaos-spreading tools like Chaos Monkey is something we are currently exploring. I don’t know (nor have I researched yet) whether this is done in embedded beyond typical fuzzing approaches, but I see a lot of potential here.

Throughput

I tell them […] about how wait times depend upon resource utilization. “The wait time is the ‘percentage of time busy’ divided by the ‘percentage of time idle.’ In other words, if a resource is fifty percent busy, then it’s fifty percent idle. The wait time is fifty percent divided by fifty percent, so one unit of time. Let’s call it one hour. So, on average, our task would wait in the queue for one hour before it gets worked. “On the other hand, if a resource is ninety percent busy, the wait time is ‘ninety percent divided by ten percent’, or nine hours. In other words, our task would wait in queue nine times longer than if the resource were fifty percent idle.” I conclude, “So, for the Phoenix task, assuming we have seven handoffs, and that each of those resources is busy ninety percent of the time, the tasks would spend in queue a total of nine hours times the seven steps…”
“What? Sixty-three hours, just in queue time?” Wes says, incredulously […]

Patty says, “What that graph says is that everyone needs idle time, or slack time. If no one has slack time, WIP gets stuck in the system. Or more specifically, stuck in queues, just waiting.”

page 235f

Naturally, as the book is about lean concepts, throughput plays an important role. With the above I actually learned a new aspect. The calculation of wait time from resource utilization seems a bit counterintuitive, and I still haven’t fully bought it. However, there is certainly a strong correlation here. Your resources – be it staff or tools – should never be occupied above a certain ratio (50%? 80%?), otherwise your whole value stream will go down the drain.
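
To make the arithmetic tangible, here is a minimal sketch in Python of the rule of thumb used in the book (wait time = percent busy divided by percent idle), with the 90%-busy, seven-handoffs example from the quote. The numbers are purely illustrative.

    # Rule of thumb from the book: wait time ~ (% busy) / (% idle).
    def wait_time(busy: float) -> float:
        """Relative queue wait time for a resource at the given utilization (0..1)."""
        idle = 1.0 - busy
        return busy / idle

    for busy in (0.5, 0.8, 0.9, 0.95):
        print(f"{busy:.0%} busy -> {wait_time(busy):.1f} units of wait time")

    # The Phoenix task example: 7 handoffs, each resource 90% busy,
    # with one "unit" interpreted as an hour.
    print(f"Total queue time: {7 * wait_time(0.9):.0f} hours")  # 63 hours

The non-linearity is the point: going from 50% to 90% utilization does not double the wait time, it multiplies it by nine.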

Emotional and Motivational Aspects

When people are trapped in this downward spiral for years, especially those who are downstream of Development, they often feel stuck in a system that preordains failure and leaves them powerless to change the outcomes. This powerlessness is often followed by burnout, with the associated feelings of fatigue, cynicism, and even hopelessness and despair. Many psychologists assert that creating systems that cause feelings of powerlessness is one of the most damaging things we can do to fellow human beings—we deprive other people of their ability to control their own outcomes and even create a culture where people are afraid to do the right thing because of fear of punishment, failure, or jeopardizing their livelihood. This can create the conditions of learned helplessness, where people become unwilling or unable to act in a way that avoids the same problem in the future.

pages 372f (actually from The DevOps Handbook)

Never underestimate the impact of employee mood and commitment on your organization’s performance. Another thing that sounds obvious at first, but looking behind the curtains, listening to coffee-room chatter and engaging with people’s honest opinions will always (!) reveal something you need to improve on – completely outside of any hard project KPIs. Listen.

Quality and Safety

In addition to lead times and process times, the third key metric in the technology value stream is percent complete and accurate (%C/A). This metric reflects the quality of the output of each step in our value stream. Karen Martin and Mike Osterling state that “the %C/A can be obtained by asking downstream customers what percentage of the time they receive work that is ‘usable as is,’ meaning that they can do their work without having to correct the information that was provided, add missing information that should have been supplied, or clarify information that should have and could have been clearer.”

page 391 (actually from The DevOps Handbook)

Consider when we have an annual schedule for software releases, where an entire year’s worth of code that Development has worked on is released to production deployment. Like in manufacturing, this large batch release creates sudden, high levels of WIP and massive disruptions to all downstream work centers, resulting in poor flow and poor quality outcomes. This validates our common experience that the larger the change going into production, the more difficult the production errors are to diagnose and fix, and the longer they take to remediate.

page 399 (actually from The DevOps Handbook)

Dr. Sidney Dekker, who also codified some of the key elements of safety culture, observed another characteristic of complex systems: doing the same thing twice will not predictably or necessarily lead to the same result. It is this characteristic that makes static checklists and best practices, while valuable, insufficient to prevent catastrophes from occurring.

page 406 (actually from The DevOps Handbook)

Examples of ineffective quality controls include:
– Requiring another team to complete tedious, error-prone, and manual tasks that could be easily automated and run as needed by the team who needs the work performed
– Requiring approvals from busy people who are distant from the work, forcing them to make decisions without an adequate knowledge of the work or the potential implications, or to merely rubber stamp their approvals
– Creating large volumes of documentation of questionable detail which become obsolete shortly after they are written
– Pushing large batches of work to teams and special committees for approval and processing and then waiting for responses
Instead, we need everyone in our value stream to find and fix problems in their area of control as part of our daily work. By doing this, we push quality and safety responsibilities and decision-making to where the work is performed […]

page 411 (actually from The DevOps Handbook)

NVidia’s Jensen Huang praising our product

Finally, the hard work of our engineering team gets into the spotlight! We worked hard and we succeeded 🦾 a very motivating start of 2026!

https://www.youtube.com/clip/UgkxDgbFauNfOJpYe-GM5_CzJbA0O3FcPjd6

Great teamwork by my teams at Mercedes-Benz AG Sindelfingen, MBition, Mercedes-Benz Research and Development India and Luxoft Egypt. Thank you very much


Migrating a shell to JOS

When I published my work on my own operating system JOS many months back, my former colleague and friend Alexander suggested that in order to make my OS useful, it needed bash. Little did he (or I) know, but this led to a year-long endeavour to get a shell working.

One big enabler was the implementation of the filesystem and disk drivers, which I have already written about. When that was ready, I started with the source code of bash (the Bourne Again Shell), the most widely used shell. However, it turned out that porting it to another OS is pretty complex. So halfway through, I switched to dash (the Debian Almquist Shell), having read somewhere on the internet that it is more portable. Less known by name, it is the default /bin/sh in Ubuntu. So I reckoned that going for it would give me a sufficiently powerful shell for long-term use.

So, how do you get a shell running on a written-from-scratch operating system? Like for any other userspace program, you need to get its dependencies straight – which means getting all its mandatory standard library headers and implementations in place, including the additional system calls the kernel did not implement yet.

So the first step was to compile (not yet link) all the dash source code against my own libc headers. This already took me a long time, getting all the function declarations in place for the compilation to go through. The result was pretty scary: sooo many functions with complex POSIX functionality I would have to implement! A lot of stuff I hadn’t even heard about before, let alone understood its inner workings.

When I went on to the linking phase, I had to provide those implementations. As I saw no way to implement all of them in a fully correct manner, I chose to first implement them all as no-op versions, returning an error by default.

This allowed my dash to actually compile and link and produce a dash executable. Now I could get to the juicy stuff. Of course, I couldn’t just expect all the no-op failing functions to lead to a running shell. Some functions really were required to work. This phase required a lot of trial and error. Having reached the first $ symbol (the prompt for user input), you can see that up to this point many functions are still not implemented, but obviously they are not required for minimal functionality.

Some libc functionality could be covered on the userspace side with workarounds. E.g. many printf variants like vsnprintf can be implemented with sprintf under the hood. Others required additional kernel functionality. You can see in the next screenshot the syscalls I had to add (actually, kill isn’t used).

So finally, around Christmas, I was able to actually run some first shell commands, like echo or simple loops, and dash would evaluate them:

Compared to running Doom, this output seems pretty underwhelming, but for me it is a great milestone.

It goes without saying that this shell is not yet very useful. While the basics are working, it is not yet capable of launching other userspace processes. I started to implement vfork and execve for this purpose, and while I had some good traction in the beginning, it turned out that my kernel’s whole memory management and paging logic is just too convoluted and brittle. Hence, my next goal is to refactor this code to bring it into order. Let’s see if and when that happens.

In case you are interested, you can find the code here: https://github.com/jbreu/jos/pull/34


The best mail filter rule in the world

As someone who uses many web services and also contributes there, I accumulate many automated notifications over time. Jira ticket updates, newly created Confluence pages in a folder I watch, GitLab merge request status updates, … there is so much going on, and I personally like to read along whenever I have some spare minutes. So, turning off all those notifications is not an option for me. On the other hand, I would get hundreds of such mails in my inbox every day.

So, the first, rather obvious step was to add a filter/rule which moves all mail coming from automailers like jira-no-reply@foo.bar.com to a Notifications folder:

That works in a pretty straightforward manner. It makes my inbox much cleaner, and I can browse the Notifications folder whenever I want. However, now I may miss especially relevant updates. How to find those?

Turns out, a good heuristic is to just use my own name as an indicator for relevance (at least for myself haha). So, I do not move any mail which contains my first name or my internal user id → those stay in my main inbox!

Of course, besides using your own name (or mine 😉) you can use other terms which indicate relevance.
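
For illustration, here is a minimal sketch of the combined decision logic in Python. The sender patterns, keywords and folder names are placeholders – in reality this is just a rule (plus an exception) in the mail client.

    # Sketch of the filter logic; senders, keywords and folders are placeholders.
    NOTIFICATION_SENDERS = ("jira-no-reply@", "confluence-no-reply@", "gitlab-no-reply@")
    RELEVANCE_KEYWORDS = ("myfirstname", "u123456")  # your first name, your internal user id

    def target_folder(sender: str, body: str) -> str:
        """Return the folder a mail should end up in."""
        is_notification = sender.lower().startswith(NOTIFICATION_SENDERS)
        mentions_me = any(keyword in body.lower() for keyword in RELEVANCE_KEYWORDS)
        if is_notification and not mentions_me:
            return "Notifications"
        return "Inbox"  # relevant notifications and regular mail stay in the inbox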

This allows me to get all notifications, have the most relevant ones in my main inbox and, thus, stay on top of what’s going on without drowning in internal update spam.


Aligning Test Ecosystems

(Disclaimer: As usual, I am writing here from my personal automotive embedded SW perspective. Below are my thoughts based on everyday experience as an engineering manager and SW enthusiast. I am not a trained/certified test expert with deep knowledge of testing theory/science, frameworks and terminology. Please take it with not just a grain but a huge truckload of salt. Let me know your opinion!)

Introduction

After more than 17 years in automotive SW engineering, I have come to see testing as an integral part of everyday SW engineering practice. At first, such a statement seems obvious – it doesn’t take 17 years to understand the importance of testing. True, but read the first sentence again. I wrote “part of everyday […] practice”. To the surprise of some, and bitterly confirming for others, this is not the case up to today in many automotive SW projects. Testing is often decoupled along various dimensions:

  • Testing is done much later than the code implementation (weeks, months, years)
  • Testing is executed when big SW releases are made (think: on a monthly or quarterly scale)
  • Testing is done to satisfy some process requirements (again, usually late, briefly before start of production)
  • Testing is done by dedicated staff in testing organizations, spatially and organizationally decoupled from the other SW roles (architects, programmers, Product Owners)

Even when talking about the lowest tier of sw testing, unit testing and coding rule compliance, much if not all of the above applies. When it comes to higher levels of testing such as integration testing or qualification testing, more and more of the above list applies.

There are many reasons why the above is bad and why it is the case. Many of these issues are rooted in the wrong mindset and in corporate organizational dynamics (e.g. the infamous “make testers a shared resource decoupled from dev teams”). Those are causes I don’t want to write about today. I have done so in the past and may do so in the future. My focus today is on the technical meta-aspects of getting testing into the right spot.

My observation is that automotive SW testing still suffers from a severe lack of alignment/harmonization of test environments. While there are many sophisticated tools serving specific purposes, the overall landscape is fragmented. Many expensive, powerful domain-specific tools are combined with home-grown tooling, resulting in test ecosystems which work by themselves but are not leveraged for cross-domain or cross-project synergies. Now, everyone always strives to harmonize and standardize solutions, but it seldom works out. There is a lot of organizational inertia, and there are technical challenges, so standardization that is merely demanded or proclaimed often does not happen at all – or if it does, it leads to situations like the one my all-time favorite xkcd refers to:

https://xkcd.com/927/

However, that doesn’t mean we should give up before the battle. We should be clever about our approach and balance effort vs. (potential) value. And as we know the test ecosystem is complex, we need to make up our minds about where to invest. Just throwing money and hours at “testing” will not make anything better.

A Mental Model of Testing Aspects

In order to see where harmonization gives the biggest bang for the buck, we first need to understand how modern testing is actually cut into pieces. I did some research and was pretty surprised that there isn’t any good model or taxonomy of testing described yet. Probably I just couldn’t find it. What one can find many times are categorizations of test types, like the ones found here. The ISO/IEC/IEEE 29119-1 software testing standard also doesn’t really provide such a model. There is a “multi-layered test context diagram”, but it falls short in the areas where I think alignment can happen best.

ISO/IEC/IEEE 29119-1, page 16

So without further ado, let’s dive into my personal mental meta-model of testing, going from top to bottom, or from the outside in – and into what opportunities to align and harmonize each layer offers.

1. Test Intent and Scope (aka Test Strategy)

Scope: The test strategy serves to define which quality attributes are being validated by testing in a project. It can range from the classics (functional correctness, interface behavior / integration testing, regression avoidance, performance) to reliability, security, safety, compliance, usability, compatibility, and so on. An obvious demand of ASPICE and the like, a test strategy is often at the core of every process audit.

Pragmatic opportunities for cross-project alignment: Having contributed to test strategies written from (almost) scratch multiple times, and having seen all the blood, sweat and tears flowing into writing and, especially, refining and establishing them, I think the greatest opportunity here is to provide templates and knowledge:

  • A common template gives the responsible people a skeleton which allows them to fill in the needed parts, and all “empty” sections serve as a checklist.
  • Besides a template, some example test strategies which have passed audits are extremely helpful. If you are like me, you often wonder to what depth one should go for a certain aspect. Should I write a one-liner, or does it require multiple pages of text with nicely crafted graphics? Examples help to find the sweet spot of “just enough”.
  • Talking about examples, best practices are extremely valuable, and a shared body of knowledge can bring a struggling organization from zero to hero. All it requires is to overcome the not-invented-here syndrome. There is so much good stuff out there, even in your own org. You just have to find it.
  • While the test strategy can be authored in any reasonably capable text editing format (Microsoft Word, Atlassian Confluence, …), what I have come to appreciate is using lightweight markup languages like Markdown or reStructuredText/Sphinx. They enable handling the document as code, version-tracking it and, ideally, even maintaining it alongside the actual project code.

2. Test Case Specification / Definition

Scope: Defines what the test is in an abstract sense. Each test case is described in a structure which contains preconditions, inputs/stimuli, and expected outputs. Here, traceability to requirements, risks, code, variants and equivalence classes is documented. Last, but operationally very relevant, is the metadata for each test case, like its priority, tags, ownership, status, etc.

Pragmatic opportunities for cross-project alignment:

  • Test cases should be maintained in an accessible database. As in any good software culture, the test cases should be as widely shared among testers from different projects as code is shared between software developers. As above, I think such a database is ideally based on plain text formats. If this is not an option, make sure to get your testers’ and developers’ input on the proper tool choice. A bad tool choice can severely impact testing operations and cost a lot of sunk money – I can speak from experience.
  • Test cases should be written with as little dependency as possible on the following process steps (like test frameworks and test tooling). This makes them reusable, but also decouples them from their implementation and execution, allowing the tooling to be exchanged.
  • The representation of each test case should be structured text (see the sketch after this list). Natural language can be used in comments, but only structured text can be parsed in a precise manner (LLMs could change this in the long run, but not yet).
  • Establish a common language. There are great options out there: from the Behavior-Driven Development culture we got Gherkin, and there is also the Robot Framework test data syntax.
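
As a sketch of what such a tool-agnostic, structured representation could look like – the field names are made up for illustration and not taken from any standard – consider something along these lines:

    from dataclasses import dataclass, field

    @dataclass
    class TestCase:
        """Tool-agnostic test case record; field names are illustrative only."""
        id: str
        title: str
        requirements: list[str]          # traceability to requirement IDs
        preconditions: list[str]
        steps: list[str]                 # abstract stimuli, no tool-specific commands
        expected: list[str]
        priority: str = "medium"
        tags: list[str] = field(default_factory=list)
        owner: str = ""
        status: str = "draft"

    login_timeout = TestCase(
        id="TC-0042",
        title="ECU rejects login after session timeout",
        requirements=["REQ-123"],
        preconditions=["ECU powered", "diagnostic session closed"],
        steps=["connect", "wait 30 s", "request login"],
        expected=["login is rejected with a negative response"],
        tags=["diagnostics", "regression"],
    )

Whether this lives in Python, YAML or Gherkin matters less than the fact that it is plain text, diffable and free of tool-specific commands.
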
3. Test Implementation / Binding

Scope: Maps the abstract test case specification to executable behavior. Here it usually gets very tooling-specific. This is where APIs are implemented and keywords are bound to code (e.g. a “connect” from step 2 is mapped to “ssh root@192.168.0.1”). A lot of effort is spent on test setup and teardown to prepare and clean up.

Pragmatic opportunities for cross-project alignment:

  • A strong culture of reusable test implementation needs to be established here. E.g. setting up the test environment can be done – and often is – via very hacky means. Using established and well-maintained patterns helps.
  • Define an explicit “test contract”: For each test, or your whole test set/suite, explicitly state which guarantees it can uphold:
    Test X guarantees:
    - deterministic result under condition Y
    - idempotent execution
    - no persistent state
  • Keep Environment Knowledge Out of Tests: Tests should not know URLs, credentials, ports, or hardware topology.
  • Some deeper technical topics:
    • Invert control of assertions and actions. Instead of hard-coding tool-specific instructions like
      assert response.status_code == 200 # pytest-specific semantics
      use an abstraction to define what a positive response means:
      test.assert.status_ok(response) # test.assert.* is our own abstraction
    • Don’t import test environment specifics in your immediate test implementation, only in the adapters.
    • Separate the “What” from the “How” using capability interfaces.
    • Make test steps declarative, not procedural. Good example:
      - action: http.request
        method: GET
        url: /health
        expect:
          status: 200
    • Centralize test utilities as products, not helpers. Maintain them with semantic versioning.
    • Decouple test selection from test code. Instead of
      pytest tests/safety/test_error_management.py
      prefer
      run-tests --tag safety --env TEST
      This is extremely powerful, as it enables test selection in a metadata-driven fashion.

Ultimately, accept that 100% tool independence is a myth; design for easy replaceability instead. The goal is not “tool-free tests”, but cheap tool replacement. Success criteria could be: tool-specific code makes up less than 10–20% of the total test code, switching a runner requires a new adapter rather than rewriting tests, and a new project reuses 70–90% of the test logic.
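
To make the “separate the What from the How” and “capability interface” ideas concrete, here is a minimal Python sketch. The interface, the adapter and the pytest-style binding are illustrative assumptions, not a reference implementation.

    from typing import Protocol

    class TargetConnection(Protocol):
        """Capability interface: the 'what' a test is allowed to ask for."""
        def request(self, path: str) -> int: ...   # returns a status code
        def reboot(self) -> None: ...

    class SshAdapter:
        """One possible 'how'; an HTTP or simulator adapter could replace it."""
        def __init__(self, host: str) -> None:
            self.host = host
        def request(self, path: str) -> int:
            return 200   # a real adapter would issue the request over SSH here
        def reboot(self) -> None:
            pass

    def check_health(target: TargetConnection) -> None:
        """Test logic knows only the capability interface, never the tooling."""
        assert target.request("/health") == 200

    # Binding happens at the edge, e.g. in a pytest fixture or a Robot keyword:
    check_health(SshAdapter(host="192.168.0.1"))

Swapping the SSH adapter for a simulator adapter then touches exactly one place, which is what keeps the 10–20% budget of tool-specific code realistic.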

4. Test Framework

Scope: The test framework provides the structure for the test implementation. It does so by managing the test execution lifecycle, providing fixtures (a fixture sets up the preconditions for a test and tears them down afterward), offering mechanisms for parameterization, and providing reporting hooks. It can also manage failure semantics, like retry-on-fail or handling flakiness. There are many test frameworks out there; some of them are more or less locked into a specific programming language and test level (pytest, JUnit, gtest), others are versatile and independent (Cucumber, Robot Framework).

Pragmatic opportunities for cross-project alignment: In my experience, there are far too many test frameworks in use in multi-project organizations. Often, historical developments have led to certain test frameworks being adopted. On top of that, test frameworks are often developed in-house, either because existing ones are unknown or lack needed functionality (or not-invented-here strikes). The opportunity here is both obvious and hard to realize: forcing multiple projects/sub-organizations to drop their beloved test framework in order to adopt another one defined centrally will cause a lot of frustration and lead to a temporary loss of pace. On the other hand, strategically it makes total sense to nourish an iterative harmonization. If you want to go down this path, I recommend choosing either an open source solution or a home-grown solution managed in an inner source fashion. While there may be attractive and powerful proprietary solutions on the market, in my experience those will always have a harder time achieving long-term adoption. Topics like license cost, license management and improvements will always be a toll, causing friction and making dev teams seek other solutions, again.

Adding some highly personal opinion, any test framework which wants to prevail in the long term has to have at least the following properties:

  • Run on Linux
  • Run in headless fashion (no clunky GUI)
  • Fully automatable
  • Configuration as code
  • No individual license per seat (so either license free, or global, company-wide license)

The above properties are not meant to be exclusive – if a tool also supports Windows and has an optional GUI, that’s OK. But there is no future in that alone, and it actively hinders adoption.

5. Test Runner

Scope: Executes tests in a concrete process. A test runner can resolve the execution order (e.g. if test Y depends on test X), parallelize tests on one test target (if possible), manage timeouts, isolate processes, and collect the test results. Runners are often stateless executors.

Pragmatic opportunities for cross-project alignment: Test frameworks often (always?) bring their test runners with them. The same observation from the Test Framework section applies here: instead of tightly coupling runners to frameworks, consider decoupled runners using containerization.

6. Test Environment

Scope: Defines where the tests run. While the test runner is typically just a tool which executes instructions on a test environment, the test environment itself defines the surroundings in which the tests execute: which product targets (ECU variant X, hardware revision Y, …), simulators, cloud vs. on-premise, network topology, and test doubles (mocks, stubs). Important aspects to cover here are provisioning (how is the software under test deployed, i.e. flashed?), which configuration needs to be applied to get a running test environment, and lifecycle management.

Pragmatic opportunities for cross-project alignment: Managing test environments can be very domain-specific. Often it is done by documentary means, like big Excel tables containing variant information about an ECU test fleet in the lab. Applying infrastructure-as-code solutions like Ansible can help to create reproducible, scalable and reliable test environments. Going one step further, one may adopt managed test fleets like AWS Device Farm or use solutions like OpenDUT.
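
As a hedged sketch of that direction (not a real schema), the Excel sheet could be replaced by machine-readable environment descriptors kept next to the code; the fields below are invented for illustration:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TestBench:
        """Illustrative descriptor of one lab test bench; fields are made up."""
        name: str
        ecu_variant: str
        hardware_revision: str
        simulators: tuple[str, ...]
        flash_method: str            # how the software under test is provisioned

    BENCHES = [
        TestBench("bench-01", "variant-X", "rev-B", ("restbus",), "dfu"),
        TestBench("bench-02", "variant-Y", "rev-C", ("restbus", "camera-sim"), "jtag"),
    ]

    def benches_for(variant: str) -> list[TestBench]:
        """Let tooling pick matching benches instead of a human scanning Excel."""
        return [b for b in BENCHES if b.ecu_variant == variant]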

7. Test Orchestration

Scope: Coordinates what runs, when, and where. I think test orchestration is an underestimated portion of the whole. Often people don’t even realize it exists. In its simplest form, a nightly cron job or Jenkins pipeline which executes some tests is a test orchestration – however, a pretty unsophisticated one. In a more capable form, test orchestration can:

  • Manually/automatically select tests (by tags, risk, code change, failure history)
  • Distribute execution to multiple test runners/targets
  • Handle dependencies (above individual test case scale, see test runner)
  • Schedule test execution
  • Automatically quarantine test runners/targets (e.g. in case of failure)

Test orchestration is a central coordination layer and therefore extremely important in a diverse tool landscape, as it enables tool mixing.

As mentioned, CI pipelines are often used for test orchestration. Another solution is LAVA. And of course, again, your in-house solution of choice.

Pragmatic opportunities for cross-project alignment: I would say a test orchestration solution is not something that should be harmonized with priority. It is better to focus alignment efforts on other parts of this model. Why? While test orchestration is important and extremely helpful, having the same test orchestration across many projects is rather the endgame than the starting point. That said, if the testing community can agree on a common test orchestration approach early on, why not.
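
To show what the metadata-driven test selection mentioned above could look like at the orchestration layer, here is a small, purely illustrative Python sketch; no real tool or catalog format is implied.

    # Illustrative test catalog with selection metadata.
    CATALOG = [
        {"id": "TC-0042", "tags": {"safety", "regression"}, "env": "hil"},
        {"id": "TC-0107", "tags": {"performance"}, "env": "sil"},
        {"id": "TC-0233", "tags": {"safety"}, "env": "sil"},
    ]

    def select(tag: str, env: str) -> list[dict]:
        """Roughly what 'run-tests --tag safety --env sil' could resolve to internally."""
        return [t for t in CATALOG if tag in t["tags"] and t["env"] == env]

    for test in select("safety", "sil"):
        print("dispatching", test["id"], "to a runner on a matching target")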

8. Test Failure Triage / Debugging

Scope: A gray zone in the scope of testing is the handling of failures beyond merely reporting them. While I would agree that it is not formally part of the testing scope, it certainly requires tester contribution in a cross-functional engineering team.

Pragmatic opportunities for cross-project alignment: While debugging a specific bug tends to be pretty project-specific, I still see opportunities for sharing knowledge and best practices on how exactly to combine testing with debugging efforts:

  • How to make logs and traces available. I am seeing test activities which range from providing barely any logs about a failure (one extreme) to a plethora of logs potentially overwhelming the debugger (the other extreme). Finding the right balance between those extremes requires experience and experimentation. Cross-project best practice sharing can improve the maturity curve here.
    • Accessibility of logs is very important. There is a difference between all the logs being accessible via one link, nicely structured in an overview, and the logs having to be pulled from 4 different storage services with different user access mechanisms.
  • LLMs seem to have a lot of potential. We are still in an early phase here, and it is clear that beyond trivial bugs* you cannot just throw logs at an LLM and expect it to find the root cause. It will likely find a lot of potential causes, e.g. because it lacks knowledge about expected but misleading log messages which were always there. Context is key, e.g. by augmenting the prompts with Retrieval-Augmented Generation (RAG), Model Context Protocol (MCP) connections to tools, and by providing a body of expert knowledge (“ignore error message XY, it’s always there”).
  • * Talking about trivial bugs: every software engineer, testers included, knows that a majority of bug tickets is, in fact, trivial in some sense. There are duplicates, flukes, layer-8 problems, known limitations, etc. Wading through those is a major source of frustration and conflict, and LLMs have a huge potential to help here.

9. Reporting, Analysis & Feedback

Scope: Handles what happens after execution. Here, test results are aggregated, trends are analyzed, coverage metrics are put together, failures are clustered, and flakiness statistics are gathered. This is also where management, process customers and project-external customers come in as consumers, in addition to the (often neglected) internal engineering workforce. While the former layers detect failures, this one interprets them and puts them into context.

Pragmatic opportunities for cross-project alignment: The good thing is that on this level we are clearly outside the area where project-specific tooling is required. Standardization here doesn’t (or barely does) constrain testing tooling. Hence, this is one of the lowest-risk, highest-return-on-investment layers for alignment.

Obviously, the way the reports are aggregated can be aligned in a cross-project manner. Dashboards which show cross-project data, yet can be filtered down to each project’s or sub-project’s slice, can be very instructive for (project) management.
If we exclude Defect Management and focus on actual test results, a precondition is to have aligned terminology, taxonomy and understanding of testing across projects, closing the loop to the section Test Intent and Scope (aka Test Strategy).

Naturally, the focus of consumers will be more on failed test cases than on passed ones. A unified failure taxonomy can contribute to alignment, too: whether a failure is a product defect, a test tool defect (not relevant for customers), a data defect, an environment defect, flakiness, or a timing/performance degradation can make a huge difference.

It helps to agree on definitions first, not thresholds: what is a pass rate, how are process time intervals like “mean time to detect” and “mean time to repair” defined, how are flakiness rates determined and how are coverages calculated?
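
To illustrate why the definitions matter, here is a hedged sketch of two such metrics in Python; other, equally valid definitions exist, which is exactly why agreement is needed.

    def pass_rate(results: list[str]) -> float:
        """Share of executed tests that passed; skipped tests are excluded here -
        whether to count them is exactly the kind of definition to agree on."""
        executed = [r for r in results if r in ("pass", "fail")]
        return sum(r == "pass" for r in executed) / len(executed) if executed else 0.0

    def flakiness_rate(runs_per_test: dict[str, list[str]]) -> float:
        """Share of tests that show both pass and fail verdicts across repeated runs."""
        flaky = sum(1 for runs in runs_per_test.values()
                    if "pass" in runs and "fail" in runs)
        return flaky / len(runs_per_test) if runs_per_test else 0.0

    print(pass_rate(["pass", "fail", "pass", "skip"]))                            # 0.67
    print(flakiness_rate({"TC-1": ["pass", "fail"], "TC-2": ["pass", "pass"]}))   # 0.5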

Closing words

I have been pondering this topic for many months, and I am glad I have finally put all my thoughts into words and a graphic:

I am looking forward to feedback. I am sure my evaluations will not find universal agreement, so I would be happy to enter a constructive discussion.


JOS Updates: Filesystem, Disk Driver, 4KB pages, Automated Testing

I am happy to announce that JOS, my own operating system written from scratch, got some updates in recent months. Since the last update I was able to add two essential capabilities to progress towards my next big goal (running a somewhat useful shell): a filesystem and a disk driver. As usual, both are very rudimentary and incomplete. At the moment only read operations are possible, and even those are very limited. Still, I am relieved that this is working now. Both in combination enable me to drop the workaround of storing files directly in the binary of my OS.

As the filesystem I went for ext2, an older, very simple filesystem which was (and, via its successors ext3 and ext4, still is) in use by many if not most Linux systems. For the disk driver I also went for an oldie: ATA. In the end, the code for both together is only approx. 500 lines. I had actually procrastinated on this for a while because I was awed by the expected complexity. While both caused some frustration loops, in the end it clicked faster than expected.

I am even surprised that the performance is better than what I expected from an unoptimized first working iteration. As you can see in the tracing view, one exemplary file read syscall from userspace (still Doom 😉) takes less than 1 millisecond, and reading one sector (512 bytes) from the disk takes 100 microseconds.

What actually cost me way more time and nerves during the last months was a refactoring without any clear purpose (which in hindsight probably could have been skipped): I enabled JOS to run with 4 KB pages. The page size is the fundamental unit used by the memory management unit of the CPU to organize memory. For historic reasons I had been using 2 MB pages. While 2 MB pages work fine, I figured they would offer too little granularity. E.g. I was researching IPC mechanisms, specifically via shared memory, and if each process has to share memory in 2 MB chunks, the RAM would quickly be full. So yeah, now it is running with a 4 KB page size. Theoretically, 2 MB pages still work and could be enabled via a compile-time switch. However, I was too happy that 4 KB finally worked, so I haven’t found the mojo yet to test (and probably fix) 2 MB.

Speaking of testing: the automated tests I introduced in my last brief JOS update really paid off big time. I could check any change with a few seconds of automated tests and ensure nothing got worse. Even though in the end all the above improvements didn’t directly enable any new functionality, a lot improved under the hood, and Doom is still running.


Corporate SW Engineering Aphorisms

During my recent vacation I was reflecting on patterns I have observed during my 15+ year tenure in a corporate SW engineering environment. My friends and colleagues know me to be very interested in the meta-level of organizational dynamics; my blog is evidence of this.

It is not so easy (for me) to communicate such patterns in a thought-provoking manner. The internet and software culture offer a very rich collection of takes by much cleverer people. You have probably heard about Murphy’s Law, Conway’s Law, or Parkinson’s Law of Triviality. https://matthewreinbold.com/2020/08/03/Technology-Aphorisms has a nice collection of those. However, phrasing laws is a bit too much for my humble self. Instead, I figured aphorisms are more appropriate for my opinions. However, I have to admit that I can’t perfectly differentiate between aphorism, sententia, maxim, aperçu, and bon mot. I guess this is just me trying to be clever, too 🙂

In task forces you bring together end-to-end teams and miraculously it works. Why do you wait for the task force to form such teams?

Great organizations mature and evolve their software over many years. Others replace it every other year – and call this progress.

While the strategy grows on shiny slides, the engineers wonder who still listens to them.

In a world of infinite content, silence becomes signal

It’s not the traffic that breaks the system — it’s the architect’s fantasy of it.

In the world of junior architects, no problem is too small for an oversized solution.

Overengineering doesn’t solve problems — it staffs them.

Complexity is job security — for the team that caused it.

Not every repetition is a problem. Some abstractions are worse.

YAGNI is no excuse for architecture – but a compass for its necessity

Every new DSL saves you five keystrokes – and costs you 3 days of debugging


Business Trip to Egypt

A few weeks back my colleagues Kemal Hajvazovic, Seif Abdelmegeed and I had the chance to visit our partners from Luxoft Egypt in Cairo. It was great to meet the team which is helping us on our ADAS platform software journey in many regards. Great culture and spirit – keep it up!

Thanks for hosting us, Fatmaelzahraa Mohamed & Amr Hussein Taher!


JOS is on fire 🔥

During a long weekend my own operating system has made some progress:

– JOS is now able to read data from ext2 filesystem images
– Via a new serial interface, automated smoke tests can be executed in the Github pipeline to check some essential features are always working
– I played around with flamegraphs, which show the call stacks (including userspace and kernelspace)

check it out at https://github.com/jbreu/jos


Technology Radar #32: Automotive SW perspective

A few days ago, version #32 of Thoughtworks’ (TW) Technology Radar was published. As in earlier blog posts, I want to review the topics in there from the perspective of automotive embedded SW engineering. As usual, there is some bias towards cloud technology and machine learning, which is outside my current professional scope (exceptions apply); however, every time there are enough other concepts/tools/aspects which make me investigate and follow up to the best of my possibilities, either at work or in my private projects. In this blog post I will try to list those parts which are reasonably new and relevant from an automotive software perspective.

Let’s start with the Techniques sector. The first item TW recommends to adopt is Fuzz Testing. Indeed, it is a testing approach with great potential that I have barely ever leveraged (I am not alone: “Fuzz testing, or simply fuzzing, is a testing technique that has been around for a long time but it is still one of the lesser-known techniques”). Worth noting: Google has an interesting project called OSS-Fuzz in which they fuzz open source projects for free, and actually find a lot of issues. Fuzz Testing is in my top 10 SW engineering practices I want to see in real project life as soon as possible.
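
To give a feel for how low the entry barrier can be, here is a minimal sketch using Atheris, Google’s coverage-guided fuzzer for Python; the parse_frame function is a made-up stand-in for whatever parser you want to exercise, and in embedded C/C++ projects libFuzzer or AFL++ play the analogous role.

    import sys
    import atheris  # Google's coverage-guided fuzzer for Python

    @atheris.instrument_func
    def parse_frame(data: bytes) -> None:
        """Made-up stand-in for a real parser (e.g. a diagnostic frame decoder)."""
        if len(data) < 2:
            raise ValueError("frame too short")
        # ... real decoding logic would go here ...

    def test_one_input(data: bytes) -> None:
        try:
            parse_frame(data)
        except ValueError:
            pass  # well-defined, expected error; crashes and hangs are the findings

    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()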

The next interesting item, “API request collection as API product artifact”, sounds a bit clunky. I interpret it as a set of sample API requests which help developers adopt APIs inside an ecosystem more quickly. That is definitely desirable, as examples are often more helpful for getting the hang of a specific API than its documentation (not to mention that documentation is still very important, too). One caveat is the need to establish ways to keep the examples/collection up to date so they don’t break after a while as the API evolves.

Then comes the Architecture advice process, which resonates very well with my current experience: in large software projects, a common challenge is the architectural alignment process. Traditional methods, like Architecture Review Boards, often slow things down and are linked to poor organizational outcomes. A more effective alternative may be the architectural advice process: a decentralized model where anyone can make architectural choices, as long as they consult with affected stakeholders and experts. This approach supports faster workflows without sacrificing quality, even at scale. Though it may initially seem risky, tools like Architecture Decision Records and advisory groups help keep decisions well-informed. This model is proving successful, even in tightly regulated industries. Andrew Harmel-Law has written an insightful blog post on it.

In the Tools sector, uv is getting a spotlight. uv is the hot shit right now in the Python ecosystem. While I have not used it myself, I see it gradually replacing other Python package managers. This is due to its fast execution, but also to its well-designed features, which make it easier to run more or less self-contained Python projects.