
NVIDIA’s Jensen Huang praising our product

Finally the hard work of our engineering team gets into the spotlight! It was a long haul and we succeeded 🦾 a very motivating start to 2026!

https://www.youtube.com/clip/UgkxDgbFauNfOJpYe-GM5_CzJbA0O3FcPjd6

Great teamwork by my teams at Mercedes-Benz AG Sindelfingen, MBition, Mercedes-Benz Research and Development India and Luxoft Egypt. Thank you very much


Aligning Test Ecosystems

(Disclaimer: As usual, I am writing from my personal, automotive embedded sw perspective. Below are my thoughts based on everyday experience as an engineering manager and sw enthusiast. I am not a trained/certified test expert with deep knowledge of testing theory/science, frameworks and terminology. Please take it not just with a grain but with a huge truckload of salt. Let me know your opinion!)

Introduction

After more than 17 years in automotive sw engineering, I have come to see testing as an integral part of everyday sw engineering practice. At first, such a statement seems obvious – it doesn’t take 17 years to understand the importance of testing. True, but read the italic sentence again. I wrote „part of everyday […] practice“. To the surprise of some, and the bitter confirmation of others, this is not the case up until today in many automotive sw projects. Testing is often decoupled in various dimensions:

  • Testing is done much later than the code implementation (weeks, months, years)
  • Testing is executed when big SW releases are made (think: on a monthly or quarterly scale)
  • Testing is done to satisfy some process requirements (again, usually late, briefly before start of production)
  • Testing is done by dedicated staff in testing organizations, spatially and organizationally decoupled from the other sw roles (architects, programmers, Product Owners)

Even when talking about the lowest tier of sw testing, unit testing and coding rule compliance, much if not all of the above applies. When it comes to higher levels of testing such as integration testing or qualification testing, more and more of the above list applies.

There are many reasons why the above is bad and why it is the case. Many of these issues are rooted in a wrong mindset and corporate organizational dynamics (e.g. the infamous „make testers a shared resource decoupled from dev teams“). Those are causes I don’t want to write about today; I have done so in the past and may do so again in the future. My focus today is on the technical meta aspects of getting testing into the right spot.

My observation is that automotive sw testing still suffers from a severe lack of alignment/harmonization of test environments. While there are many sophisticated tools serving specific purposes, the overall landscape is fragmented. Many expensive, powerful domain-specific tools are combined with home-grown tooling, resulting in test ecosystems which work on their own but are not leveraged for cross-domain or cross-project synergies. Now, everyone always strives to harmonize and standardize solutions, but it seldom works out. There is a lot of organizational inertia, and there are technical challenges, so simply demanding or proclaiming standardization often leads nowhere at all – or, if it does lead somewhere, to the situation my all-time favorite xkcd refers to:

https://xkcd.com/927/

However, that doesn’t mean we should give up before the battle. We should be clever about our approach and balance effort vs. (potential) value. And since we know the test ecosystem is complex, we need to make up our minds about where to invest. Just throwing money and hours at „testing“ will not make anything better.

A Mental Model of Testing Aspects

In order to see where harmonization gives the biggest bang for the buck, we first need to understand how modern testing is actually cut into pieces. I did some research and was pretty surprised that there isn’t any good model or taxonomy of testing described yet. Probably I just couldn’t find it. What one can find many times are categorizations of test types, like the ones found here. The ISO/IEC/IEEE 29119-1 software testing standard also doesn’t really provide such a model. There is a „multi-layered test context diagram“, but it falls short in the areas where I think alignment can happen best.

ISO/IEC/IEEE 29119-1, page 16

So without further ado, let’s dive into my personal mental meta model of testing, going from top to bottom, or from the outside in, and look at what opportunities for alignment and harmonization each layer offers.

1. Test Intent and Scope (aka Test Strategy)

Scope: The test strategy defines which quality attributes are validated by testing in a project. It can range from the classics (functional correctness, interface behavior/integration testing, regression avoidance, performance) to reliability, security, safety, compliance, usability, compatibility, and so on. An obvious demand of ASPICE and the like, the test strategy is often at the core of each process audit.

Pragmatic opportunities for cross-project alignment: Having contributed multiple times to test strategies written from (almost) scratch, and having seen all the blood, sweat and tears that flow into writing and, especially, refining and establishing them, I think the greatest opportunity here is to provide templates and knowledge:

  • A common template gives the responsible people a skeleton which allows them to fill in the needed parts, and all still-empty sections double as a checklist.
  • Besides a template, some example test strategies which have passed audits are extremely helpful. If you are like me, you often wonder how deep to go on a certain aspect: shall I write a one-liner, or does it require multiple pages of text with nicely crafted graphics? Examples help to find the sweet spot of „just enough“.
  • Talking about examples, best practices are extremely valuable, and a shared body of knowledge can bring a struggling organization from zero to hero. All it requires is to overcome the not-invented-here syndrome. There is so much good stuff out there, even in your own org. You just have to find it.
  • While the test strategy can be authored in any reasonably capable text editing format (Microsoft Word, Atlassian Confluence, …), what I have come to appreciate is using lightweight markup languages like Markdown or Sphinx. They make it possible to handle the document as code, version-track it and, ideally, even maintain it alongside the actual project code.
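
To make the template idea concrete, a Markdown-based skeleton could start out like the sketch below; the section names are illustrative, not a normative structure:

  # Test Strategy: <Project>
  ## Scope and Quality Attributes       <!-- functional correctness, performance, safety, ... -->
  ## Test Levels and Responsibilities   <!-- unit, integration, qualification; who owns what -->
  ## Entry and Exit Criteria
  ## Test Environments and Tooling
  ## Traceability and Reporting
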
2. Test Case Specification / Definition

Scope: Defines what the test is in an abstract sense. Each test case is described in a structure which contains preconditions, inputs/stimuli, and expected outputs. Here, traceability to requirements, risks, code, variants and equivalence classes is documented. Last, but operationally very relevant, are the metadata for each test case, like its priority, tags, ownership, status, etc.

Pragmatic opportunities for cross-project alignment:

  • Test cases should be maintained in an accessible database. As in any good software culture, test cases should be shared as widely among testers from different projects as code is shared between software developers. As above, I think such a database is ideally realized via plain-text formats. If this is not an option, make sure to get your testers’ and developers’ input on the proper tool choice. A bad tool choice can severely impact testing operations and cost a lot of sunk money – I can speak from experience.
  • Test cases should be written with as little dependency as possible on subsequent process steps (like test frameworks and test tooling). This makes them reusable, but also decouples them from their implementation and execution, allowing the tooling to be exchanged.
  • The representation of each test case should be structured text. Natural language can be used in comments, but only structured text can be parsed in a precise manner (LLMs could change this in the long run, but not yet).
  • Establish a common language. There are great options out there: from the Behavior-Driven Development culture we have Gherkin, and there is the Robot Framework test data syntax.
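
As an illustration, a test case specified in Gherkin carries its metadata as tags and stays free of tooling details; the scenario below is a made-up example, not taken from a real project:

  Feature: Central locking

    @safety @priority_high @req-DOOR-042
    Scenario: Doors unlock on valid key signal
      Given the vehicle is stationary
      And a valid key signal is received
      When the unlock command is issued
      Then all doors shall be unlocked within 500 ms
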
3. Test Implementation / Binding

Scope: Maps the abstract test case specification to executable behavior. Here it usually gets very tooling-specific. This is where APIs are implemented and keywords are bound to code (e.g. a „connect“ from step 2 is mapped to „ssh root@192.168.0.1“). A lot of effort is spent on test setup and teardown to prepare and clean up.

Pragmatic opportunities for cross-project alignment:

  • A strong culture of reusable test implementation needs to be established here. E.g. setting up the test environment can be done (and often is) via very hacky means. Using established and well-maintained patterns helps.
  • Define an explicit „test contract“: For each test, or your whole test set/suite, explicitly state which guarantees it can uphold:
    Test X guarantees:
    - deterministic result under condition Y
    - idempotent execution
    - no persistent state
  • Keep Environment Knowledge Out of Tests: tests should not know URLs, credentials, ports, or hardware topology.
  • Some deeper technical topics:
    • Invert control of assertions and actions (a short Python sketch follows after this list). Instead of hard-coding test-tooling-specific instructions like
      assert response.status_code == 200 # pytest-specific semantics
      use an abstraction to define what a positive response means:
      test.assert.status_ok(response) # test.assert.* is our own abstraction
    • Don’t import test environment specifics in your immediate test implementation, only in the adapters.
    • Separate the „What“ from the „How“ using capability interfaces.
    • Make test steps declarative, not procedural. Good example:
      - action: http.request
        method: GET
        url: /health
        expect:
          status: 200
    • Centralize test utilities as products, not helpers. Maintain them with semantic versioning.
    • Decouple test selection from test code. Instead of
      pytest tests/safety/test_error_management.py
      prefer
      run-tests --tag safety --env TEST
      This is extremely powerful, as it enables test selection in a metadata-driven fashion.
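
To make the inversion-of-control idea concrete (see the assertion bullet above), here is a minimal Python sketch of such an assertion facade. The module, class and fixture names are my own invention for illustration, not an existing library; the point is that the definition of „ok“ lives in exactly one place:

  # asserts.py - thin facade; tests depend on this module, never on pytest/requests specifics
  class Asserts:
      def status_ok(self, response):
          # Central definition of what a positive response means; change it here, not in every test.
          if response.status_code != 200:
              raise AssertionError(f"expected HTTP 200, got {response.status_code}")

  asserts = Asserts()

  # adapter layer - a stub client keeps the sketch self-contained and runnable
  import pytest

  class StubResponse:
      status_code = 200

  class StubClient:
      def get(self, path):
          return StubResponse()   # stand-in for the real environment adapter

  @pytest.fixture
  def http_client():
      return StubClient()

  # test_health.py - the test only talks to the abstractions, never to the tooling directly
  def test_health_endpoint(http_client):
      response = http_client.get("/health")
      asserts.status_ok(response)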

Ultimately, accept that 100% tool independence is a myth; design for easy replaceability instead. The goal is not “tool-free tests”, but cheap tool replacement. Success criteria could be: tool-specific code makes up less than 10–20% of the total test code, switching a runner requires a new adapter rather than rewriting tests, and a new project reuses 70–90% of the test logic.

4. Test Framework

Scope: The test framework provides the structure for the test implementation. It does so by managing the test execution lifecycle, providing fixtures (a fixture sets up preconditions for a test and tears them down afterward), and offering mechanisms for parameterization and reporting hooks. It can also manage failure semantics, like retry-on-fail or handling flakiness. There are many test frameworks out there; some are more or less tied to a specific programming language and test level (pytest, JUnit, gtest), others are versatile and independent (Cucumber, Robot Framework).
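
As a tiny illustration of what a framework contributes, the pytest sketch below uses a fixture for setup/teardown and parameterizes one test over several inputs; the FakeTarget class is a stand-in I made up so the example is self-contained, not real project code:

  import pytest

  class FakeTarget:
      """Stand-in for a real ECU/SIL connection, only here to keep the sketch runnable."""
      def set_speed(self, kmh):
          self.kmh = kmh

      def warning_active(self):
          return self.kmh > 120

      def close(self):
          pass

  @pytest.fixture
  def target():
      conn = FakeTarget()   # in a real project: establish the connection to the test target
      yield conn            # the test body runs here
      conn.close()          # teardown runs even if the test fails

  @pytest.mark.parametrize("kmh,expected", [(30, False), (121, True)])
  def test_overspeed_warning(target, kmh, expected):
      target.set_speed(kmh)
      assert target.warning_active() == expected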

Pragmatic opportunities for cross-project alignment: In my experience, there are far too many test frameworks in use in multi-project organizations. Often, the frameworks in place were adopted through historical developments. On top of that, test frameworks are often developed in-house, either because existing ones are unknown or lack needed functionality (or because not-invented-here strikes). The opportunity here is obvious, yet hard to realize: forcing multiple projects/sub-organizations to drop their beloved test framework in order to adopt one defined centrally will cause a lot of frustration and a temporary loss of pace. On the other hand, strategically it makes total sense to nourish an iterative harmonization. If you want to go down this path, I recommend choosing either an open source solution or a home-grown solution managed in an inner-source fashion. While there may be attractive and powerful proprietary solutions on the market, in my experience these will always have a harder time achieving long-term adoption. Topics like license cost, license management and improvement requests will always take their toll, causing friction and making dev teams look for other solutions again.

Adding some highly personal opinion: any test framework which wants to prevail in the long term has to have at least the following properties:

  • Run on Linux
  • Run in headless fashion (no clunky GUI)
  • Fully automatable
  • Configuration as code
  • No individual license per seat (so either license free, or global, company-wide license)

The above properties are not meant to be exclusive: if a tool also supports Windows and has an optional GUI, that’s OK. But there is no future in relying on Windows or a GUI, and doing so actively hinders adoption.

5. Test Runner

Scope: Executes tests in a concrete process. A test runner can resolve the execution order (e.g. if test Y depends on test X), parallelize tests on one test target (if possible), manage timeouts, isolate processes and collect the test results. Runners are often stateless executors.

Pragmatic opportunities for cross-project alignment: Test frameworks often (always?) bring their test runners with them. The same observation from the Test Framework section applies here: instead of tightly coupling runners to frameworks, consider decoupled runners using containerization.
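
A minimal sketch of the containerized-runner idea: package each framework plus its runner into an image, and let any orchestrator invoke them uniformly. The image name is made up, and run-tests is the hypothetical CLI from the previous section:

  docker run --rm \
    -v "$PWD/tests:/tests" \
    -v "$PWD/results:/results" \
    my-org/pytest-runner:1.4 \
    run-tests --tag smoke --env TEST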

6. Test Environment

Scope: Defines where the tests run. While the test runner is typically just a tool which executes instructions on a test environment, the environment itself comprises the product targets (ECU variant X, hardware revision Y, …), simulators, cloud vs. on-premise infrastructure, network topology and test doubles (mocks, stubs). Important aspects to cover here are provisioning (how is the test software deployed, i.e. flashed?), the configuration that needs to be applied to get to a running test environment, and lifecycle management.

Pragmatic opportunities for cross-project alignment: Managing test environments can be very domain-specific. Often it’s done by documentary means, like big Excel tables containing variant information about an ECU test fleet in the lab. Applying infrastructure-as-code solutions like Ansible can help to create reproducible, scalable and reliable test environments. Going one step further, one may adopt managed test fleets (such as AWS Device Farm) or use solutions like OpenDUT.
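
As an illustration, a minimal Ansible playbook for provisioning a test bench could look like the sketch below; the host group, package name and paths are made up:

  # provision_testbench.yml - make every bench in the lab reproducible from code
  - hosts: hil_benches
    become: true
    tasks:
      - name: Install flashing toolchain
        ansible.builtin.apt:
          name: flash-tool               # placeholder package name
          state: present
      - name: Deploy bench configuration
        ansible.builtin.template:
          src: bench_config.j2
          dest: /etc/testbench/config.yaml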

7. Test Orchestration

Scope: Coordinates what runs, when, and where. I think test orchestration is an underestimated part of the whole. Often people don’t even realize its existence. In its simplest form, a nightly cronjob or Jenkins pipeline which executes some tests is a test orchestration, albeit a pretty unsophisticated one. Test orchestration in a more capable form can:

  • Manually/automatically select tests (by tags, risk, code change, failure history)
  • Distribute execution to multiple test runners/targets
  • Handle dependencies (above individual test case scale, see test runner)
  • Schedule test execution
  • Automatically quarantine test runners/targets (e.g. in case of failure)

Test orchestration is a central coordination layer and therefore extremely important in a diverse tool landscape, as it enables tool mixing.

As mentioned, often CI pipelines are used for test orchestration. Another solution is LAVA. And of course, again, there is the option of an in-house solution.
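
To make the coordination layer more tangible, a metadata-driven orchestration input could look like the following sketch; this is an invented format for illustration, not the configuration of any particular tool:

  # nightly_plan.yaml - what to run, where, and how to react to failures
  schedule: "0 2 * * *"            # nightly at 02:00
  selection:
    tags: [safety, regression]
    exclude: [quarantined]
  execution:
    target_pool: hil_bench_pool_a
    parallel_runners: 4
  on_failure:
    quarantine_target: true
    max_reruns_for_flaky: 1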

Pragmatic opportunities for cross-project alignment: I would say a test orchestration solution is not something that should be harmonized with priority. It’s better to focus alignment efforts on other parts of this model. Why? While test orchestration is important and extremely helpful, having the same test orchestration across many projects is rather the endgame than the starting point. That said, if the testing community can agree on a common test orchestration approach early on, why not.

8. Test Failure Triage / Debugging

Scope: A gray zone in the scope of testing is the handling of failures beyond merely reporting them. While I would agree that it is not formally part of the testing scope, it certainly requires tester contribution in a cross-functional engineering team.

Pragmatic opportunities for cross-project alignment: While debugging a specific bug tends to be pretty project-specific, I still see opportunities for sharing knowledge and best practices on how exactly to combine testing with debugging efforts:

  • How to make logs and traces available. I am seeing everything from test activities which provide barely any logs about a failure (one extreme) to a plethora of logs potentially overwhelming the debugging engineer (the other extreme). Finding the right balance between those extremes requires experience and experimentation. Cross-project sharing of best practices can improve the maturity curve here.
    • Accessibility of logs is very important. It makes a difference whether all the logs are accessible via one link, nicely structured in an overview, or have to be pulled from 4 different storage services with different user access mechanisms.
  • LLMs seem to have a lot of potential. We are still in an early phase here, and it’s clear that beyond trivial bugs* you cannot just throw logs at an LLM and expect it to find the root cause. It will likely list a lot of potential causes, e.g. because it lacks knowledge about misleading log messages which were always there and are expected. Context is key, e.g. by augmenting the prompts via Retrieval-Augmented Generation (RAG), Model Context Protocol (MCP) connections to tools, and a body of expert knowledge („ignore error message XY, it’s always there“).
  • * Talking about trivial bugs: every software engineer, testers included, knows that a majority of bug tickets is, in fact, trivial in some sense. There are duplicates, flukes, layer-8 problems, known limitations, etc. Wading through those is a major source of frustration and conflict, and LLMs have huge potential to help here.
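
As a small sketch of the „context is key“ point: filter known-benign log patterns and prepend the expert knowledge before handing the logs to an LLM. The llm_client at the end is a placeholder for whichever LLM interface is in use, not a real API:

  # Known-benign patterns collected by the team ("always there, ignore it").
  KNOWN_NOISE = [
      "error message XY",            # the example from the text above
      "init retry on channel 3",     # hypothetical, expected during startup
  ]

  EXPERT_NOTES = (
      "The listed noise patterns have already been removed. "
      "Focus on the first error that does not appear in a passing run."
  )

  def build_triage_prompt(log_lines, max_lines=200):
      relevant = [line for line in log_lines
                  if not any(noise in line for noise in KNOWN_NOISE)]
      return EXPERT_NOTES + "\n\nLogs:\n" + "\n".join(relevant[-max_lines:])

  # prompt = build_triage_prompt(open("test_run.log").read().splitlines())
  # answer = llm_client.complete(prompt)   # placeholder call
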
9. Reporting, Analysis & Feedback

Scope: Handles what happens after execution. Here, test results are aggregated, trends are analyzed, coverage metrics are put together, failures are clustered and flakiness statistics are gathered. Management, process customers and project-external customers come in as consumers, in addition to the (often neglected) internal engineering workforce. While the former layers detect failures, this one interprets them and puts them into context.

Pragmatic opportunities for cross-project alignment: The good thing is that on this level we are clearly outside the area where project-specific tooling is required. Standardization here doesn’t (or barely does) constrain the testing tooling. Hence, this is one of the lowest-risk, highest-return-on-investment layers for alignment.

Obviously, the way the reports are aggregated can be aligned in a cross-project manner. Dashboards which show cross-project data, yet can be filtered down to each project’s or sub-project’s slice, can be very instructive for (project) management.
If we exclude defect management and focus on the actual test results, a precondition is an aligned terminology, taxonomy and understanding of testing across projects, closing the loop to the section Test Intent and Scope (aka Test Strategy).

Naturally, consumers will focus more on failed test cases than on passed ones. A unified failure taxonomy can contribute to alignment, too: whether a failure is a product defect, a test tool defect (not relevant for customers), a data defect, an environment defect, flakiness, or a timing/performance degradation can make a huge difference.

It helps to agree on definitions first, not thresholds: what is a pass rate, how are process time intervals like „mean time to detect“ and „mean time to repair“ defined, how are flakiness rates determined and how are coverages calculated?
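
Even for these simple metrics, a shared, executable definition avoids endless debates. A minimal sketch in Python; the definitions themselves are illustrative, each organization has to agree on its own:

  def pass_rate(verdicts):
      """verdicts: list of 'pass'/'fail' strings from one test run."""
      return sum(v == "pass" for v in verdicts) / len(verdicts)

  def flakiness_rate(history):
      """history: {test case id: verdicts over repeated runs on unchanged software}.
      A test case counts as flaky if it produced both passes and fails."""
      flaky = sum(1 for verdicts in history.values() if len(set(verdicts)) > 1)
      return flaky / len(history)

  print(pass_rate(["pass", "pass", "fail"]))                                  # 0.666...
  print(flakiness_rate({"tc1": ["pass", "fail"], "tc2": ["pass", "pass"]}))   # 0.5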

Closing words

I have been pondering this topic for many months, and I am glad I have finally put all my thoughts into words and a graphic:

I am looking forward to feedback. I am sure not everyone will agree with my assessments, so I would be happy to enter a constructive discussion.


Remote target doesn’t support qGetTIBAddr packet

I had a weird issue today connecting via the VS Code extension Native Debug to a qemu instance. It was giving:

Visual Studio Code
Failed to attach: Remote target doesn't support qGetTIBAddr packet (from target-select remote :1234)

[Open 'launch.json'] [Cancel]

That sounded like gdb/Native Debug was expecting a feature qemu is not offering; however, just the day before it ran successfully – so what happened? Unfortunately and coincidentally, I had done some housekeeping a few hours before, so my suspicion was that I had somehow uninstalled some facilities, like the Windows SDK or so. After 2 hours of trying to reproduce my earlier setup, checking older versions of qemu, gdb and Native Debug, I almost gave up, when I stumbled upon this via Google:
https://github.com/abaire/nxdk_pgraph_tests/blob/main/README.md?plain=1#L187

NOTE: If you see a failure due to "Remote target doesn't support qGetTIBAddr packet", check the GDB output to make sure
that the `.gdbinit` file was successfully loaded.

Now, of course I checked the gdb output before, but besides some warnings nothing appeared suspicious. The link made me re-check, and indeed there was this:

BFD: reopening /cygdrive/c/Users/Jakob/Documents/workspace/os-series/C:\Users\Jakob\Documents\workspace\os-series\dist\x86_64\kernel.bin: No such file or directory

That appeared new on second thought, and I removed the following line from my VS Code launch.json:

"executable": "dist/x86_64/kernel.bin",

That made it work again. At least partially: of course, now the information about the executable is missing, but I have the feeling this is a minor thing to fix.

Addition (Nov 11th):

I have two additions to make. First, it seems reader Piotr found the actual proper solution to this, see his comment below. After setting „set osabi none“ in .gdbinit I can indeed use gdb as before. This setting keeps gdb from assuming any operating system ABI, which makes sense when you are coding your own OS from scratch. Thank you so much, Piotr!
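
For reference, this is the line in question in .gdbinit:

  set osabi none   # do not assume any operating system ABI (we are debugging a from-scratch kernel)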

Second, just FYI and in case someone has a similar issue but the aforementioned solution doesn’t work for some reason, here is the workaround I used until Piotr came to the rescue. As written above, removing the „executable“ line from the launch.json made gdb work, but of course the executable and its debug symbols were then missing, so setting breakpoints from the UI didn’t work. After much tinkering I realized that adding the executable later during the debugging session helped. So what I did was add a hardcoded breakpoint at the very beginning of my test object. When this breakpoint was hit, a few commands were executed, of which adding the kernel as a symbol file is the most important one. I also had to add another breakpoint inside this command block, which made gdb reload all other breakpoints from the UI, too.

b *0x100018                                 # hardcoded breakpoint at the very beginning of the kernel
commands
  add-symbol-file dist/x86_64/kernel.bin    # now load the executable and its debug symbols
  b isr_common_stub                         # adding another breakpoint here makes gdb re-resolve the UI breakpoints
end

This worked, with the caveat that each debugging session halted at this first hardcoded breakpoint and I had to continue manually once. It was an acceptable workaround, but I am happy that Piotr has now provided a proper solution.

I still have no clue what exactly made this issue pop up; as mentioned I blame my „housekeeping“ activities, which were too many to reproduce the exact root cause.


Technology Radar #29: Automotive SW perspective

As written before, I really like the regular updates provided by Thoughtworks in their Technology Radar. My focus is on the applicability of techniques, tools, platforms and languages to automotive software, with a further focus on embedded in-car software. Hence, I am ignoring the pure web-development and machine-learning/data-analytics content which usually makes up a huge portion of the report. Recently, volume 29 was published. Let’s have a look!

In the techniques sector, the topic lightweight approach to RFCs has made it into the adopt area, meaning there is a strong recommendation to apply it. During my time at MBition, Berlin, I was pointed to a talk by Paul Adams on YouTube by my colleague Johan Thelin; Paul later also gave the talk during an all-hands event of our project:

Hence, the RFC topic resonates very well with me. It has been my style to create documents about strategies, concepts and plans, request feedback from peers very early to check whether the general direction is correct, and finalize them later. Much like what software engineers are used to doing in pull requests, such a scheme can and should be applied systematically to more areas. Architecture is one obvious area, but there are many others. Confluence and similar collaboration platforms offer great inline commenting capabilities to discuss controversial aspects of a document and sort them out.

2.5 years ago I wrote about Dependency updates with Renovate. In the blip automatic merging of dependency update PRs the authors argue in favor of automatically merging bot-generated dependency updates. What can I say: it makes total sense. Until now I have merged the pull requests created by the bot manually, but from now on I just let it do that automatically – of course only after a successful pipeline run. With Renovate it’s as simple as adding "automerge": true to the renovate.json in each repo.
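
For reference, a minimal renovate.json along those lines; the extends preset is just an example, your base configuration may differ:

  {
    "$schema": "https://docs.renovatebot.com/renovate-schema.json",
    "extends": ["config:base"],
    "automerge": true
  }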

In tracking health over debt the authors describe the notion of focusing more on the health of a sw system than on tracking its (technical) debt. It’s a worthwhile approach, since focusing on debt means tracking an often ever-growing list. In my experience, some debt becomes obsolete over time, and some debt which was fiercely discussed when it was „implemented“ later turns out to be significantly worse or better than expected. Instead, tracking the health of the system as the primary measure of where to act at any time may yield better results in the long game.

In the tools sector, Ruff is recommended as a successor to the famous Python linter Flake8. Implemented in Rust, it seems to offer superior performance while still providing similar rule coverage:

A quite untypical entry (at least to my knowledge) is the mention of DX DevEx 360, a survey tool focused on identifying impediments and potential improvements among the dev team.

Our platform engineering teams have used DX DevEx 360 successfully to understand developer sentiment and identify friction points to inform the platform roadmap. Unlike similar tools, with DX DevEx 360 we’ve received a response rate of 90% or above, often with detailed comments from developers on problems and ideas for improvements. We also appreciate that the tool makes results transparent to engineers in the company instead of just managers and that it enables team-by-team breakdown to enable continuous improvement for each team’s context.

https://www.thoughtworks.com/de-de/radar/tools/summary/dx-devex-360

That was already it for „my“ scope of the tech radar. This time around, the tech radar contained a looot of new entries and updates in the AI area, around GPT and LLMs. Certainly interesting, but nothing I have much experience with or applications for (yet).