Categories
Coding

JOS Updates: Filesystem, Disk Driver, 4KB pages, Automated Testing

I am happy to announce that JOS, my own operating system written from scratch, got some updates in recent months. Since the last update I was able to add two essential capabilities that bring me closer to my next big goal (running a somewhat useful shell): a filesystem and a disk driver. As usual, both are very rudimentary and incomplete. At the moment only read operations are possible, and even those are very limited. Still, I am relieved that this is working now. In combination, the two allow me to drop the workaround of storing files directly in the binary of my OS.

As filesystem I went for Ext2, an older, very simple filesystem which was (and, via its successors Ext3 and Ext4, still is) in use by many if not most Linux systems. For the disk driver I am also going for an oldie: ATA. In the end, the code for both together is only approx. 500 lines. I had actually procrastinated on this for a while because I was awed by the expected complexity. While both caused some frustration loops, in the end it clicked faster than expected.
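To give an idea of what "reading Ext2" starts with, here is a minimal sketch of decoding the superblock. It is not the JOS code (which is Rust); the function name and the returned dictionary are made up for illustration. The field offsets and the magic number come from the Ext2 on-disk layout, and the 512-byte sector size is the classic ATA value.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # the superblock starts 1024 bytes into the volume
EXT2_MAGIC = 0xEF53        # s_magic, shared by ext2/ext3/ext4
SECTOR_SIZE = 512          # classic ATA sector size


def parse_superblock(image: bytes) -> dict:
    """Decode a few little-endian fields of an ext2 superblock (illustrative helper)."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    inodes_count, blocks_count = struct.unpack_from("<II", sb, 0)
    (log_block_size,) = struct.unpack_from("<I", sb, 24)
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError(f"not an ext2 image (magic={magic:#06x})")
    return {
        "inodes": inodes_count,
        "blocks": blocks_count,
        "block_size": 1024 << log_block_size,  # block size is stored as a shift
        "first_sector": SUPERBLOCK_OFFSET // SECTOR_SIZE,  # sector the driver must read first
    }
```

With 512-byte sectors the superblock begins at sector 2, so a disk driver that can read two consecutive sectors is already enough to identify the filesystem.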

I am even surprised that the performance is better than what I had expected from an unoptimized first working iteration. As you can see in the tracing view, one exemplary file read syscall from userspace (still Doom 😉 ) takes less than 1 millisecond, and reading one sector (512 bytes) from the disk takes 100 microseconds.
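As a back-of-the-envelope check of these numbers, the following sketch converts the per-sector time into a raw throughput and into the number of sector reads that fit into a 1 ms budget. The helper names are hypothetical, and it assumes strictly sequential sector reads with no other overhead, which the trace does not necessarily confirm.

```python
def throughput_bytes_per_second(sector_bytes: int, micros_per_sector: int) -> float:
    """Raw throughput if sectors are read back to back."""
    return sector_bytes * 1_000_000 / micros_per_sector


def sectors_within(budget_micros: int, micros_per_sector: int) -> int:
    """How many back-to-back sector reads fit into a time budget."""
    return budget_micros // micros_per_sector


print(throughput_bytes_per_second(512, 100))  # bytes per second
print(sectors_within(1000, 100))              # sector reads per millisecond
```

Under these assumptions the driver moves roughly 5 MB per second, and a syscall that stays below 1 millisecond can contain at most about ten sector reads.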

What actually cost me way more time and nerves during the last months was a refactoring without any clear purpose (which in hindsight could probably have been skipped): I enabled JOS to run with 4 KB pages. The page size is the fundamental unit used by the memory management unit of the CPU to organize memory. For historic reasons I had been using 2 MB pages. While 2 MB pages work fine, I guessed that they would be too coarse-grained. E.g. I was researching IPC mechanisms, specifically via shared memory. And if each process has to share memory in 2 MB chunks, the RAM would quickly fill up. So yeah, now it's running with a 4 KB page size. Theoretically 2 MB pages still work and could be enabled via a compile-time switch. However, I was so happy that 4 KB finally worked that I haven't found the mojo yet to test (and probably fix) 2 MB.
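The granularity argument is plain arithmetic: every mapping is rounded up to whole pages. A small sketch (function names are made up, and the 16 KB buffer is just an example size):

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024


def pages_needed(size_bytes: int, page_size: int) -> int:
    """Number of pages required to map size_bytes (ceiling division)."""
    return -(-size_bytes // page_size)


def mapped_bytes(size_bytes: int, page_size: int) -> int:
    """Memory actually consumed once the size is rounded up to whole pages."""
    return pages_needed(size_bytes, page_size) * page_size


shared_buffer = 16 * 1024  # example: a 16 KB shared-memory region
print(mapped_bytes(shared_buffer, PAGE_4K))  # 16384
print(mapped_bytes(shared_buffer, PAGE_2M))  # 2097152
```

With 4 KB pages the example buffer costs exactly its own size; with 2 MB pages the same buffer occupies a full 2 MB, which is 128 times as much.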

Speaking of testing: The automated tests I introduced in my last brief JOS update really paid off big time. I could check any change within a few seconds of automated tests and ensure nothing got worse. In the end, none of the above improvements directly enabled any new functionality, but a lot improved under the hood, and Doom is still running.

Categories
Coding Culture

Corporate SW Engineering Aphorisms

During my recent vacation I was reflecting on patterns I have observed in my 15+ years of tenure in a corporate SW engineering environment. My friends and colleagues know me to be very interested in the meta-level of organizational dynamics; my blog is evidence of this.

It's not so easy (for me) to communicate such patterns in a thought-provoking manner. The internet and software culture offer a very rich collection of much more clever people’s takes. You probably have heard about Murphy’s Law, Conway’s Law, or Parkinson’s Law of Triviality. https://matthewreinbold.com/2020/08/03/Technology-Aphorisms has a nice collection of those. However, phrasing laws is a bit too much for my humble self. Instead, I figured aphorisms are more appropriate for my opinions. However, I have to admit that I can't perfectly differentiate between aphorism, sententia, maxim, aperçu, and bon mot. I guess this is just me trying to be clever, too 🙂

In task forces you bring together end-to-end teams and miraculously it works. Why do you wait for the task force to form such teams?

Great organizations mature and evolve their software over many years. Others replace it every other year – and call this progress.

While the strategy grows on shiny slides, the engineers wonder who still listens to them.

In a world of infinite content, silence becomes signal

It’s not the traffic that breaks the system — it’s the architect’s fantasy of it.

In the world of junior architects, no problem is too small for an oversized solution.

Overengineering doesn’t solve problems — it staffs them.

Complexity is job security — for the team that caused it.

Not every repetition is a problem. Some abstractions are worse.

YAGNI is no excuse for architecture – but a compass for its necessity

Every new DSL saves you five keystrokes – and costs you 3 days of debugging

Categories
Coding Tinkering

JOS is on fire 🔥

During a long weekend my own operating system has made some progress:

– JOS is now able to read data from ext2 filesystem images
– Via a new serial interface, automated smoke tests can be executed in the GitHub pipeline to check that some essential features are always working
– I played around with flamegraphs, which show the call stacks (including userspace and kernelspace)
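To illustrate the smoke test idea: once the OS writes its log to a serial port, a pipeline can capture that output as text and check it for expected markers. The sketch below is not the actual JOS test code, and the marker strings are invented for illustration.

```python
def missing_markers(serial_log: str, expected: list[str]) -> list[str]:
    """Return the expected markers that never appear in the captured serial output."""
    return [marker for marker in expected if marker not in serial_log]


# Hypothetical captured output and hypothetical markers
log = "JOS booting\next2: mounted\n"
expected = ["JOS booting", "ext2: mounted", "doom: started"]
print(missing_markers(log, expected))  # ['doom: started']
```

A pipeline step would fail the build whenever the returned list is not empty.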

Check it out at https://github.com/jbreu/jos

Categories
Coding Culture

Technology Radar #32: Automotive SW perspective

A few days ago, version #32 of Thoughtworks' (TW) Technology Radar was published. As in earlier blog posts, I want to review its topics from the perspective of automotive embedded SW engineering. As usual, there is some bias towards cloud technology and machine learning, which is outside my current professional scope (exceptions apply). However, every time there are enough other concepts/tools/aspects which make me investigate and follow up as best I can, either at work or in my private projects. In this blog post I will try to list those parts which are roughly new and relevant from an automotive software perspective.

Let's start with the Techniques sector. First item TW recommends to adopt is Fuzz Testing. Indeed it's a testing approach with great potential which I have barely ever leveraged (I am not alone: „Fuzz testing, or simply fuzzing, is a testing technique that has been around for a long time but it is still one of the lesser-known techniques“). Worth noting: Google has an interesting project called OSS-Fuzz in which they fuzz open source projects for free, and actually find a lot of issues. Fuzz Testing is among the top 10 SW engineering practices I want to see in real project life as soon as possible.
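For readers who have never seen fuzzing in action, here is a deliberately tiny sketch of the idea: throw random inputs at a function and treat every undocumented exception as a finding. Both the toy parser (with a planted bug) and the fuzz loop are made up for illustration; real fuzzers such as the ones behind OSS-Fuzz are coverage-guided and far more effective than this blind random loop.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy parser: the first byte is the payload length, the rest is the payload."""
    if not data:
        raise ValueError("empty input")  # documented rejection
    length = data[0]
    payload = data[1:1 + length]
    # Planted bug: trusts the length byte and reads the supposed last payload byte.
    _last = payload[length - 1] if length else 0
    return payload


def fuzz(target, runs: int = 1000, seed: int = 0) -> list[bytes]:
    """Feed random byte strings to target and collect inputs that crash unexpectedly."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(runs):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(0, 8)))
        try:
            target(data)
        except ValueError:
            pass                   # expected, documented error
        except Exception:
            crashes.append(data)   # anything else is a finding
    return crashes


crashes = fuzz(parse_record)
print(f"{len(crashes)} crashing inputs found")
```

Each collected input reproduces the `IndexError` deterministically, which is exactly what makes fuzzing findings so convenient to debug.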

The next interesting item, „API request collection as API product artifact“, sounds a bit clunky. I interpret it as a set of sample API requests which help developers to adopt APIs inside of an ecosystem more quickly. That is definitely desirable, as examples are often more helpful in getting the hang of a specific API than its API documentation (not to mention that documentation is still very important, too). One caveat: you have to establish ways to keep the examples/collection up to date so they don't break after a while when the API evolves.

Then comes Architecture advice process, which resonates very well with current experience: In large software projects, a common challenge is the architectural alignment processes. Traditional methods, like Architecture Review Boards, often slow things down and are linked to poor organizational outcomes. A more effective alternative may be the architectural advice process—a decentralized model where anyone can make architectural choices, as long as they consult with affected stakeholders and experts. This approach supports faster workflows without sacrificing quality, even at scale. Though it may initially seem risky, tools like Architecture Decision Records and advisory groups help keep decisions well-informed. This model is proving successful, even in tightly regulated industries. Andrew Harmel-Law has written an insightful blog post on it.

In the Tools sector, uv is getting a spotlight. uv is the hot shit right now in the Python ecosystem. While I have not used it myself, I see it gradually replacing other Python package managers. This is due to its fast execution, but also well designed features, making it easier to run kind of self-contained Python projects.
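As far as I understand the uv documentation, one of those features is support for inline script metadata (PEP 723): a script declares its Python version and dependencies in a comment block, and `uv run` sets up a matching environment before executing it. A minimal sketch, with an empty dependency list and a hypothetical file name `hello.py`:

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Self-contained script: the comment block above is PEP 723 inline metadata."""


def greet(name: str) -> str:
    return f"Hello, {name}!"


if __name__ == "__main__":
    print(greet("uv"))
```

Running `uv run hello.py` would then take care of the environment; since the metadata lives in comments, the same file still runs with a plain `python hello.py`.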

Categories
Tinkering

German Government Spending Data

These days you can read a lot on DOGE in the US. Without wanting to start a flamewar on the approach and controversies, I can say it sparked my interest in government data analysis. In a rush of mojo on a Sunday morning, I used Copilot to hack together a POC to display various German data sources on government spending in a web ui with some filters. For such things LLMs are a great bootstrapping tool.

Later, with the help of Copilot I managed to extend it with a fancy UI, bar chart, more filters and, most importantly, more data sources. The latter now include also domestic expenses going to governmental and non-governmental institutions. Also, the tool now is less resource hungry.


You can find some pretty funny data in there! Check it out: https://jakobbr.eu/followthemoney

Github source: https://github.com/jbreu/follow_the_money

Categories
Coding

Small Update on JOS

Small update on JOS
✅ use High Precision Event Timer CPU feature to get nanosecond resolution timestamps
✅ colored logging
✅ pretty logo
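How HPET ticks turn into nanoseconds can be sketched in a few lines. According to the HPET specification, the upper 32 bits of the General Capabilities register hold the tick period in femtoseconds. The function names and the example register value below are made up, and this is not the JOS implementation (which is Rust).

```python
FEMTOSECONDS_PER_NANOSECOND = 1_000_000


def counter_period_fs(capabilities: int) -> int:
    """Tick period in femtoseconds: bits 63:32 of the General Capabilities register."""
    return (capabilities >> 32) & 0xFFFF_FFFF


def ticks_to_ns(ticks: int, period_fs: int) -> int:
    """Convert a main counter value to nanoseconds (integer math, rounds down)."""
    return ticks * period_fs // FEMTOSECONDS_PER_NANOSECOND


# Example value only: a period of 10,000,000 fs, i.e. one tick every 10 ns
caps = (10_000_000 << 32) | 0x8086A201
print(ticks_to_ns(123, counter_period_fs(caps)))  # 1230
```

The timestamps are expressed in nanoseconds, but their real granularity is one tick period, which depends on the hardware or emulator.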

Categories
Coding Tinkering

Writing my own x86_64 Operating System

tl;dr Repo: https://github.com/jbreu/jos

In the middle of last year I dug myself into the OS development rabbit hole on YouTube, and during my fall vacation I had some slack which allowed me to hack together my own operating system called JOS (Jakob’s OS). This was the starting point of a pretty exciting journey. To this day, there is no particular purpose or goal other than learning more about low-level basics. Nonetheless, I enjoyed this exercise so much that I can even grant it some therapeutic effect during a very stressful business life 🙂 Of course, as a father and engineering manager, I don't have much time, and I often found myself pondering while doing chores how to solve the next stack corruption puzzle, instead of actually coding/debugging. In this article, I will tell you a bit about the key learnings and technologies this exposed me to.

The starting point was a brief video series by David Callanan on how to write a bootloader for an OS up to printing a Hello World to a console. I forked his repo and worked my way from there. The first thing was to rewrite the whole OS code (besides the bootloader) in Rust, as this was another thing I wanted to learn. It took me a while to get used to Rust, but eventually I arrived at a working Hello World. The next steps were minor adaptations to the console printing ecosystem. The first real big step digging into the intricacies of the x86_64 CPU instruction set was the implementation of interrupts. This took a great amount of debugging, trial and error, and reading of many specifications and internet resources. From the latter, I want to highlight the AMD64 Architecture Programmer’s Manual, Volume 2, and Philipp Oppermann’s Writing an OS in Rust. The latter is really good for the first steps, with one caveat: it is based in big parts on the author’s library, which abstracts all x86_64 basics away, and is in many parts more a tutorial on using that library. I didn't want to use this library because my impression was that it masked too much of the interesting stuff about how things actually work on the low level. Hence, I implemented everything on my own terms. After adopting keyboard and timer interrupts I used the latter to display the clock time in the top right corner of the console.

A huge help was that I could use the qemu emulator and the gdb debugger in combination (via VS Code’s extension native-debug). It was the first time using qemu and gdb as a developer and after getting beyond the initial learning curve I genuinely enjoyed debugging with it. As additional tools I also used Ghidra (disassembler) and ElfViewer (inspect executables).

A really huge next step for me was to introduce userspace programs and later on multiprocessing (running multiple userspace programs „in parallel“ with a simple round robin scheduler). After studying basic patterns for this, I chose the hard path and implemented it mostly by myself. This is the part which I am most proud of so far, because it required me to derive many inner workings from the specs directly, and it cost me a long time to get right. For weeks and months I fought with sporadic stack corruptions and CPU exceptions. As userspace program I implemented a simple Hello World program, which interacted with the kernel via syscalls. Due to the lack of a file system, which to date I have for some reason not been eager to implement, this userspace program is stored inside the OS executable and loaded from there, a hack which is actually really cool. Then I added a VGA mode which enabled the userspace programs to print colored pixels to the screen (one line for each of the 4 userspace programs).
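The scheduling policy itself is the easy part. A minimal sketch of round robin, with made-up class and method names and without everything that makes it hard in a real kernel (saving and restoring registers, switching stacks and page tables):

```python
from collections import deque


class RoundRobinScheduler:
    """Toy round robin: the process at the front of the queue is the one running."""

    def __init__(self, process_ids):
        self.ready = deque(process_ids)

    def current(self):
        return self.ready[0]

    def on_timer_tick(self):
        """Called on every timer interrupt: preempt the current process, pick the next."""
        self.ready.rotate(-1)  # the preempted process moves to the back of the queue
        return self.ready[0]


scheduler = RoundRobinScheduler([1, 2, 3, 4])
print([scheduler.on_timer_tick() for _ in range(5)])  # [2, 3, 4, 1, 2]
```

Every process gets the CPU for one time slice in turn, which is why the four programs appear to run „in parallel“.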

OK so here we were, but what you gonna do with an operating system which has keyboard input, can run user programs, and has VGA output? Yes, you are correct – we run Doom on it 🙂

You have probably heard of people getting Doom to run on all sorts of awkward devices; engineers made it run on potato batteries and toothbrushes:

So if they can run it on such devices, there must be a way to run it on JOS, right? And yes, it's possible. It required me to write a small C wrapper around PureDOOM, which in turn also made me translate my Rust-based libc to C. After adding some additional syscalls and fighting with the 6-bit color maps and the malloc implementation, I finally made it. So here we stand today: JOS runs DOOM.
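A note on the color maps, assuming the fight was about the classic VGA palette hardware: the VGA DAC stores only 6 bits per color channel (values 0 to 63), while Doom's palettes use 8 bits per channel (0 to 255). The conversion is then a right shift by two bits. A sketch with made-up function names:

```python
def to_vga_dac(rgb8):
    """Convert one 8-bit-per-channel RGB triple to 6-bit-per-channel VGA DAC values."""
    r, g, b = rgb8
    return (r >> 2, g >> 2, b >> 2)


def convert_palette(palette8):
    """Convert a whole palette given as a list of (r, g, b) triples."""
    return [to_vga_dac(color) for color in palette8]


print(to_vga_dac((255, 128, 0)))  # (63, 32, 0)
```

Writing unconverted 8-bit values to the DAC typically shows up as wrong colors, which makes this kind of bug easy to see but not always easy to locate.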

Categories
General

Remote target doesn’t support qGetTIBAddr packet

I had a weird issue today connecting via the VS Code extension Native Debug to a qemu instance. It was giving:

Visual Studio Code
Failed to attach: Remote target doesn't support qGetTIBAddr packet (from target-select remote :1234)

[Open 'launch.json'] [Cancel]

That sounded like gdb/native debug is expecting a feature qemu is not offering; however, just the day before it ran successfully – so what happened? Unfortunately and coincidentally, I did some housekeeping a few hours before, so my suspicion was that I somehow uninstalled some facilities, like Windows SDK or so. After 2 hours trying to reproduce my earlier setup, checking older versions of qemu, gdb and native debug I almost gave up, when I stumbled upon this via Google:
https://github.com/abaire/nxdk_pgraph_tests/blob/main/README.md?plain=1#L187

NOTE: If you see a failure due to "Remote target doesn't support qGetTIBAddr packet", check the GDB output to make sure
that the `.gdbinit` file was successfully loaded.

Now, of course I checked the gdb output before, but besides some warnings nothing appeared suspicious. The link made me re-check, and indeed there was this:

undefinedBFD: reopening /cygdrive/c/Users/Jakob/Documents/workspace/os-series/C:\Users\Jakob\Documents\workspace\os-series\dist\x86_64\kernel.bin: No such file or directory

That appeared new on second thought, and I removed the following line from my VS Code launch.json:

"executable": "dist/x86_64/kernel.bin",

That made it work again. At least partially: of course the info about the executable is now missing, but I have the feeling this is a minor thing to fix.

Addition (Nov 11th):

I have two additions to make. First, it seems reader Piotr found the actual proper solution to this, see his comment below. After setting „set osabi none“ in .gdbinit I can indeed use gdb as before. This setting makes gdb not assume any operating system ABI, which makes sense when you code your own OS from scratch. Thank you so much Piotr!

Second, just FYI and in case someone has a similar issue but the aforementioned solution doesn’t work for some reason, here is the workaround I used until Piotr came to the rescue. As written above, removing the „executable“ line from the launch.json made gdb work, but of course the executable and its debug symbols were then missing, so setting breakpoints from the UI didn't work. After much tinkering I realized that adding the executable later during the debugging session helped. So what I did was add a hardcoded breakpoint at the very beginning of my test object. When this breakpoint was hit, some commands were executed, of which adding the kernel as symbol file is the most important one. I also had to set another breakpoint inside these commands, which made gdb reload all other breakpoints from the UI, too.

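# Stop at a hardcoded address at the very beginning of the test object, then load the
# kernel symbols and set a second breakpoint so that the breakpoints from the UI get reloaded.
b *0x100018
commands
  add-symbol-file dist/x86_64/kernel.bin
  b isr_common_stub
end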

This worked, with the caveat that each debugging session halted at this first hardcoded breakpoint and I had to manually continue once. It was an acceptable workaround, but I am happy that Piotr has now provided a proper solution.

I still have no clue what exactly made this issue pop up; as mentioned I blame my „housekeeping“ activities, which were too many to reproduce the exact root cause.

Categories
Book Culture

Thoughts on “Implementing Lean Software Development”

Reading and summarizing books on lean software development, so you don't have to. Part 3 (see Part 1 and Part 2).

“Implementing Lean Software Development” was written by Mary and Tom Poppendieck and published in 2007 by Addison-Wesley. The Poppendiecks are quite famous in the lean-agile software development community, as they published the seminal book „Lean Software Development: An Agile Toolkit“ in 2003, the first (recognized) book about bringing the lean principles to the software development space. The book reviewed here is a successor aimed at delivering more practical advice. As in the last parts, my review will not re-iterate lean and agile fundamentals, but rather focus on novel aspects, ideas, and noteworthy pieces.

In the foreword, Jeff Sutherland (co-founder of the Scrum framework) introduces the Japanese terms of Muri (properly loading a system), Mura (never stressing a person, system or process) and Muda (waste):

Yet many managers want to load developers at 110 percent. They desperately want to create a greater sense of “urgency” so developers will “work harder.” They want to micromanage teams, which stifles self-organization. These ill-conceived notions often introduce wait time, churn, death marches, burnout, and failed projects.
When I ask technical managers whether they load the CPU on their laptop to 110 percent they laugh and say, “Of course not. My computer would stop running!” Yet by overloading teams, projects are often late, software is brittle and hard to maintain, and things gradually get worse, not better.

page xix

In their historical review the authors cite a very interesting statistic which should resonate with many of my peers:

Both Toyodas had brilliantly perceived that the game to be played was not economies of scale, but conquering complexity. Economies of scale will reduce costs about 15 percent to 25 percent per unit when volume doubles. But costs go up by 20 percent to 35 percent every time variety doubles. Just-in-Time flow drives out major contributors to the cost of variety. In fact, it is the only industrial model we have that effectively manages complexity.

page 5

As evidence, two papers are given: „Time - The Next Source of Competitive Advantage“ by George Stalk and „Lean or Sigma?“ by Freddy and Michael Balle. Managers and engineers are becoming increasingly aware of the not-so-visible cost of complexity, typically by experiencing project failure or long-term product degradation.
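To get a feeling for the quoted percentages, here is a small worked example. It assumes that the effects compound per doubling and uses the middle of each quoted range (20 percent saving, 27.5 percent penalty); both are my simplifications, not something the book states, and the function name is made up.

```python
def unit_cost(base, volume_doublings, variety_doublings,
              scale_saving=0.20, variety_penalty=0.275):
    """Unit cost after doubling volume and/or variety a number of times."""
    return (base
            * (1 - scale_saving) ** volume_doublings
            * (1 + variety_penalty) ** variety_doublings)


print(round(unit_cost(100, 1, 0), 2))  # volume doubled once
print(round(unit_cost(100, 0, 1), 2))  # variety doubled once
print(round(unit_cost(100, 1, 1), 2))  # both doubled once
```

With these mid-range values, doubling volume brings a unit cost of 100 down to about 80, doubling variety pushes it up to about 127.5, and doing both lands at about 102: the variety penalty eats the entire scale advantage.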

For the aspect of inventory, the authors provide a quite good metaphor:

Inventory is the water level in a stream, and when the water level is high, a lot of big rocks lurking under the water are hidden. If you lower the water level, the big rocks begin to surface. At that point, you have to clear the rock out of the way, or your boat will crash into them. As the big rocks are removed, you can lower inventory level some more, find more rocks, clear them out of the stream, and keep on going until there are just pebbles left.

page 8

The authors explain why the adoption of lean practices and mindset is not straightforward, and why many organizations struggle or fail at it, by pointing at a „cherrypicking“ approach: only some activities of the lean domain are adopted in isolation, like just-in-time or stop-the-line. By contrast, they cite a classic:

The truly lean plant […] transfers the maximum number of tasks and responsibilities to those workers actually adding value to the car on the line, and it has in place a system for detecting defects that quickly traces every problem, once discovered, to its ultimate source.

Womack, Jones, Roos: The machine that changed the world, page 99

I think this cannot be overestimated. Too seldom have I seen organizations and management really focusing on the „value creators“ and the impediments they are facing.

In earlier blog posts I already wrote about the differences and similarities in the lean manufacturing and lean development. The Poppendiecks provide a table putting both side-by-side (page 14):

Later, in a footnote, the authors refer to a paper by Kajko-Mattsson et al. on the cost of software maintenance. The numbers from the paper’s sources vary a lot; however, considering a typical big software project, it is obvious that this ratio quickly translates to millions of euros/dollars.

The published numbers point out that maintenance costs between 40% to 90% […]. There are very few publications reporting on the cost of each individual maintenance category. The reported ones are the following: (1) corrective maintenance – 16-22% […] (2) perfective maintenance – 55% […], and (3) adaptive maintenance – 25% […].

Kajko-Mattsson et al: Taxonomy of problem management activities, page 1

On the lean principle of waste, the Poppendiecks make a simple but revealing statement:

To eliminate waste, you first have to recognize it. Since waste is anything that does not add value, the first step to eliminating waste is to develop a keen sense of what value really is. There is no substitute for developing a deep understanding of what customers will actually value once they start using the software. In our industry, value has a habit of changing because, quite often, customers don’t really know what they want. In addition, once they see new software in action, their idea of what they want will invariably shift. Nevertheless, great software development organizations develop a deep sense of customer value and continually delight their customers.

page 23

Too often have I experienced software development projects which don't know what their product and the value it provides actually is. Of course, everyone has a vague feeling about what it could be, but putting it in clear words is seldom attempted and easily ends in conflict (a conflict which can be constructive if facilitated well).

On the second principle „Build Quality In“, there are some interesting distinctions on defects and the relation to „inspection“:

According to Shigeo Shingo, there are two kinds of inspection: inspection after defects occur and inspection to prevent defects. If you really want quality, you don’t inspect after the fact, you control conditions so as not to allow defects in the first place. If this is not possible, then you inspect the product after each small step, so that defects are caught immediately after they occur. When a defect is found, you stop-the-line, find its cause, and fix it immediately.
Defect tracking systems are queues of partially done work, queues of rework if you will. Too often we think that just because a defect is in a queue, it’s OK, we won’t lose track of it. But in the lean paradigm, queues are collection points for waste. The goal is to have no defects in the queue, in fact, the ultimate goal is to eliminate the defect tracking queue altogether. If you find this impossible to imagine, consider Nancy Van Schooenderwoert’s experience on a three-year project that developed complex and often-changing embedded software. Over the three-year period there were a total of 51 defects after unit testing with a maximum of two defects open at once. Who needs a defect tracking system for two defects?

page 27

The authors are citing two papers by Nancy Van Schooenderwoert („Taming the Embedded Tiger – Agile Test Techniques for Embedded Software“ and „Embedded Agile Project by the Numbers With Newbies“). This resonates well with me, because accumulating too many defects (tickets) is a very expensive kind of waste. It's a kind of inventory with the worst properties. Breaking out of this is not straightforward; I have attempted and failed multiple times to establish a „zero defect policy“ (i.e. as long as there is a defect, no further feature development happens). In that context let me add two more quotes from the book:

The job of tests, and the people that develop and run tests, is to prevent defects, not to find them.

page 28

“Do it right the first time,” has been interpreted to mean that once code is written, it should never have to be changed. This interpretation encourages developers to use some of the worst known practices for the design and development of complex systems. It is a dangerous myth to think that software should not have to be changed once it is written.

page 29

On the fifth principle of „Deliver Fast“ a very important statement is made:

Caution: Don’t equate high speed with hacking. They are worlds apart. A fast-moving development team must have excellent reflexes and a disciplined, stop-the-line culture. The reason for this is clear: You can’t sustain high speed unless you build quality in.

page 35

Very often I observe a dire need for speed. Of course everyone wants to be faster in the software industry. Competition doesn't sleep. However, similar to unclear definitions of value and products, I have barely ever seen a clear definition of speed in a software project. Or, probably more correct: there were competing definitions of speed in people’s and especially decision makers’ minds. It's a huge difference whether you drive your team to „push out features now“ and then grind to a halt when quality activities start, or whether you maintain a sustainable pace:

When you measure cycle time, you should not measure the shortest time through the system. It is a bad idea to measure how good you are at expediting, because in a lean environment, expediting should be neither necessary nor acceptable. The question is not how fast can you deliver, but how fast do you repeatedly and reliably deliver a new capability or respond to a customer request.

page 238

The Poppendiecks are summarizing those effects in two vicious cycles (page 38):

For all the lean principles, the Poppendiecks also discuss myths originating from misinterpreting the principles or applying them wrongly. One which caught my attention was the myth „Optimize by decomposition“. It's about the proliferation of metrics once an organization starts to apply visual management. All of a sudden, there are tens if not hundreds of dashboards, graphs, KPIs, and such flying around. Their recommendation:

When a measurement system has too many measurements the real goal of the effort gets lost among too many surrogates, and there is no guidance for making tradeoffs among them. The solution is to “Measure UP” that is, raise the measurement one level and decrease the number of measurements. Find a higher-level measurement that will drive the right results for the lower level metrics and establish a basis for making trade-offs.

page 40

Speaking about myths, they encourage readers to check which myths apply to their situation – certainly a worthwhile exercise for you, too 🙂

Early specification reduces waste
The job of testing is to find defects
Predictions create predictability
Planning is commitment
Haste makes waste
There is one best way
Optimize by decomposition

page 42

Coming back to the notion of value, the authors are asking the fundamental question how great products are conceived and developed. They write:

In 1991, Clark and Fujimoto’s book Product Development Performance presented strong evidence that great products are the result of excellent, detailed information flow. The customers‘ perception of the product is determined by the quality of the flow of information between the marketplace and the development team. The technical integrity of the product is determined by the quality of the information flow among upstream and downstream technical team members. There are two steps you can take to facilitate this information flow: 1) provide leadership, and 2) empower a complete team.

page 52

The book has an extensive chapter on waste with many insightful aspects. I don't want to repeat all of them, and instead just provide some examples. For example, I found this statement on the relationship of automation and waste/complexity very inspiring:

We are not helping our customers if we simply automate a complex or messy process; we would simply be encasing a process filled with waste in a straight jacket of software complexity. Any process that is a candidate for automation should first be clarified and simplified, possibly even removing existing automation. Only then can the process be clearly understood and the leverage points for effective automation identified.

page 72

In my current position, automation is a key activity, and we try to automate everything in an endeavour to increase speed, quality and convenience. The quote points out that automation can hide or defer complexity. I can confirm this. Even though my team automated the complexity of product variants in the build process, our customers (e.g. manual testers) don't have a chance to test all the builds we produce. Hence, even though it was made with the best intentions, our automation is overloading the system as a whole.

Another good comparison between traditional manufacturing and software development is the following table, putting the seven waste equivalents side-by-side (page 74):

On architectural foresight, I like the following statement:

Creating an architectural capability to add features later rather than sooner is good. Extracting a reusable services „framework“ for the enterprise has often proven to be a good idea. Creating a speculative application framework that can be configured to do just about anything has a track record of failure. Understand the difference.

page 76

While discussing Value Streams, the authors dig into effectiveness and efficiency. They are of the opinion that

chasing the phantom of full utilization creates long queues that take far more effort to maintain than they are worth - and actually decreases effective utilization.

page 88

This opinion is not speculation; they provide a good analogy to road traffic and computer utilization:

High utilization is another thing that makes systems unstable. This is obvious to anyone who has ever been caught in a traffic jam. Once the utilization of the road goes above about 80 percent, the speed of the traffic starts to slow down. Add a few more cars and pretty soon you are moving at a crawl. When operations managers see their servers running at 80 percent capacity at peak times, they know that response time is beginning to suffer, and they quickly get more servers. […]

Most operations managers would get fired for trying to get maximum utilization out of each server, because it’s common knowledge that high utilization slows servers to a crawl. Why is it that when development managers see a report saying that 90 percent of their available hours were used last month, their reaction is, „Oh look! We have time for another project!“

pages 101f
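The traffic jam effect can be made tangible with the simplest textbook queueing model, the M/M/1 queue, in which the mean response time is the service time divided by (1 - utilization). Choosing this model is my assumption for illustration; the book argues with the analogy, not with this formula, and the function name is made up.

```python
def mm1_response_time(service_time: float, utilization: float) -> float:
    """Mean time in an M/M/1 system (waiting plus service) at a given utilization."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


for rho in (0.5, 0.8, 0.9, 0.95):
    slowdown = mm1_response_time(1.0, rho)
    print(f"utilization {rho:.0%}: response time is about {slowdown:.0f}x the service time")
```

In this model a task takes about twice its pure service time at 50 percent utilization, about 5 times at 80 percent, about 10 times at 90 percent and about 20 times at 95 percent. The last few percent of utilization are bought with a dramatic loss in speed.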

I think in daily work, management typically does not pay enough attention to those basics. It is not that people don't know that too high a utilization of resources is bad; quite the opposite is the case in my experience. However, the root causes and the remedies are often not considered. Instead, there is a sentiment of capitulation: „Yes I know our team is stressed and overloaded, but we have to get faster nevertheless.“

In order to reduce cycle times, the authors refer to queuing theory, which provides several approaches:

Even out the arrival of work

Minimize the number of things in process

Minimize the size of things in process

Establish a regular cadence

Limit work to capacity

Use pull scheduling

page 103
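Why limiting the number of things in process shortens cycle time can be illustrated with Little's Law, a basic result of queueing theory: in a stable system, the average cycle time equals the average number of items in process divided by the average throughput. The sketch below uses hypothetical numbers and a function name of my own choosing.

```python
def average_cycle_time(work_in_process: float, throughput: float) -> float:
    """Little's Law for a stable system: cycle time = WIP / throughput.

    work_in_process: average number of items being worked on at once.
    throughput:      average number of items finished per time unit.
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return work_in_process / throughput


if __name__ == "__main__":
    # Hypothetical team that finishes 5 items per week.
    for wip in (20, 10, 5):
        weeks = average_cycle_time(wip, throughput=5)
        print(f"{wip} items in process -> {weeks:.1f} weeks average cycle time")
```

With throughput unchanged, halving the work in process halves the average cycle time. The law only holds for long-run averages, so it is a rule of thumb for planning, not a forecast for a single item.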

In the chapter „People“, there are many references to William Edwards Deming, a pioneer of quality management. It's an irony of history that this American was actually teaching the fundamentals of what later became Lean in post-war Japan, while he was „discovered“ by the US (industrial) public only in the 1980s. Deming formulated what he called the „System of Profound Knowledge“:

  1. Appreciation of a System: A business is a system. Action in one part of the system will have effects in the other parts. We often call these “unintended consequences.” By learning about systems we can better avoid these unintended consequences and optimize the whole system.
  2. Knowledge of Variation: One goal of quality is to reduce variation. Managers who do not understand variation frequently increase variation by their actions. Critical to this is understanding the two types of variation: Common cause, which is variation from the system, and Special cause, which is variation from outside the system.
  3. Theory of Knowledge: There is no knowledge without theory. Understanding the difference between theory and experience prevents shallow change. Theory requires prediction, not just explanation. While you can never prove that a theory is right, there must exist the possibility of proving it wrong by testing its predictions.
  4. Understanding of Psychology: To understand the interaction between work systems and people, leaders must seek to answer questions such as: How do people learn? How do people relate to change? What motivates people?
https://medium.com/10x-curiosity/system-of-profound-knowledge-ce8cd368ca62

When pursuing change and transformation, it is very important to get the staff on board. This is easier said than done, because employees have a very fine sense for inconsistency. They realize very quickly if, for example, a certain change in mindset is requested from them but not practiced by their supervisors. In engineering projects, the demands and expectations of decision makers often contradict the strategies and visions they communicate. Just consider whether in your organization „quality“ is an essential part of the long-term goals, yet totally overridden by daily task-force death marches.

The challenge of achieving quality is handled in another dedicated chapter. The authors point out the importance of „superb, detailed discipline“ to achieve high quality. This is where the famous „5 S’s“ come into play. The book’s authors also transfer them to the software space:

Sort (Seiri): Sort through the stuff on the team workstations and servers, and find the old versions of software and old files and reports that will never be used any more. Back them up if you must, then delete them.

Systematize (Seiton): Desktop layouts and file structures are important. They should be crafted so that things are logically organized and easy to find. Any workspace that is used by more than one person should conform to a common team layout so people can find what they need every place they log in.

Shine (Seiso): Whew, that was a lot of work. Time to throw out the pop cans and coffee cups, clean the fingerprints off the monitor screens, and pick up all that paper. Clean up the whiteboards after taking pictures of the important designs that are sketched there.

Standardize (Seiketsu): Put some automation and standards in place to make sure that every workstation always has the latest version of the tools, backups occur regularly, and miscellaneous junk doesn’t accumulate.

Sustain (Shitsuke): Now you just have to keep up the discipline.

page 191

I really enjoyed reading this book and can absolutely recommend it. It contains a lot of gems, and is probably one of those books you want to re-read every other year to rediscover aspects and connect them to new experience.

Kategorien
Allgemein

Technology Radar #29: Automotive SW perspective

As written before, I really like the regular updates provided by Thoughtworks in their Technology Radar. My focus is on the applicability of techniques, tools, platforms and languages for automotive software, with a further focus on embedded in-car software. Hence, I am ignoring pure web-development and machine learning/data analytics stuff, which usually makes up a huge portion of the whole report. Recently, volume 29 was published. Let’s have a look!

In the techniques sector, the topic lightweight approach to RFCs has made it to the adopt area, meaning there is a strong recommendation to apply it. During my time at MBition, Berlin, my colleague Johan Thelin pointed me to a talk by Paul Adams on YouTube, which Paul later also gave during an all-hands event of our project.

Hence, the RFC thing resonates very well with me. It has been my style to write documents about strategies, concepts and plans, request feedback from peers very early to check whether the general direction is correct, and finalize them later. Much like what software engineers are used to doing in Pull Requests, such a scheme can and should be applied to more areas in a systematic manner. Architecture is one obvious area, but it can also be applied in many others. Confluence and similar collaboration platforms offer great inline commenting capabilities to discuss any controversial aspects of a document and sort them out.

2.5 years ago I wrote about Dependency updates with Renovate. In the blip automatic merging of dependency update PRs the authors argue in favor of automatically merging bot-generated dependency updates. What can I say, it makes total sense. Until now I have manually merged the pull requests created by the bot, but now I just let it do that automatically – of course only after a successful pipeline run. With Renovate it's as simple as adding "automerge": true to the renovate.json in each repo.
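For reference, a minimal renovate.json with that setting could look like the sketch below. The `$schema` line is optional and only helps editors with validation; a real repo will usually carry further settings (presets, package rules) that I omit here. Whether a failing pipeline actually blocks the merge also depends on how the repo's checks and branch protection are set up, so treat this as a starting point rather than a complete configuration.

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "automerge": true
}
```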

In tracking health over debt the authors describe the idea of focusing more on the health of a software system than on tracking its (technical) debt. It's a worthwhile approach, since focusing on debt means tracking an often ever-growing list. In my experience, some debt simply becomes obsolete, and some debt that was fiercely discussed when it was „implemented“ later turns out significantly worse or better than expected. Instead, tracking the health of the system as the primary measure of where to act at any time may yield better results in the long game.

In the tools sector, Ruff is recommended as a successor to the famous Python linter Flake8. Implemented in Rust, it seems to offer superior performance while still providing similar rule coverage.

A rather unusual entry (at least to my knowledge) is the mention of DevEx 360, a survey tool focused on identifying impediments and potential improvements within the dev team.

Our platform engineering teams have used DX DevEx 360 successfully to understand developer sentiment and identify friction points to inform the platform roadmap. Unlike similar tools, with DX DevEx 360 we’ve received a response rate of 90% or above, often with detailed comments from developers on problems and ideas for improvements. We also appreciate that the tool makes results transparent to engineers in the company instead of just managers and that it enables team-by-team breakdown to enable continuous improvement for each team’s context.

https://www.thoughtworks.com/de-de/radar/tools/summary/dx-devex-360

That was already it for „my“ scope of the tech radar. This time around, the tech radar contained a looot of new entries and updates in the AI area, around GPT and LLMs. Certainly interesting, but nothing I have much experience with or applications for (yet).