
Experiences with agentic coding

[No tokens have been burned to write this article 🙂 ]

Even though I harbor some skepticism about the rise of LLMs/GenAI, their impact on software engineering practices (must read: a very lucid article on the impact of AI code generation on the whole software engineering value chain, from a lean perspective), and their impact on society as a whole, I try to stay on top of whatever new possibilities arise – not only by consuming huge amounts of YouTube videos, but also by applying them to my projects at work and in private. I do not consider myself a power user: I am not burning through Claude Pro Max 20x plans, and I mostly stick to VS Code, which seems to follow the more progressive agentic IDEs at a reasonable pace.

A very nice effect is that LLMs enable me, a busy software engineering manager, to still produce the occasional helpful script every once in a while. I did this without LLMs as well, but my output of work-related one-shot, single-purpose scripts has increased over the last two years or so.

Learning 1: Context: In my bigger projects, which existed before GenAI became a thing, the first thing I do is let the agent generate technical and/or product documentation. I put a focus on software quality aspects that are important to me, like very high test coverage. I am confident that this helps me keep the codebase somewhat under control even if the amount of LLM-generated code increases massively.

[Excerpt]

### Docker

Multi-stage build:
1. **Builder** (composer:2.9.5): installs PHP dependencies (`--no-dev`)
2. **Runtime** (php:8.5.0-apache): copies app + vendor, generates `swagger/swagger.json` via `openapi` CLI, enables `mod_rewrite`, installs `pdo_mysql` extension, uses production `php.ini`

### Acceptance Tests

Browser-based tests using PHPUnit + Selenium WebDriver (Chrome headless).

### Static Analysis & Quality

Phan, PHPMD, PHP_CodeSniffer, Infection (mutation testing), Trivy (CVE scanning), Schemathesis (API contract testing).

## Noteworthy Details

- **No authentication**: All endpoints are public. Security relies on random hex IDs being unguessable and infrastructure-level controls
- **No framework**: Raw PHP with manual URI parsing and method dispatch
- **SQL injection protection**: All queries use PDO prepared statements
- **OpenAPI auto-generation**: Swagger spec generated at Docker build time from PHP attributes
- **Sentinel dates**: Deadline of `2000-01-01` means "no deadline" — frontend filters this out
- **Date handling**: ISO 8601 on API boundary, MySQL datetime in storage, conversion in Gateway layer

Putting this documentation (and policies) in a defined place ensures it is always included in the prompt (an instructions.md under .github/instructions). With Claude Skills and the like there is an even more versatile option to carry this out. I recently started to employ this in work projects.
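To make this concrete, here is a minimal, hypothetical sketch of such a policy file; the file name and the rules are placeholders distilled from the documentation excerpt above, not my actual setup:

```markdown
<!-- .github/instructions/project.instructions.md -->
# Project policies

- Architecture: plain PHP, no framework; keep the manual URI parsing and method dispatch.
- All database access goes through PDO prepared statements.
- Keep test coverage very high; new endpoints need acceptance tests.
- Dates: ISO 8601 on the API boundary, MySQL datetime in storage.
```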

Learning 1.5: Do not forget to review, and push your agent to obey your project policies. It can easily happen that, for a small additional feature, an agent attempts to replace your complete architecture and framework. This has gotten better lately, but it still may happen very unexpectedly.

Learning 2: Planning before coding: Like with human coding, thinking about the given task in depth and first making a solid plan helps to produce better implementations from the start. This refinement can happen in a dialogue with the agent before it writes any code.

Learning 3: Parallelize: In the screenshot below you can see 3 sessions running in parallel. This comes with quite some mental load. I am still undecided whether I can really handle more than one thread without causing too much turmoil in my project.

Learning 4: Read the responses: At least skim over them. Sometimes the agent does what you want but leaves some things open. Just while writing this article I almost missed an "If you want, I can also add a small backend test or update the task creation API flow to fully validate the new field." Yes, please!

Learning 5: Tool calling: I have come to pretty much trust any tool calls my IDE makes in the context of my projects. At work, we use VS Code dev containers, which seem to be reasonable protection. Of course, I have read the horror stories around OpenClaw, but I am not yet committing my full life's data and resources, and having proper version control makes me believe it's OK for now.

Learning 6: GenAI is considerably better than me at design. For my (still unpublished) private pet project of the last 7 years or so, it one-shotted a massively better UI design. Actually, it even proposed multiple designs through which I could cycle via a toggle button. For sure, professional UI/UX designers will have a word on this, but their skills are not my baseline.


Migrating a shell to JOS

When I published my work around my own operating system JOS many months back, my former colleague and friend Alexander suggested that in order to make my OS useful, it requires bash. Little did he (or I) know that this would lead to a one-year-long endeavour to get a shell working.

One big enabler was the implementation of the filesystem and disk drivers, which I already wrote about. When that was ready, I got hold of the source code of bash (the Bourne Again Shell), the most widely used shell. However, it turned out that porting it to another OS is pretty complex. So halfway through, I switched to dash (the Debian Almquist Shell), having read somewhere on the internet that it is more portable. Less known by name, it is the default /bin/sh in Ubuntu. So I reckoned that going for it would give me a sufficiently powerful shell for long-term use.

So, how do you get a shell running on a written-from-scratch operating system? Like with any other userspace program, you need to get its dependencies straight – that is, you bring in all the standard library headers and implementations it requires, including additional system calls the kernel did not implement yet.

So the first step was to compile (not yet link) all the dash source code against my own libc headers. This alone took me a long time, getting all the function declarations in place for the compilation to go through. The result was pretty scary: sooo many functions with complex POSIX functionality I would have to implement! A lot of stuff I hadn't even heard about before, let alone understood its inner workings.

When I went on to the linking phase, I had to provide those implementations. As I saw no way to get all of them implemented in a fully correct manner, I chose to first implement them all as no-op versions, returning an error by default:
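A minimal sketch of what such default-failing stubs can look like, assuming a Rust libc that exposes C symbols (the function selection and the errno plumbing here are illustrative, not the actual JOS code):

```rust
use core::ffi::{c_char, c_int};

const ENOSYS: c_int = 38; // "function not implemented"
static mut ERRNO: c_int = 0;

// No-op stubs: enough to satisfy the linker, failing at runtime by default.
#[no_mangle]
pub extern "C" fn getcwd(_buf: *mut c_char, _size: usize) -> *mut c_char {
    unsafe { ERRNO = ENOSYS };
    core::ptr::null_mut()
}

#[no_mangle]
pub extern "C" fn sigaction(_sig: c_int, _act: *const u8, _old: *mut u8) -> c_int {
    unsafe { ERRNO = ENOSYS };
    -1
}
```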

This allowed dash to actually compile and link, producing a dash executable. Now I could get to the juicy stuff. Of course, I couldn't just expect all the no-op failing functions to lead to a running shell; some functions were really required to function. This phase required a lot of trial and error. Having reached the first $ symbol (the marker for user input), you can see that up to this point many functions are still not implemented, but obviously they are not required for minimal functionality.

Some libc functionality could be covered on the userspace side with some workarounds. E.g. many printf variants like vsnprintf can be implemented with sprintf under the hood. Others required additional kernel functionality. You can see in the next screenshot the syscalls I had to add (actually, kill isn't used).
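The userspace end of each new syscall is just a small assembly shim around the x86_64 syscall instruction. A rough sketch (the register convention here mirrors Linux; JOS's own numbering and convention may differ):

```rust
use core::arch::asm;

/// Generic one-argument syscall wrapper for x86_64.
unsafe fn syscall1(num: u64, arg1: u64) -> u64 {
    let ret: u64;
    asm!(
        "syscall",
        inlateout("rax") num => ret, // syscall number in, return value out
        in("rdi") arg1,
        lateout("rcx") _,            // rcx and r11 are clobbered by `syscall`
        lateout("r11") _,
        options(nostack),
    );
    ret
}
```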

So finally, around Christmas, I was able to actually run some first shell commands, like echo or simple loops, and dash would evaluate them:

Compared to running Doom, this output seems pretty underwhelming, but for me it's a great milestone.

It goes without saying that this shell is not yet very useful. While the basics are working, it is not yet capable of launching other userspace processes. I started to implement vfork and execve for this purpose, and while I had some good traction in the beginning, it turned out that my kernel's whole memory management and paging logic is just too convoluted and brittle. Hence, I set my next goal to bring this code back into order. Let's see if and when that happens.

In case you are interested, you can find the code here: https://github.com/jbreu/jos/pull/34


JOS Updates: Filesystem, Disk Driver, 4KB pages, Automated Testing

I am happy to announce that JOS, my own operating system written from scratch, got some updates in recent months. Since the last update I was able to add two essential capabilities that progress towards my next big goal (running a somewhat useful shell): a filesystem and a disk driver. As usual, both are very rudimentary and incomplete. At the moment only read operations are possible, and even those are very limited. Still, I am relieved that this is working now. In combination, the two enable me to drop the workaround of storing files directly in the binary of my OS.

As the filesystem I went for ext2, an older, very simple filesystem which was (and, via its successors ext3 and ext4, still is) in use by many if not most Linux systems. For the disk driver I also went for an oldie: ATA. In the end, the code for both together is only approx. 500 lines. I had actually procrastinated on this for a while because I was awed by the expected complexity. While both caused some frustration loops, in the end it clicked faster than expected.
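To give a feel for why it stays so small: a 28-bit LBA read over the legacy ATA PIO interface is just a handful of port writes followed by a data loop. A simplified sketch (polling only, no error handling; the actual JOS code differs in detail):

```rust
use core::arch::asm;

unsafe fn outb(port: u16, val: u8) {
    asm!("out dx, al", in("dx") port, in("al") val, options(nomem, nostack));
}

unsafe fn inb(port: u16) -> u8 {
    let v: u8;
    asm!("in al, dx", out("al") v, in("dx") port, options(nomem, nostack));
    v
}

unsafe fn inw(port: u16) -> u16 {
    let v: u16;
    asm!("in ax, dx", out("ax") v, in("dx") port, options(nomem, nostack));
    v
}

/// Read one 512-byte sector from the primary ATA bus (ports 0x1F0..0x1F7).
unsafe fn ata_read_sector(lba: u32, buf: &mut [u16; 256]) {
    outb(0x1F6, 0xE0 | ((lba >> 24) & 0x0F) as u8); // drive select + LBA bits 24..27
    outb(0x1F2, 1);                                 // sector count
    outb(0x1F3, lba as u8);                         // LBA bits 0..7
    outb(0x1F4, (lba >> 8) as u8);                  // LBA bits 8..15
    outb(0x1F5, (lba >> 16) as u8);                 // LBA bits 16..23
    outb(0x1F7, 0x20);                              // READ SECTORS command
    while inb(0x1F7) & 0x08 == 0 {}                 // poll status until DRQ is set
    for word in buf.iter_mut() {
        *word = inw(0x1F0);                         // pull the data, one u16 at a time
    }
}
```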

I am even surprised that the performance is better than what I expected from an unoptimized first working iteration. As you can see in the tracing view, one exemplary file-read syscall from userspace (still Doom 😉 ) takes less than 1 millisecond, and reading one sector (512 bytes) from the disk takes 100 microseconds.

What actually cost me way more time and nerves during the last months was a refactoring without any clear purpose (which in hindsight probably could have been skipped): I enabled JOS to run with 4 KB pages. The page size is the fundamental unit used by the CPU's memory management unit to organize memory. For historic reasons I had been using 2 MB pages. While 2 MB pages work fine, I suspected they would offer too little granularity. E.g. I was researching IPC mechanisms, specifically via shared memory, and if each process has to share memory in 2 MB chunks, RAM would quickly fill up. So yeah, now it is running with a 4 KB page size. Theoretically, 2 MB pages still work and could be enabled via a compile-time switch. However, I was too happy that 4 KB finally worked, so I haven't yet found the mojo to test – and probably fix – 2 MB.
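The switch mostly means walking one more level of the page tables: with 4 KB pages, a virtual address splits into four 9-bit table indices plus a 12-bit offset (with 2 MB pages, the last level is skipped and the offset is 21 bits). A sketch of the index arithmetic:

```rust
/// Split an x86_64 virtual address into its 4-level paging indices (4 KB pages).
fn page_table_indices(vaddr: u64) -> (usize, usize, usize, usize, usize) {
    let pml4 = ((vaddr >> 39) & 0x1FF) as usize; // level 4 index
    let pdpt = ((vaddr >> 30) & 0x1FF) as usize; // level 3 index
    let pd   = ((vaddr >> 21) & 0x1FF) as usize; // level 2 index
    let pt   = ((vaddr >> 12) & 0x1FF) as usize; // level 1 index (absent for 2 MB pages)
    let offset = (vaddr & 0xFFF) as usize;       // offset within the 4 KB page
    (pml4, pdpt, pd, pt, offset)
}
```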

Speaking of testing: the automated tests I introduced in my last brief JOS update really paid off big time. I could check any change within a few seconds of automated testing and ensure nothing got worse. While in the end none of the above improvements directly enabled new functionality, a lot improved under the hood – and Doom is still running.


Corporate SW Engineering Aphorisms

During my recent vacation I reflected on patterns I have observed during my 15+ years of tenure in a corporate SW engineering environment. My friends and colleagues know me to be very interested in the meta-level of organizational dynamics; my blog is evidence of this.

It's not so easy (for me) to communicate such patterns in a thought-provoking manner. The internet and software culture offer a very rich collection of much cleverer people's takes. You have probably heard about Murphy's Law, Conway's Law, or Parkinson's Law of Triviality. https://matthewreinbold.com/2020/08/03/Technology-Aphorisms has a nice collection of those. However, phrasing laws is a bit too much for my humble self. Instead, I figured aphorisms are more appropriate for my opinions. However, I have to admit that I can't perfectly differentiate between aphorism, sententia, maxim, aperçu, and bon mot. I guess this is just me trying to be clever, too 🙂

In task forces you bring together end-to-end teams and miraculously it works. Why do you wait for the task force to form such teams?

Great organizations mature and evolve their software over many years. Others replace it every other year – and call this progress.

While the strategy grows on shiny slides, the engineers wonder who still listens to them.

In a world of infinite content, silence becomes signal.

It’s not the traffic that breaks the system — it’s the architect’s fantasy of it.

In the world of junior architects, no problem is too small for an oversized solution.

Overengineering doesn’t solve problems — it staffs them.

Complexity is job security — for the team that caused it.

Not every repetition is a problem. Some abstractions are worse.

YAGNI is no excuse for architecture – but a compass for its necessity.

Every new DSL saves you five keystrokes – and costs you three days of debugging.


JOS is on fire 🔥

During a long weekend, my own operating system made some progress:

– JOS is now able to read data from ext2 filesystem images
– Via a new serial interface, automated smoke tests can be executed in the GitHub pipeline to check that some essential features keep working
– I played around with flamegraphs, which show the call stacks (covering both userspace and kernelspace)

check it out at https://github.com/jbreu/jos


Technology Radar #32: Automotive SW perspective

A few days ago, version #32 of Thoughtworks' (TW) Technology Radar was published. As in earlier blog posts, I want to review the topics in there from the perspective of automotive embedded SW engineering. As usual, there is some bias towards cloud technology and machine learning, which is out of my current professional scope (exceptions apply); however, every time there are enough other concepts/tools/aspects that make me investigate and follow up, to the best of my abilities, either at work or in my private projects. In this blog post I will try to list those parts which are roughly new and relevant from an automotive software perspective.

Let's start with the Techniques sector. The first item TW recommends to adopt is Fuzz Testing. Indeed, it's a testing approach with great potential that I have barely ever leveraged (I am not alone: "Fuzz testing, or simply fuzzing, is a testing technique that has been around for a long time but it is still one of the lesser-known techniques"). Worth noting: Google has an interesting project called OSS-Fuzz in which they fuzz open-source projects for free – and actually find a lot of issues. Fuzz Testing is in my top 10 SW engineering practices I want to see in real project life as soon as possible.
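For readers who have not tried it: with cargo-fuzz, a fuzz target is only a few lines. The harness below feeds coverage-guided random inputs into a parser; my_parser is a placeholder for whatever code you want to exercise:

```rust
// fuzz/fuzz_targets/parse.rs -- run with `cargo fuzz run parse`
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // The fuzzer mutates `data`; the target must not crash, hang,
    // or hit undefined behavior for any input.
    let _ = my_parser::parse(data);
});
```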

The next interesting item, "API request collection as API product artifact", sounds a bit clunky. I interpret it as a set of sample API requests which help developers adopt APIs inside an ecosystem more quickly. That is definitely desirable, as examples are often more helpful for getting the hang of a specific API than its reference documentation (not to mention that documentation is still very important, too). One caveat is to establish ways to keep the examples/collection up to date so they don't break after a while as the API evolves.

Then comes the Architecture advice process, which resonates very well with my current experience: in large software projects, a common challenge is the architectural alignment process. Traditional methods, like Architecture Review Boards, often slow things down and are linked to poor organizational outcomes. A more effective alternative may be the architectural advice process – a decentralized model where anyone can make architectural choices, as long as they consult with affected stakeholders and experts. This approach supports faster workflows without sacrificing quality, even at scale. Though it may initially seem risky, tools like Architecture Decision Records and advisory groups help keep decisions well-informed. This model is proving successful, even in tightly regulated industries. Andrew Harmel-Law has written an insightful blog post on it.

In the Tools sector, uv is getting a spotlight. uv is the hot shit right now in the Python ecosystem. While I have not used it myself, I see it gradually replacing other Python package managers. This is due to its fast execution, but also its well-designed features, which make it easier to run more or less self-contained Python projects.


Small Update on JOS

✅ use the High Precision Event Timer (HPET) to get nanosecond-resolution timestamps (sketched below)
✅ colored logging
✅ pretty logo
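Reading the HPET boils down to one MMIO read of the main counter, scaled by the tick period advertised in its capabilities register. A minimal sketch (the base address shown is the typical one; in reality it is discovered via the ACPI HPET table, and the multiplication ignores overflow):

```rust
/// Typical HPET MMIO base; the real address comes from the ACPI HPET table.
const HPET_BASE: u64 = 0xFED0_0000;

/// Current HPET time in nanoseconds.
unsafe fn hpet_now_ns() -> u64 {
    // Upper 32 bits of the capabilities register hold the tick period in femtoseconds.
    let caps = core::ptr::read_volatile(HPET_BASE as *const u64);
    let period_fs = caps >> 32;
    // The main counter lives at offset 0xF0.
    let ticks = core::ptr::read_volatile((HPET_BASE + 0xF0) as *const u64);
    ticks * period_fs / 1_000_000 // femtoseconds -> nanoseconds
}
```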


Writing my own x86_64 Operating System

tl;dr Repo: https://github.com/jbreu/jos

Around the middle of last year I dug myself into the OS development rabbit hole on YouTube, and in my fall vacation I had some slack that allowed me to hack together my own operating system called JOS (Jakob's OS). This was the starting point of a pretty exciting journey. To this day, there is no particular purpose or goal other than learning more about low-level basics. Nonetheless, I enjoyed this exercise so much that I can even grant it some therapeutic effect during a very stressful business life 🙂 Of course, as a family father and engineering manager, there is not much time, and I often found myself pondering how to solve the next stack corruption puzzle while doing chores, instead of actually coding/debugging. In this article, I will tell you a bit about my key learnings and the technologies this exposed me to.

The starting point was a brief video series by David Callanan on how to write a bootloader for an OS, up to printing a Hello World to a console. I forked his repo and worked my way from there. The first thing was to rewrite the whole OS code (besides the bootloader) in Rust, as this was another thing I wanted to learn. It took me a while to get used to Rust, but eventually I arrived at a working Hello World. The next steps were minor adaptations to the console printing ecosystem. The first real big step into the intrinsics of the x86_64 CPU instruction set was the implementation of interrupts. This took a great amount of debugging, trial and error, and reading of many specifications and internet resources. From the latter, I want to highlight the AMD64 Architecture Programmer's Manual, Volume 2, and Philipp Oppermann's Writing an OS in Rust. The latter is really good for the first steps, with one caveat: it is based in large parts on the author's library, which abstracts all the x86_64 basics away, making it in many parts more of a tutorial for that library. I didn't want to use this library because my impression was that it masked too much of the interesting stuff about how things actually work at the low level. Hence, I implemented everything on my own terms. After adding keyboard and timer interrupts, I used the latter to display the clock time in the top right corner of the console.

A huge help was that I could use the QEMU emulator and the GDB debugger in combination (via VS Code's extension native-debug). It was my first time using QEMU and GDB as a developer, and after getting past the initial learning curve I genuinely enjoyed debugging with them. As additional tools I also used Ghidra (a disassembler) and ElfViewer (for inspecting executables).

A really huge next step for me was to introduce userspace programs and, later on, multiprocessing (running multiple userspace programs "in parallel" with a simple round-robin scheduler). After studying basic patterns for this, I chose the hard path of implementing it mostly by myself. This is the part I am most proud of so far, because it required me to derive many inner workings directly from the specs, and it cost me a long time to get right. For weeks and months I fought with sporadic stack corruptions and CPU exceptions. As a userspace program I implemented a simple Hello World, which interacted with the kernel via syscalls. Due to the lack of a file system, which to date I was for some reason not eager to implement, this userspace program is stored inside the OS executable and loaded from there – a hack which is actually really cool. Then I added a VGA mode which enabled the userspace programs to print colored pixels to the screen (one line for each of the 4 userspace programs):
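The scheduling policy itself is the easy part – the hard-won work was the context switching around it. A simplified sketch of a round-robin pick-next (not JOS's actual data structures):

```rust
#[derive(PartialEq)]
enum TaskState {
    Runnable,
    Blocked,
}

/// Pick the next runnable task after `current`, wrapping around the table.
fn round_robin_next(tasks: &[TaskState], current: usize) -> Option<usize> {
    let n = tasks.len();
    (1..=n)
        .map(|i| (current + i) % n) // visit every slot once, current itself last
        .find(|&i| tasks[i] == TaskState::Runnable)
}
```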

OK, so here we were – but what are you gonna do with an operating system which has keyboard input, can run user programs, and has VGA output? Yes, you are correct – we run Doom on it 🙂

You have probably heard of people getting Doom to run on all sorts of awkward devices; engineers have made it run on potato batteries and toothbrushes:

So if they can run it on such devices, there must be a way to run it on JOS, right? And yes, it's possible. It required me to write a small C wrapper around PureDOOM, which in turn also made me make my Rust-based libc callable from C. After adding some additional syscalls and fighting with the 6-bit color maps and the malloc implementation, I finally made it. So here we stand today: JOS runs Doom.
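The malloc that Doom needs can be surprisingly primitive; a bump allocator that never really frees is often enough to reach the title screen. A hedged sketch (JOS's real implementation may look different):

```rust
use core::ffi::c_void;

const HEAP_SIZE: usize = 8 * 1024 * 1024; // 8 MB arena, statically reserved
static mut HEAP: [u8; HEAP_SIZE] = [0; HEAP_SIZE];
static mut NEXT: usize = 0;

/// Bump allocator: hands out 16-byte-aligned chunks and never reclaims them.
#[no_mangle]
pub extern "C" fn malloc(size: usize) -> *mut c_void {
    unsafe {
        let start = (NEXT + 15) & !15; // align up to 16 bytes
        if start + size > HEAP_SIZE {
            return core::ptr::null_mut(); // arena exhausted
        }
        NEXT = start + size;
        (core::ptr::addr_of_mut!(HEAP) as *mut u8).add(start) as *mut c_void
    }
}

/// free() is a no-op – fine for a game that allocates mostly up front.
#[no_mangle]
pub extern "C" fn free(_ptr: *mut c_void) {}
```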


Technology Radar #27: Automotive SW perspective

As written before, I really like the regular updates provided by Thoughtworks in their Technology Radar. My focus is on the applicability of techniques, tools, platforms and languages to automotive software, with a further focus on embedded in-car software. Hence, I am ignoring the pure web-development and machine learning/data analytics content which usually makes up a huge portion of the whole report. Recently, volume 27 was published. Let's have a look!

As usual, let's start with a dive into the "Techniques" sector and its "Adopt" perimeter. The first entry we find is about "path-to-production mapping". It's as familiar as it sounds – many of my readers will have heard about Value Stream Mapping or similar process-mapping approaches. Thoughtworks themselves state that this one is so obvious that they simply hadn't covered it in their reports yet. Sometimes the simple ideas are the powerful ones. I can confirm from my own experience that a value stream map laying out all the process steps and inefficiencies in an easy-to-digest manner is a good eye-opener and can help to focus on the real problems instead of beating around the bush.

Something very interesting for all the operating system and platform plans in automotive is the notion of an "incremental developer platform". The underlying observation of "teams shooting for too much of that platform vision too fast" is something I can confirm from my own experience. Engineers love to develop sustainable platforms but underestimate all the effort required, and management, with its impatience, further undermines platform plans. Following the concept of a "thinnest viable platform" from the book Team Topologies makes sense here: not shooting too far in the first step, but still treating a platform product as an incremental endeavour.

Another one which strikes me is "observability in CI/CD pipelines". With the increasing number and complexity of CI/CD pipelines in a single project, let alone a whole organization, many operational questions arise. And operations always benefit from clear data and a good overview. Recently, a then-student (and now colleague) and I designed and realized a tool which enables CI/CD monitoring not just for one repo, but for a graph of repos. I hope we can publish/open-source this project sometime soon.

In the Platforms sector, Backstage entered the "adopt" perimeter. The project is under active development and could indeed be an interesting tool for building an internal SW engineering community.

Looking at the tools sector, I liked Hadolint for finding common issues in Dockerfiles.


Technology Radar #26: Automotive SW perspective

As written before, I really like the regular updates provided by Thoughtworks in their Technology Radar. Since the new version #26 was released a few weeks back, I have now found the time to put down my notes. My focus is on the applicability of techniques, tools, platforms and languages to automotive software, with a further focus on embedded in-car software. Hence, I am ignoring the pure web-development and machine learning/data analytics content which usually makes up a huge portion of the whole report. Let's go!

In the Techniques section, in the "adopt" circle, we initially have the "single team remote wall". In a nutshell, I think they mean having a dashboard showing the essential data, KPIs and tasks for a remote development team. I think the trick here is the "single", as I assume that most remote teams have dashboards – just usually multiple, loosely coupled ones. In my current team, our Scrum Master has created a great Jira dashboard showing some essential data which can give hints about the team's performance.

The second noteworthy technique is "documentation quadrants". Referring to documentation.divio.com/, this provides a nice taxonomy of different documentation types. It is very relatable, as I very often experience a fuzzy mixture of all those types scattered across many places. This is certainly something I will bring to my work network's attention.

Third, we have "rethinking remote standups". This follows the general observation that conducting remote daily standups with the same duration and content as was recommended in former times (e.g. the typical 15-minute Scrum daily) does not provide the same amount of alignment within a development team. This is not necessarily because of the meeting itself, but because other casual sync occasions during the day happen less in remote setups. In the radar, it is recommended to try an extension to one hour, with the goal of decreasing the overall meeting load this way. I am torn on this one, as I have always been a fan of crisp daily meetings, avoiding random rambling on topics concerning only parts of the team. Blocking one hour for everyone every day sounds like an overshoot.

Next, there is again the "software bill of materials" topic. This is currently a huge topic in the software industry; there have been very concrete examples recently (e.g. the Log4Shell or npm package events you probably read about). Tool support for transparently and consistently managing the software used in a bigger project is really needed. While in the web and cloud business there is a growing number of tools, in the embedded world there are only some puzzle pieces. I can currently think of some Yocto support for this; however, that covers only the Linux parts of the usually more complex multi-OS automotive ECUs.

"Transitional architecture" sounds like a promising thing, even though the radar's description stays a bit vague. Luckily, there is an extensive article by Thoughtworks' Martin Fowler on this approach. In my opinion, managing legacy software in complex setups is one of the key challenges in the whole software industry – even more so in automotive embedded software, which is characterized by the coexistence of decades-old technologies with state-of-the-art approaches. Formalizing the transition from an older architecture to a newer one makes sense, as this transition is usually not covered architecturally as extensively as the target architecture. This leads to misunderstandings, hacky workarounds and other side effects detrimental to sustainable development.

Going one circle outward, in the "assess" perimeter, we first find CUPID. Aimed at replacing the SOLID rules with a set of properties of "joyful code", it contains some interesting observations and paradigms. So far I have only skimmed over it; I think it deserves more time and maybe a dedicated article. However, I can recommend checking out the well-written original blog post by Dan North.

In the "hold" perimeter we see "miscellaneous platform teams". In contrast to the "platform engineering product teams" described earlier in the radar, this is a kind of degraded form. If a platform team fails to define a clear product goal and identify its customers, the scope usually becomes (or already is) very fuzzy, leading to an unclear platform system. Hence, it is strongly recommended to avoid this by achieving clarity about what the scope of the team actually is.

In the Platforms sector, I could only identify one relevant blip: "GitLab CI/CD". Recently I have seen a lot of discouragement from using Jenkins, and of course, if you already use GitLab for its other elements (code hosting, code review, issue tracking), you may as well use it for CI/CD pipelines. For sure it is better integrated into the overall GitLab experience. However, it is just another vendor-specific DSL, so I wonder whether there will be practical standardization of pipeline definitions anytime soon.

Looking at the "Tools" sector, I found the reference to the two code search tools Comby and Sourcegraph. Besides offering code search and browsing based on abstract syntax tree analysis, they also offer semi-semantic batch changes, enabling "large-scale changes". Comby is an open-source tool, while Sourcegraph is commercial. I think I will try at least one of them soon.