Pre(r)amble

It’s hard to avoid all the AI infesting everything these days, but apart from using completions a bit in VS Code Copilot, with the occasional conversation about different libraries (which I find a pretty pleasant way to learn about new things, even if those new things are occasionally BS), I haven’t really used it for code much.

I read GitHub’s breathless “now, with AGENTS” release a while back, and never really gave it much thought until they also put out their Copilot Adventures … tutorials? The word “tutorial” feels a bit off, since the activities are couched in terms of “you’re going to make X!” but you’re not really making anything - you’re just being given examples of prompt structure to get the agent to build the thing for you.

If I put my teacher hat on, it feels more like a professional learning session on classroom pedagogy - here are some ways you can break up your instructions in a way that students will understand. Of course, the agents aren’t students and don’t learn (from you), so the analogy gets a bit thin. This is why I have such a hard time swallowing all the rhetoric about AI Agents just being junior software engineers.

Agent of Chaos

OK so the heading is a bit over the top, but we’ll run with it.

I had a go at a couple of the Copilot Adventures activities, and while I didn’t think they were amazing, they at least gave me a way to think about agents that wasn’t just automating my typing. I still don’t understand most of the AI agent boosterism… maybe one day.

Anyway, armed with some idea of how to prod the agent to make things for me, I took a couple of ESP32 modules with touch screens (more about what actually got made here) and started prompting away to get Claude in agent mode to write some student-friendly libraries for the display, the touch screen, and the (pretty neat) ESP-NOW radio.

A little while ago this paper on Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity was doing the rounds. The TL;DR was that developers predicted AI would cut their development time by ~20%, but instead development time increased by a similar amount.

My own n=1 project pretty much bears that out. For three weeks or so I spent a couple of mornings here and there consciously avoiding writing code myself and just getting the agent to create the libraries, write demo code so that I could test it on hardware, write documentation, and so on.

Productivity boost!

The first couple of days felt super productive! So much code was being written! I could test features that I’d just described without having to write the code myself! Amazing! There were some hiccups, sure. There was a tendency to try and use the wrong API, so I had to write a small amount of code for RSSI reporting in the network library myself, and then shout at Claude to leave it alone: I could verify that it worked, and it kept trying to “fix” it by returning a constant value instead.
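
For the record, the RSSI reporting itself isn’t much code - the sketch below is roughly the shape of it, assuming MicroPython’s built-in espnow module on the ESP32 (which keeps a peers_table of signal strengths per peer); the function name and structure here are illustrative, not the actual library.

```python
# Minimal sketch of ESP-NOW RSSI reporting on a MicroPython ESP32.
# Uses the built-in network/espnow modules; names are illustrative.
import network
import espnow

wlan = network.WLAN(network.STA_IF)   # ESP-NOW rides on the station interface
wlan.active(True)

e = espnow.ESPNow()
e.active(True)

def peer_rssi(mac):
    """Return the last-seen RSSI (dBm) for a peer, or None if unknown."""
    entry = e.peers_table.get(mac)    # ESP32 port: {mac: [rssi, time_ms], ...}
    return entry[0] if entry else None
```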

Then we got to optimising. At first it felt really productive. I got to bounce ideas off Claude for how to improve draw speeds on the display, and because the AI is such a kiss-arse, every idea I had was incredibly insightful - a game-changer! So the agent would re-write half the code, the display would still be slow, and I would have to slog through the code to figure out whether it was me being an idiot or the AI. Turns out it was a little of both. I learned a fair bit about what actually slows down microcontroller hardware, and Claude, well, Claude didn’t learn anything. It created “buffered write” solutions that allocated buffers but never used them, duplicated code all over the place, wrote lots of demo code to demonstrate performance improvements that didn’t prove out, and on the whole wasted a lot of time rewriting things from scratch.
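
For what it’s worth, the thing that actually helps on these displays is cutting down the number of tiny SPI transactions: draw into a RAM buffer, then push the whole region out in one transfer. A rough sketch of that idea, assuming a MicroPython ESP32 driving an ILI9341-style controller - the pins, strip size and command bytes are illustrative, not the library I ended up with.

```python
# Sketch of a buffered write that actually uses its buffer: render a strip
# of the screen into RAM with framebuf, then blast it out in one SPI write.
# Assumes an ILI9341-style controller; pins and geometry are illustrative.
import framebuf
from machine import Pin, SPI

spi = SPI(1, baudrate=40000000, sck=Pin(14), mosi=Pin(13))
dc = Pin(2, Pin.OUT)              # data/command select
cs = Pin(15, Pin.OUT, value=1)    # chip select

W, H = 240, 40                    # a strip, not the full screen, to keep RAM use sane
buf = bytearray(W * H * 2)        # RGB565 = 2 bytes per pixel
fb = framebuf.FrameBuffer(buf, W, H, framebuf.RGB565)

def _cmd(c, data=b""):
    cs(0); dc(0); spi.write(bytes([c]))
    if data:
        dc(1); spi.write(data)
    cs(1)

def blit_strip(x, y):
    """Set the drawing window, then send the whole buffer in one transfer."""
    x1, y1 = x + W - 1, y + H - 1
    _cmd(0x2A, bytes([x >> 8, x & 0xFF, x1 >> 8, x1 & 0xFF]))   # column range
    _cmd(0x2B, bytes([y >> 8, y & 0xFF, y1 >> 8, y1 & 0xFF]))   # row range
    cs(0)
    dc(0); spi.write(b"\x2C")     # memory write
    dc(1); spi.write(buf)         # one big transfer instead of per-pixel writes
    cs(1)

fb.fill(0)
fb.text("hello", 0, 0, 0xFFFF)
blit_strip(0, 0)
```

The strip size is a trade-off: bigger strips mean fewer transfers but more RAM, which is in short supply on an ESP32 running MicroPython.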

Eventually I spent at least as much effort coaxing Claude to fix things to a point where I was happy with the performance as I would have expended writing all the code myself.

Squashing bugs and change blindness

Debugging and testing was honestly one of the worst parts of the whole process, and I’m someone who typically enjoys debugging - the process of tracing and reasoning about code is really satisfying. Intentionally getting the agent to identify and fix bugs though? That’s a nightmare.

Every fix is a super-confident “Oh, I see exactly what the problem is! Perfect! All fixed!” and then you get the set of diffs. The trouble is that along the way you get a discussion chain of adds and removes with a light and fluffy “Now I’ll adjust the widget on the doodad [+10/-7]” and no details, and then get lumped with the summary of changes at the end. You can step through the file change by change, but you lose the context of why the line or lines were changed. So eventually (bearing in mind this is a hobby project) I just ended up accepting them all and relying on testing to see if the problem was actually fixed.

This was exacerbated by the fact that I was working with microcontrollers, so testing a lot of the code meant copying the updated file onto the device, restarting it, and then testing manually, since mocked tests weren’t necessarily reflective of how the hardware really behaved.

Eventually I ended up with change blindness on the diffs, to the point that, while trying to fix an unrelated bug, I missed that the agent had reverted a previous fix to the touch screen calibration. It then tried to do the same thing with the calibration on two other occasions, and also tried to revert my RSSI function. Bug fixing was super frustrating.

The good

All in all, getting from zero to decent working libraries (well, working on the two available boards I have - hardware is hard) took more time than I anticipated, and a lot of it made me unreasonably angry at a computer program with a human name.

However, it wasn’t all bad. I went into the project using an agent because I don’t know a lot about interfacing with hardware, and it helped a lot with that. It also unintentionally helped my understanding of the peripherals and mechanisms involved, because it did. so. much. dumb. shit. I had to learn what the SPI calls did, how resistive touch screens are read and how you get at the coordinates, how display updates are sent, and so much more.
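
The touch screen was a good example of that. Very roughly, and assuming an XPT2046-style controller on its own SPI bus (common on cheap ESP32 touch boards) - the pins, command bytes and calibration numbers below are illustrative, not my actual setup - reading it looks something like this:

```python
# Rough sketch of reading a resistive touch controller (XPT2046/ADS7846-ish)
# and mapping the raw ADC values onto screen pixels. Pins and calibration
# constants are illustrative.
from machine import Pin, SPI

spi = SPI(2, baudrate=1000000, sck=Pin(25), mosi=Pin(32), miso=Pin(39))
cs = Pin(33, Pin.OUT, value=1)

def _read12(command):
    """Send a control byte, clock out a 12-bit ADC result."""
    tx = bytes([command, 0x00, 0x00])
    rx = bytearray(3)
    cs(0)
    spi.write_readinto(tx, rx)
    cs(1)
    return (((rx[1] << 8) | rx[2]) >> 3) & 0x0FFF

def raw_touch():
    return _read12(0xD0), _read12(0x90)   # raw X and Y positions

# Calibration: the ADC range rarely lines up with the panel edges, so raw
# readings get mapped onto pixels. These constants are made up - real ones
# come from touching known points on the screen.
X_MIN, X_MAX, Y_MIN, Y_MAX = 200, 3900, 240, 3850
WIDTH, HEIGHT = 240, 320

def touch_to_pixel(raw_x, raw_y):
    px = (raw_x - X_MIN) * WIDTH // (X_MAX - X_MIN)
    py = (raw_y - Y_MIN) * HEIGHT // (Y_MAX - Y_MIN)
    return max(0, min(WIDTH - 1, px)), max(0, min(HEIGHT - 1, py))
```

Those calibration constants are exactly the sort of thing the agent kept quietly reverting.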

The agent also created a fair bit of pretty decent code. Lots of demo programs were created along the way that were great, and honestly I wouldn’t have had the patience to write them myself, or possibly even have thought about them the same way (I did have to delete some stupid code examples from some of them, though).

Results

I went into this wanting to understand a bit more about using agents in a project, and also to create something that wasn’t a toy like the examples in Copilot Adventures.

It’s easy to understand some of the public vibe coding disasters after having seen the agent scrap large swathes of (working) code to implement a small feature that turns out to be broken. It’s also easy to understand how much working with an agent amplifies the feeling of “works on my machine” - typical LLM braggadocio aside, test scripts are written and run, and when they fail they are tweaked and run again until they pass. That means everything has been confirmed to work, right? If you’re not used to thinking about where the edges of your code are and what people might actually do with it, I guess you’re more inclined to trust the machine.

Overall, I think I could have written the whole set of libraries for my little project in about the same amount of time without the AI, and probably learned more about the underlying hardware in the process. The result would have looked a lot more like my code, not had classes everywhere (so much OOP), and probably also not have had as many features or as much exception handling. I also suspect it would have been harder to keep at it, since the agent got me to something working much faster.