My Work

I like to tinker and I like to build. What follows is a rough collection of “this seemed like a good idea” projects that eventually became something.

N184 Bug and Vulnerability Scanner: In 2025 (well before Mythos) I hypothesized that LLMs could find bugs and security vulnerabilities, and it’s been wildly successful. It is licensed under the AGPL so that anyone can use it against their codebase. I have a scoreboard file you can view here that shows all the projects I’ve fixed something in: OpenBSD, top, MLX, ncurses, llm-d, and the list continues.

There was a lot of learning and engineering here.

My initial thesis involved an agent linked to Ghidra that decompiled closed-source binaries to look for classic exploitable memory shapes. I found precisely zero bugs. I guess Elon was wrong that LLMs could write and read assembler directly (at least for now).

The second iteration also had issues. I built a single agent trained on various bug shapes that could go through open source code to find bugs. The big problem I had was that different models had different success rates, and I spent the bulk of my time wading through false positives.

I guess the third time is the charm, because this led me to my final architecture, which worked wonders. After reading a paper on agentic ensemble methods, I added a conductor agent and various sub-agents. This was wildly successful, because I included an Advocatus Diaboli Agent whose only job was to prove sub-agents wrong. I found a number of real bugs, ranging from a 27-year-old ncurses bug to the independent discovery of CVEs in dnsmasq.
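The conductor and devil's-advocate idea can be sketched in a few lines. This is a minimal illustration, not N184's actual code: `Finding`, `conduct`, `strcpy_finder`, and `literal_skeptic` are hypothetical names, and the two toy string-matching rules stand in for what would really be LLM-backed agents.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    claim: str


# A finder proposes findings for a source file. A skeptic returns a rebuttal
# string when it can refute a finding, or None when it cannot.
Finder = Callable[[str, str], List[Finding]]
Skeptic = Callable[[Finding, str], Optional[str]]


def conduct(path: str, source: str, finders: List[Finder], skeptic: Skeptic) -> List[Finding]:
    """Run every finder, then keep only the findings the skeptic cannot refute."""
    survivors: List[Finding] = []
    seen = set()
    for finder in finders:
        for finding in finder(path, source):
            if finding in seen:  # several finders may report the same issue
                continue
            seen.add(finding)
            if skeptic(finding, source) is None:
                survivors.append(finding)
    return survivors


def strcpy_finder(path: str, source: str) -> List[Finding]:
    """Toy finder (assumption): flag every line that calls strcpy."""
    return [
        Finding(path, number, "unbounded strcpy")
        for number, text in enumerate(source.splitlines(), start=1)
        if "strcpy(" in text
    ]


def literal_skeptic(finding: Finding, source: str) -> Optional[str]:
    """Toy skeptic (assumption): treat copies of a string literal as refuted.

    A real reviewer would still compare the literal against the destination size.
    """
    text = source.splitlines()[finding.line - 1]
    if text.rstrip().endswith('");'):
        return "source is a fixed-length string literal"
    return None
```

Calling `conduct("demo.c", source, [strcpy_finder], literal_skeptic)` returns only the findings that survived the rebuttal pass, which is the step that cuts down the false-positive pile.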

The architecture itself has evolved to have a number of generalist agents who can go out and check any code base for issues, as well as specialized agents.

The cast got so crowded I needed a naming convention, and chose to use characters from Honoré de Balzac's La Comédie Humaine. Each character's traits map to their function. "Vautrin found it, but Goriot rejected it in consensus" is easier to parse than "Agent-001 found it, but Agent-004 rejected it."

The swarm:

Honoré: The orchestrator. Coordinates analysis, applies Devil's Advocate, presents findings.

Vautrin: The vulnerability hunter. Runs in swarms with different AI models.

Rastignac: Reconnaissance specialist. Maps codebases, identifies hotspots, builds code maps.

Bianchon: Documentation librarian. Checks findings against docs, filters features from bugs.

Lousteau: Memory Palace custodian. Maintains the seven halls, provides historical context, predicts maintainer responses. Cynical, world-weary, has seen every bug before.

Goriot: Consensus validator. Patient, methodical, brings agents together.

Fil-de-Soie (Sélérier): Memory-bug specialist. A pickpocket of the heap — light, quiet, focused on C/C++ allocation patterns. Baseline is OpenBSD-hardened libc. Runs standalone so non-LLM-fluent operators can get a clean memory-safety report without dancing with Honoré.

N184 is still evolving, but I’ve been using it to find (and fix) real issues across very different code bases.

Automatic-Nethack.com: After spinning up my own virtual AI assistant using the NanoClaw architecture, I asked it on a whim to play NetHack. It had shell access in a container, so why not? I observed emergent behavior that was hard to explain, so I built a website with my virtual assistant. Some highlights of this adventure:

1. My assistant running out of context, asking what we were working on, and telling me it sounded fun.

2. Very long games that I suspected were bugs. When I got into the weeds, it turned out the assistant had used a random walk algorithm to play. It was surprisingly effective at creating long-running games, but not very effective at scoring.

3. My assistant deciding to improve its own skills and diving down a research rabbit hole. It came up with papers like The NetHack Learning Environment (Küttler, Nardelli, et al.) as a basis for future self-improvement.

4. My assistant running NetHack games in parallel with other tasks, essentially fork bombing my server. Its response? “I was playing NetHack during idle time and must have been spawning parallel sessions repeatedly.”

5. My assistant creating a skills.md file for agents that want to venture into the dungeon, with interesting quotes like “The dungeon doesn’t care what you are. It’ll kill you anyway.”

6. Adding Zork to the Automatic NetHack server. Apparently, while my assistant, Figaro, had a very hard time making progress in NetHack, it had a very easy time making progress in Zork. It makes sense: an LLM with a perfect memory is built to play the text games of the 80s and 90s.
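The random walk from highlight 2 is easy to picture with a toy model. The sketch below is an assumption about the general technique, not a reconstruction of what the assistant wrote: it uses NetHack's default vi-style movement keys, but the grid, `random_walk`, and its parameters are hypothetical and it never talks to a real game.

```python
import random

# NetHack's default vi-style movement keys mapped to (dx, dy) offsets,
# with y growing downward as on a terminal screen.
MOVES = {
    "h": (-1, 0), "l": (1, 0), "k": (0, -1), "j": (0, 1),
    "y": (-1, -1), "u": (1, -1), "b": (-1, 1), "n": (1, 1),
}


def random_walk(width, height, start, steps, seed=None):
    """Pick a uniformly random movement key each turn on an empty grid.

    Returns the keys pressed and the set of cells visited. Moves that would
    leave the grid are still "pressed" but do not change the position, much
    like walking into a wall.
    """
    rng = random.Random(seed)
    x, y = start
    visited = {start}
    keys = []
    for _ in range(steps):
        key = rng.choice(list(MOVES))
        dx, dy = MOVES[key]
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            x, y = nx, ny
            visited.add((x, y))
        keys.append(key)
    return keys, visited
```

A walker like this has no goal, so it tends to revisit the same cells instead of heading for the stairs. That fits the pattern of long games with low scores.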

MyMilla AI Assistant: I took part in the 2025 ARM AI Developer Challenge Hackathon and built a personal assistant that lived on a Raspberry Pi 5, ran Ollama models, and was built entirely in Clojure. I picked Clojure as a way to truly stress test LLMs. Most LLMs have been trained mainly on more mainstream languages like Java or Python, so it created some interesting issues. I also wanted to take advantage of Clojure’s homoiconicity: the idea that your code is data, and if you can manipulate data, you can manipulate your code. I learned a lot, and am proud of my contribution (even if I didn’t win).
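The “code is data” idea can be imitated outside Clojure. Python is not homoiconic, so this sketch fakes it by writing a tiny program as nested lists, loosely like a Clojure form such as `(+ 1 (* 2 3))`. The names `evaluate` and `swap_op` are made up for this illustration and are not part of MyMilla.

```python
import operator
from functools import reduce

# Supported operators for the toy language (an assumption of this sketch).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(form):
    """Evaluate a nested-list program such as ["+", 1, ["*", 2, 3]]."""
    if not isinstance(form, list):
        return form  # a bare number evaluates to itself
    op, *args = form
    return reduce(OPS[op], (evaluate(arg) for arg in args))


def swap_op(form, old, new):
    """Rewrite a program by replacing one operator with another.

    The program is an ordinary list, so transforming it is plain list
    processing. That is the point of treating code as data.
    """
    if not isinstance(form, list):
        return form
    op, *args = form
    return [new if op == old else op] + [swap_op(arg, old, new) for arg in args]


program = ["+", 1, ["*", 2, 3]]
rewritten = swap_op(program, "*", "+")
```

Here `evaluate(program)` gives 7 and `evaluate(rewritten)` gives 6. In Clojure the same rewrite works directly on the language's own forms, with no separate list encoding.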

Open Source Contributions: I’ve been contributing to various open source projects, and have successfully had several bug fixes accepted into Apple’s MLX project.

WithoutAMapPhotography.com: This is a personal website showing my photography. My grandfather was a photographer and I grew up with a camera in my hand. I don’t get to shoot as much these days as I used to, but from time to time I repair film cameras and take photos with them. I prefer to shoot in black and white, and you can read all about my work at my other site.