What I'm working on: Building a Workflow
Agentic Development from Code to Systems
This is a working summary of what I’ve built over the last six months toward a workflow for agentic development.
Agentic development will scale up all software development. I see two directions: further along a bad scale - big, clumsy teams that make poor software - or back toward a good scale of smaller, more capable teams. Those two paths map onto a split: open-loop workflows with no feedback, where the connection to stakeholders and mastery of the technology are lost, versus closed-loop workflows that converge by building understanding of both. If agentic development mainly accelerates the open-loop kind, that’s the bad direction.
So to “close the loop” I’m looking for “forcing functions” - constraints applied to guide development toward convergence. These can be informal or formal: a specification in human language, or in a formal modelling language. For software I prefer the informal, for systems the formal. The distinction comes from the ‘soft’ in software: at its core, software needs ‘slack’. At the system level, interfaces harden and should be locked down.
My two main areas of interest here are software development and systems modelling.
Software development
My main focus is structured workflows, such as pipelined processes and feedback loops, represented as agent skills.
My Claude Skills Library
I’ve focused on a few areas:
- Technical document creation
- Software development lifecycle
- A harness for evaluating skills
Technical Document Creation Skills
The key points about these skills are:
- Each skill uses a pipeline to refine and edit a technical document, such as a spec or design.
- The different skills represent different stages of document progress:
  - flesh-out converts raw notes into a working document.
  - review-steps should be a check, doing the clean-up.
  - strong-edit should edit down to a final document.
I’ve described it in more detail here: Claude Beyond Code: Intent Expression and Agent Skills
The pipeline looks like this:

Software Development Lifecycle Skill
I think iterative development via error-correcting feedback loops is the best way to converge on a desired system. For a team an Agile workflow is a good harness for this iteration, but for agentic development I think the story flips.
- The traditional systems engineering V-model can provide the guide rails for an agent to work autonomously on the low-level implementation.
- To do that a human provides the spec of what the software should do at a high level.
- This is NOT waterfall! The point is discrete levels of abstraction that are documented or modelled, not discrete stages of development that are gated.
Some V-model development is not iterative; stage gating in particular works against iteration. But for agentic development there is an inversion that means stage gating is not needed:
- Cost has inverted. Code is now cheaper to produce and verify than specifications.
- A specification may be cheaper to validate via an implementation loop that involves a stakeholder with a release increment.
- To iterate the loop we need a memory of the project and rails for the iteration; that is the value of the V-model.
- The expensive half is validation (does it match intent?), not verification (does it match the spec?) - scaffolding and tests make verification cheap. That asymmetry, cost(validation) ≫ cost(verification), is why the human owns the top of the V and the agent owns the bottom.
Removing stage gating ≠ removing release gating. We can have stronger release gating as we keep accumulated records of what the system does and why.
The idea is a human-dominated top of the V, and an agent-dominated bottom of the V, each with their own iterations driving correctness.
This general idea is similar to Spec-Driven Development (SDD): GitHub Spec Kit, AWS Kiro, OpenSpec, and more. However, the Light-V process is a generalization of a docs-as-code workflow I’ve been using for embedded systems for over 10 years. The main difference is that it is a portable document structure and an always-active agent discipline, not a CLI to adopt. It also pulls in the V-model traceability symmetry: every right-side artifact (test, validation) must verify a left-side one (design, architecture). It targets building new ideas from scratch, growing structure from nothing rather than retrofitting a spec onto an existing codebase.
| Dimension | Spec-Driven Development | Light-V |
|---|---|---|
| Form | A tool/CLI plus templates and slash commands | A portable doc structure + always-on agent discipline (no tooling) |
| Primary artifact | The natural-language spec; code is the build output | A hierarchy: paradigm → ADR → architecture → design → test → source |
| Flow | spec → plan → tasks → implement (largely linear) | Bidirectional gradient, explicitly not stage-gated |
| Verification | “Write testable requirements” | The right-side test verifies its left-side counterpart (design, architecture) |
| Validation | Implicit - the spec is assumed to capture intent | An explicit human feedback loop at the top of the V: does the result meet the original intent? |
| Decision capture | The living spec | Architecture Decision Records (ADRs) governed by a paradigm document |
| Orientation | Often retrofitting onto existing code | Greenfield - building new ideas from scratch |
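The traceability symmetry in the table above can be checked mechanically. The following is a minimal sketch, not the Light-V skill itself: the `Artifact` record, the `check_traceability` function, and the file names are all hypothetical, and the second check (every left-side artifact has at least one verifier) is my assumption about what "symmetry" implies.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical grouping of artifact kinds into the two sides of the V.
LEFT_KINDS = {"architecture", "design"}
RIGHT_KINDS = {"test", "validation"}


@dataclass(frozen=True)
class Artifact:
    name: str
    kind: str
    verifies: tuple[str, ...] = ()  # names of left-side artifacts this one verifies


def check_traceability(artifacts: list[Artifact]) -> list[str]:
    """Return human-readable traceability problems (empty list if none)."""
    by_name = {a.name: a for a in artifacts}
    problems: list[str] = []
    covered: set[str] = set()

    # Rule from the text: every right-side artifact verifies a left-side one.
    for a in artifacts:
        if a.kind not in RIGHT_KINDS:
            continue
        targets = [
            by_name[n] for n in a.verifies
            if n in by_name and by_name[n].kind in LEFT_KINDS
        ]
        if not targets:
            problems.append(f"{a.name}: verifies no left-side artifact")
        covered.update(t.name for t in targets)

    # Assumed mirror rule: every left-side artifact is verified by something.
    for a in artifacts:
        if a.kind in LEFT_KINDS and a.name not in covered:
            problems.append(f"{a.name}: no right-side artifact verifies it")
    return problems


artifacts = [
    Artifact("design/parser.md", "design"),
    Artifact("design/query.md", "design"),
    Artifact("tests/test_parser.py", "test", verifies=("design/parser.md",)),
    Artifact("tests/test_misc.py", "test"),
]
problems = check_traceability(artifacts)
for problem in problems:
    print(problem)
```

A check like this is cheap enough to run on every iteration, which is what lets release gating get stronger while stage gating goes away.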
Testing and Evaluating Skills
- Example test report: generator-coding skill test run
- End-to-end tests for skills: nakane1chome/claude-skills/tests/
- Framework for testing skills: nakane1chome/claude-skills/tests_fw/
Automated test & evaluation for Claude skills.
- Test the skill - confirm it does what it claims.
- Evaluate the skill - measure effectiveness. Add test points that depend on agent capability.
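The difference between the two bullets above can be sketched in a few lines. This is an illustration under stated assumptions, not the framework in tests_fw/: `run_skill` is a hypothetical stand-in that returns canned output where a real harness would invoke the agent, and both check functions are invented for the example.

```python
from statistics import mean


def run_skill(prompt: str, run_index: int) -> str:
    """Hypothetical stand-in for invoking an agent with a skill loaded.

    Returns canned outputs so the sketch runs without an agent.
    """
    outputs = [
        "# Spec\n## Requirements\n- R1: parse input\n",
        "# Spec\n## Requirements\n- R1: parse input\n- R2: report errors\n",
        "# Spec\nSome notes without structure\n",
    ]
    return outputs[run_index % len(outputs)]


def check_has_requirements_section(output: str) -> bool:
    """Test point: pass/fail, does the skill do what it claims?"""
    return "## Requirements" in output


def score_requirement_count(output: str) -> int:
    """Evaluation point: a measurement that depends on agent capability."""
    return sum(1 for line in output.splitlines() if line.startswith("- R"))


runs = [run_skill("flesh out my notes", i) for i in range(3)]

# Testing: each run either satisfies the claim or it does not.
pass_rate = mean(1.0 if check_has_requirements_section(o) else 0.0 for o in runs)

# Evaluating: aggregate a score across runs to compare agents or skill versions.
avg_requirements = mean(score_requirement_count(o) for o in runs)

print(f"pass rate: {pass_rate:.2f}, average requirements per run: {avg_requirements}")
```

Because agent output varies between runs, both numbers are aggregated over several runs rather than read from a single one.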
Generator (Meta-Coding) Skill
Agents are capable enough to solve meta-problems, that is, to write code that generates code:
- They often create scripts to make bulk changes instead of burning tokens on doing the work themselves.
- If there is a common data model shared between many source files, correctness can be improved by using a code generator, rather than having an agent code each unit individually.
- The skill encodes the pattern (Data Model → Parser → Helpers → Template → Output), and the rule: never edit generated output.
- One of the key points here is having the agent create a data model and a set of templates.
- The data model is a formal model that can act as a forcing function across the system.
- Code templates enhance code quality by reducing the variance in an agent’s output when it repeats the same activity.
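The pattern named above (Data Model → Parser → Helpers → Template → Output) can be sketched with the Python standard library. The register map, function names, and output format here are hypothetical and chosen only to illustrate the five stages; the skill itself may structure them differently.

```python
import json
from string import Template

# 1. Data model: a single source of truth (hypothetical register map).
MODEL_TEXT = """
{"peripheral": "UART0", "base": 1073741824,
 "registers": [{"name": "DATA", "offset": 0}, {"name": "STATUS", "offset": 4}]}
"""


# 2. Parser: load and validate the model before anything is generated.
def parse_model(text: str) -> dict:
    model = json.loads(text)
    if not model["registers"]:
        raise ValueError("model has no registers")
    return model


# 3. Helpers: small pure functions the templates rely on.
def c_macro_name(peripheral: str, register: str) -> str:
    return f"{peripheral}_{register}_ADDR".upper()


def hex32(value: int) -> str:
    return f"0x{value:08X}u"


# 4. Templates: fixed text, so repeated output has no variance.
HEADER = Template("/* GENERATED FILE - do not edit; change the model instead. */\n$defines\n")
DEFINE = Template("#define $name $addr")


# 5. Output: render every register through the same template.
def generate(model: dict) -> str:
    defines = "\n".join(
        DEFINE.substitute(
            name=c_macro_name(model["peripheral"], reg["name"]),
            addr=hex32(model["base"] + reg["offset"]),
        )
        for reg in model["registers"]
    )
    return HEADER.substitute(defines=defines)


output = generate(parse_model(MODEL_TEXT))
print(output)
```

A fix to the output goes into the model, a helper, or a template, never into the generated text, which is what keeps every generated unit consistent.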
System Modelling
These projects focus on system-level development using system models, rather than pure software.
System Modelling Language Translation and Query Model
This is the first project where I’ve applied the claude-skills/light-v-structure end-to-end and generated all code via an agent (mostly Claude Opus 4.6 and 4.7).
A personal project to create a common interface to various system modelling languages - like Pandoc, which translates between document formats, but for machine-readable system models.
- Scoped to system topology - what components exist, how they connect, and their properties - not behaviour (state machines, control flow, simulation).
- The idea is to create formal system models as a single source of truth across implementation & verification.
- These formal models should be editable by agents, humans, or both. Concurrent editing should be possible.
- The editing should be traceable via an immutable fact model (an append-only graph of facts, never updated in place; each correction is a new fact referencing the ones it supersedes, preserving full history and provenance).
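The immutable fact model described above can be illustrated with a small in-memory sketch. This is not the project's fact store: the `Fact` and `FactStore` classes are hypothetical, and identifying facts by a hash of their content is an assumption borrowed from content-addressed designs.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    kind: str
    fields: tuple[tuple[str, str], ...]
    supersedes: tuple[str, ...] = ()  # ids of the facts this one corrects
    author: str = "unknown"           # provenance: agent or human

    @property
    def id(self) -> str:
        # Assumption: a fact is identified by a hash of its own content.
        payload = repr((self.kind, self.fields, self.supersedes, self.author))
        return hashlib.sha256(payload.encode()).hexdigest()[:12]


class FactStore:
    """Append-only store: facts are added, never updated or deleted."""

    def __init__(self) -> None:
        self._facts: dict[str, Fact] = {}  # insertion order doubles as the log

    def append(self, fact: Fact) -> str:
        for ref in fact.supersedes:
            if ref not in self._facts:
                raise KeyError(f"unknown superseded fact {ref}")
        self._facts.setdefault(fact.id, fact)  # re-appending is a no-op
        return fact.id

    def current(self) -> list[Fact]:
        """Facts that no later fact has superseded."""
        superseded = {ref for f in self._facts.values() for ref in f.supersedes}
        return [f for fid, f in self._facts.items() if fid not in superseded]

    def history(self, fact_id: str) -> list[Fact]:
        """A fact followed by everything it superseded, newest first."""
        out, stack = [], [fact_id]
        while stack:
            fact = self._facts[stack.pop()]
            out.append(fact)
            stack.extend(fact.supersedes)
        return out


store = FactStore()
v1 = store.append(Fact("register", (("name", "STATUS"), ("offset", "0x08")), author="agent"))
v2 = store.append(Fact("register", (("name", "STATUS"), ("offset", "0x04")),
                       supersedes=(v1,), author="human"))
print([f.author for f in store.history(v2)])
```

The correction does not overwrite the agent's original fact; the current view simply hides it, while `history` still shows who asserted what.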
Initial Targets:
- Register Description: CMSIS-SVD (System View Description), SystemRDL (Register Description Language)
- Hardware Modelling: VHDL (VHSIC Hardware Description Language), structural subset
- System Modelling: AADL (Architecture Analysis & Design Language)
- General Modelling: SysMLv2 (Systems Modeling Language v2)
Implementation:
- Architecture and design via: claude-skills/light-v-structure
- All code and tests are agent-written in a combination of C++ and Python, and are grounded against human-owned use cases and third-party reference tools (existing parsers, round-trip conversion) rather than the agent’s own judgement.
- Model storage via an in-memory immutable fact store (looking for a better solution here)
- Fact type system inspired by OMG Meta Object Facility
- Model query interface based on Datalog
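To show what a Datalog-style query over topology facts looks like, here is a minimal sketch of naive bottom-up evaluation in Python. It is not the project's query interface; the `connected` facts and the `reachable` rule are hypothetical examples.

```python
from __future__ import annotations

# Base facts: connected(X, Y) means component X connects directly to Y.
connected = {("cpu", "bus"), ("bus", "uart"), ("bus", "timer")}


def reachable(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Evaluate the two Datalog rules below to a fixed point.

    reachable(X, Y) :- connected(X, Y).
    reachable(X, Z) :- connected(X, Y), reachable(Y, Z).
    """
    result = set(edges)  # first rule: every direct connection is reachable
    while True:
        # Second rule: join connected(X, Y) with reachable(Y, Z).
        derived = {
            (x, z)
            for (x, y) in edges
            for (y2, z) in result
            if y == y2
        }
        if derived <= result:  # nothing new was derived: fixed point reached
            return result
        result |= derived


# Query: reachable("cpu", What)?
from_cpu = sorted(target for (source, target) in reachable(connected) if source == "cpu")
print(from_cpu)
```

The query is declarative: it names the relation wanted and leaves traversal order to the evaluator, which suits a store where facts only accumulate.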
What worked and what did not:
- First-order implementation worked: the agent implemented the parsers, generators, and their tests completely and correctly. Correctness was determined using third-party parsers to verify syntax, or round-trip conversion to verify equivalence.
- Most target languages have no C++ parsers or libraries - the domain is dominated by Java. Some pushing was needed to get the agent to realize that implementing a new parser from open-source grammars was low-cost; otherwise it would create incomplete implementations, or claim features were impossible because no C++ version was available.
- Much pushing was needed to get the agent to use consistent testing and code generation according to common guidelines.
- Second-order implementation was less successful: the agent could not successfully generalize the implementations to architectural concepts. It preferred spaghetti to architecture. The architecture uses an explicit set of abstractions across languages (the MOF layers M1, M2, M3); when asked to design classes to represent those generalizations, the agent created concrete, inflexible designs that overfit the use cases. I needed to give detailed instructions at this level - but the agent could still produce all the code.
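The round-trip check used to establish equivalence can be sketched as follows. The line-based format, `parse`, and `generate` are hypothetical toys standing in for a real modelling language and its tooling; only the shape of the check (parse, generate, parse again, compare models) reflects the text above.

```python
from __future__ import annotations


def parse(text: str) -> dict[str, dict[str, str]]:
    """Parse 'name: key=value, key=value' lines into a topology model."""
    model: dict[str, dict[str, str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, props = line.partition(":")
        pairs = (item.split("=", 1) for item in props.split(",") if item.strip())
        model[name.strip()] = {key.strip(): value.strip() for key, value in pairs}
    return model


def generate(model: dict[str, dict[str, str]]) -> str:
    """Emit the model in a canonical form (sorted names and keys)."""
    return "\n".join(
        f"{name}: " + ", ".join(f"{k}={v}" for k, v in sorted(props.items()))
        for name, props in sorted(model.items())
    )


def round_trips(text: str) -> bool:
    """Equivalence is judged on the model, not on the text."""
    first = parse(text)
    second = parse(generate(first))
    return first == second


source = "# toy model\nuart0 :  base=0x4000,irq=5\ntimer0: base=0x5000\n"
print(round_trips(source))
print(generate(parse(source)))
```

Comparing parsed models rather than raw text means whitespace, ordering, and comments may change across the round trip without failing the check, while any lost component or property does fail it.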
Summary in Spec-Driven Development (SDD) terms - using the three levels of specification rigour:
- The first-order implementation (that is, from design docs) was completed successfully with the Spec-as-Source model of development: the spec generates the code, and the human never edits it.
- The second-order implementation (that is, from architecture docs) was only completed with the Spec-Anchored model of development, although the agent was still responsible for all code editing.
Time Model for Distributed Simulation
This is being worked on as an exemplar for the systems-need-formal-models claim above: time is the first-order dimension software can’t absorb, so a simulator that must be correct in the time domain needs a formal model, not just libraries.
A personal project to create a logical model of time in distributed simulation.
- A logical model of how time, as a first-order system property, is communicated within a system constrained by real time and bandwidth.
- Models real, virtual, and sampled time domains; nodes and channels on a timestamped network; using parameters and constraints to derive synchronisation.
- Core idea: all time is independent, but across the system it can be constrained by a model aware of the system’s topology.
- Plan to model using SysMLv2 (or another language, TBD)
An ordinary simulator (a program that computes an approximation of a real-world system) will become easier to write with agentic development. But a simulator tied strictly to its specification, with well-defined behaviour in the time domain and full observability, stays hard without a formal model to constrain it. Unlike most software, a simulator must deal with system attributes such as time, and with the physical and logical layers of a system; software is inherently one-dimensional (it computes) and does not integrate cleanly with other first-order dimensions. Time is the clearest example: software consumes time in its own operation, so it cannot separate the concern of time from compute.
Logical Model of Knowledge Understanding
Authored entirely by agents, this is the start of an attempt to build a logical architecture for understanding - the knowledge common to humans and AI. The premise is that a model of understanding can exist independent of any implementation. If it can, several things follow: we could define interfaces to understanding, test and characterize implementations against them, interface to other systems below the text layer (via formal models), and partition models along those interfaces.
Further reading
Software development lifecycle (Light-V)
- Architecture Decision Records - supporting (the decision-record practice the V uses)
- Demystifying evals for AI agents - supporting (how the skills are characterized)
Spec-Driven Development
- Spec-Driven Development: From Code to Contract in the Age of AI Coding Assistants - arXiv, lightweight overview
- GitHub Spec Kit - reference SDD toolkit and CLI
- How to write a good spec for AI agents - Addy Osmani
- What Is Spec-Driven Development? - Augment Code guide
- The Limits of Spec-Driven Development - Isoform, on SDD’s failure modes
Agentic software engineering & feedback loops
- LLM-driven feedback loops - EmergentMind overview
- LLM-Based Agentic Systems for Software Engineering - survey (arXiv)
Formal modelling with AI
- SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems - arXiv
- The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap - arXiv
- SysTemp: A Multi-Agent System for Template-Based Generation of SysML v2 - arXiv
System Modelling
- Eclipse EMF / Ecore modelling framework - the traditional approach, highly coupled to Eclipse, with an in-memory graph
- The Platonic Representation Hypothesis - an alternative, ML-based approach to translation; the hedge against rule-based translation
- Jinaga - immutable fact architecture - an append-only fact paradigm for local-first development
- Datomic Information Model - immutable facts queried with Datalog
- Categorical Query Language (CQL) - a functorial alternative to a fact graph
Simulation Time modelling
- SysML v2: an overview - Intro to SysMLv2
- Accellera Federated Simulation Standard - a standard for federating simulators with a TBD time model
- Distributed Co-Simulation Protocol (DCP) - a vendor-neutral co-simulation standard with two implicit time models
- Vector clocks - a well-established logical model of distributed time
Knowledge understanding (Dimensions)
- Osgood’s semantic differential - Lower dimensions
- BERT: Deep Bidirectional Transformers - Middle dimensions
- On the Dangers of Stochastic Parrots - Antithesis - the opposite of what I’m trying to build