<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Phil @ ShinCBM</title>
    <description>Embedded systems, low-level code, simulation and agentic workflows.</description>
    <link>https://www.shincbm.com/</link>
    <atom:link href="https://www.shincbm.com/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Sun, 28 Jun 2026 09:57:39 +0000</pubDate>
    <lastBuildDate>Sun, 28 Jun 2026 09:57:39 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
      
      <item>
        <title>What I&apos;m working on: Building a Workflow</title>
        <description>&lt;p&gt;This is a working summary of what I’ve done over the last 6 months to build a workflow for &lt;strong&gt;agentic development&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Agentic development will scale up all software development. I see two directions: further along a bad scale (big, clumsy teams that make poor software) or back toward a good scale of smaller, more capable teams. I feel the two paths map onto a split: bad open-loop workflows with no feedback, where the connection to stakeholders and mastery of the technology are lost, versus good closed-loop workflows that converge by building understanding of both.&lt;/p&gt;

&lt;p&gt;So to “close the loop” I’m looking for “forcing functions”: constraints applied to guide development toward convergence. These can be informal or formal: a specification in human language, or in a formal modelling language. For software I prefer a structured but informal specification; for systems, a formal system description. The distinction arises because software at its core needs ‘slack’, but at the system level interfaces harden and should be locked down.&lt;/p&gt;

&lt;p&gt;My two main areas of interest here are &lt;strong&gt;software development&lt;/strong&gt; and &lt;strong&gt;systems modelling&lt;/strong&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;software-development&quot;&gt;Software development&lt;/h2&gt;

&lt;p&gt;My main focus is structured workflows, such as pipelined processes and feedback loops, represented as agent skills.&lt;/p&gt;

&lt;h3 id=&quot;my-claude-skills-library&quot;&gt;My Claude Skills Library&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/&quot;&gt;nakane1chome/claude-skills&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ve used agent skills to codify workflows during development, focusing on a few areas:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Technical document creation&lt;/li&gt;
  &lt;li&gt;Software development lifecycle&lt;/li&gt;
  &lt;li&gt;A harness for evaluating skills&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;technical-document-creation-skills&quot;&gt;Technical Document Creation Skills&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/tree/main/skills/flesh-out&quot;&gt;claude-skills/flesh-out&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/tree/main/skills/review-steps&quot;&gt;claude-skills/review-steps&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/tree/main/skills/strong-edit&quot;&gt;claude-skills/strong-edit&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a set of skills that help pull ideas from my brain into text. The key points about these skills are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Each skill uses a pipeline to refine and edit a technical document, such as a spec or design.&lt;/li&gt;
  &lt;li&gt;The different skills represent different stages of document progress:
    &lt;ul&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flesh-out&lt;/code&gt; converts raw notes into a working document.&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;review-steps&lt;/code&gt; should be a check, doing the clean-up.&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strong-edit&lt;/code&gt; should edit down to a final document.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ve described it in more detail here: &lt;a href=&quot;https://www.shincbm.com/agentic-code/2026/02/15/claude-skill-document-review.html&quot;&gt;Claude Beyond Code: Intent Expression and Agent Skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The pipeline looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/medium-export/claude-skill-document-review-mermaid-1.png&quot; alt=&quot;image&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;software-development-lifecycle-skill&quot;&gt;Software Development Lifecycle Skill&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/tree/light-v-structure/skills/light-v-structure&quot;&gt;claude-skills/light-v-structure&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I think iterative development via error-correcting feedback loops is the best way to converge on a desired system. For a team an Agile workflow with a tight code-release-evaluate-iterate cycle is a good harness for convergence, but for agentic development I think the story changes.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The traditional &lt;strong&gt;systems engineering V-model&lt;/strong&gt; can provide the harness for an agent to work autonomously on the &lt;strong&gt;low-level&lt;/strong&gt; implementation.&lt;/li&gt;
  &lt;li&gt;To enable that a human needs to work at a &lt;strong&gt;high level&lt;/strong&gt; and provide the spec of what the software should do.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;This is NOT waterfall!&lt;/strong&gt; The point is &lt;strong&gt;discrete levels of abstraction&lt;/strong&gt; that are &lt;strong&gt;documented or modelled&lt;/strong&gt;, not discrete stages of development that are gated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some V-model development practices are not iterative; &lt;strong&gt;stage gating&lt;/strong&gt;, for example, is an anti-iteration practice. For agentic development there is an inversion that means &lt;strong&gt;stage gating&lt;/strong&gt; is not needed:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Costs have inverted&lt;/strong&gt;. Code is now cheaper to produce and verify than specifications.&lt;/li&gt;
  &lt;li&gt;A specification may be &lt;strong&gt;cheaper to validate via an autonomous implementation loop&lt;/strong&gt;, using an Agile-style release increment to collect stakeholder feedback.&lt;/li&gt;
  &lt;li&gt;But to iterate the full loop we need a &lt;strong&gt;memory of the project&lt;/strong&gt; and &lt;strong&gt;rails for the iteration&lt;/strong&gt;; this is now the value of the V-model.&lt;/li&gt;
  &lt;li&gt;The cost of validation (does it match intent?) is hard to reduce, but the cost of verification (does it match the spec?) can be reduced by scaffolding and tests. That asymmetry, cost(validation) ≫ cost(verification), is why the human owns the top of the V and the agent owns the bottom.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NOTE: Removing stage gating ≠ removing &lt;em&gt;release&lt;/em&gt; gating. On the contrary, we can have stronger release gating as we keep accumulated records of what the system does and why.&lt;/p&gt;

&lt;p&gt;The idea is a human-dominated top of the V, and an agent-dominated bottom of the V, each with their own iterations driving correctness.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/light-v/light-v-model.drawio.svg&quot; alt=&quot;Image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This general idea is similar to &lt;strong&gt;&lt;a href=&quot;https://thebcms.com/blog/spec-driven-development&quot;&gt;Spec-Driven Development&lt;/a&gt;&lt;/strong&gt; (SDD): &lt;a href=&quot;https://github.com/github/spec-kit&quot;&gt;GitHub Spec Kit&lt;/a&gt;, &lt;a href=&quot;https://kiro.dev/&quot;&gt;AWS Kiro&lt;/a&gt;, &lt;a href=&quot;https://github.com/Fission-AI/OpenSpec&quot;&gt;OpenSpec&lt;/a&gt;, and more. However, the Light-V process is a generalization of a &lt;a href=&quot;https://www.writethedocs.org/guide/docs-as-code/&quot;&gt;docs-as-code&lt;/a&gt; workflow I’ve been using for embedded systems for over 10 years. The main difference is that it is a portable document structure and an agent use discipline, not a tool to adopt. It also pulls in the V-model traceability symmetry: every right-side artifact (test, validation) must verify a left-side one (design, architecture). It targets &lt;strong&gt;building new ideas from scratch&lt;/strong&gt;, growing structure from nothing rather than retrofitting a spec onto an existing codebase.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Dimension&lt;/th&gt;
      &lt;th&gt;Spec-Driven Development&lt;/th&gt;
      &lt;th&gt;Light-V&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Form&lt;/td&gt;
      &lt;td&gt;A tool/CLI plus templates and slash commands&lt;/td&gt;
      &lt;td&gt;A portable doc structure + agent use discipline (no tooling)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Primary artifact&lt;/td&gt;
      &lt;td&gt;The natural-language spec (including semi-formal like &lt;a href=&quot;https://www.iaria.org/conferences2013/filesICCGI13/ICCGI_2013_Tutorial_Terzakis.pdf&quot;&gt;EARS&lt;/a&gt; or &lt;a href=&quot;https://cucumber.io/docs/gherkin/reference&quot;&gt;Gherkin&lt;/a&gt;); code via spec-as-source, or spec-anchored code and update&lt;/td&gt;
      &lt;td&gt;A hierarchy of natural-language docs:  code via spec-as-source from design, and spec-anchored for higher layers&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Flow&lt;/td&gt;
      &lt;td&gt;spec → plan → tasks → implement (largely linear)&lt;/td&gt;
      &lt;td&gt;paradigm → ADR → architecture → design → source → test → validate (iterate)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Verification&lt;/td&gt;
      &lt;td&gt;“Write testable requirements”&lt;/td&gt;
      &lt;td&gt;The right-side test verifies its left-side counterpart (design to unit tests, architecture to system tests)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Validation&lt;/td&gt;
      &lt;td&gt;Implicit - the spec is assumed to capture intent&lt;/td&gt;
      &lt;td&gt;An explicit human feedback loop at the top of the V, does the result meet the original intent?&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Decision capture&lt;/td&gt;
      &lt;td&gt;The living spec&lt;/td&gt;
      &lt;td&gt;Architecture Decision Records (ADRs) governed by a paradigm document&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Orientation&lt;/td&gt;
      &lt;td&gt;Often retrofitting onto existing code&lt;/td&gt;
      &lt;td&gt;Greenfield - building new ideas from scratch&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
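
&lt;p&gt;The traceability symmetry in the Verification row can be checked mechanically. The following is a minimal sketch under stated assumptions: the file names, the two registries and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;check_traceability&lt;/code&gt; function are all hypothetical and are not part of the light-v-structure skill.&lt;/p&gt;

```python
# Hypothetical artifact registry: these names are illustrative only and are
# not taken from the light-v-structure skill.
LEFT_SIDE = {
    "architecture/overview.md",
    "design/parser.md",
    "design/generator.md",
}

# Each right-side artifact (a test) names the left-side document it verifies.
RIGHT_SIDE = {
    "tests/system/test_round_trip.py": "architecture/overview.md",
    "tests/unit/test_parser.py": "design/parser.md",
    "tests/unit/test_orphan.py": "design/removed.md",
}


def check_traceability(left, right):
    """Return (orphan_tests, unverified_docs) for a V-model trace map."""
    # A right-side artifact is an orphan if its target document does not exist.
    orphans = sorted(test for test, doc in right.items() if doc not in left)
    # A left-side document is unverified if no right-side artifact points at it.
    unverified = sorted(left - set(right.values()))
    return orphans, unverified


orphans, unverified = check_traceability(LEFT_SIDE, RIGHT_SIDE)
print("orphan tests:", orphans)
print("unverified docs:", unverified)
```

&lt;p&gt;With this sample data the check reports one orphan test and one design document with no test. A check like this could run as a release gate, since it only reads the accumulated records.&lt;/p&gt;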

&lt;h3 id=&quot;testing-and-evaluating-of-skills&quot;&gt;Testing and Evaluating Skills&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Example test report &lt;a href=&quot;https://www.shincbm.com/claude-skills/runs/23944013461/strongest/skills-generator-coding-library_generator-with_skill-strongest.html&quot;&gt;generator-coding skill test run&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;End-to-end tests for skills: &lt;a href=&quot;https://github.com/nakane1chome/claude-skills/tree/main/tests&quot;&gt;nakane1chome/claude-skills/tests/&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Framework for testing skills: &lt;a href=&quot;https://github.com/nakane1chome/claude-skills/tree/main/test_fw&quot;&gt;nakane1chome/claude-skills/test_fw/&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I’m trying to do here is automated test &amp;amp; evaluation for Claude skills.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Test&lt;/strong&gt; the skill - confirm it does what it claims.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Evaluate&lt;/strong&gt; the skill - measure effectiveness. Add test points that depend on agent capability.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;CI Test&lt;/strong&gt; the skill - ensure each update maintains effectiveness.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;generator-meta-coding-skill&quot;&gt;Generator (Meta-Coding) Skill&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/blob/main/skills/generator-coding/SKILL.md&quot;&gt;claude-skills/generator-coding&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agents are capable enough to solve meta-problems, that is, code that generates code:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;They often create scripts to make bulk changes instead of burning tokens on doing the work themselves.&lt;/li&gt;
  &lt;li&gt;If there is a common data model shared between many source files, correctness can be improved by using a code generator, rather than having an agent code each unit individually.&lt;/li&gt;
  &lt;li&gt;The skill encodes the pattern (Data Model → Parser → Helpers → Template → Output), and the rule: never edit generated output.
    &lt;ul&gt;
      &lt;li&gt;One of the key points here is having the agent create a data model and a set of templates.&lt;/li&gt;
      &lt;li&gt;The data model is a formal model that can act as a forcing function across the system.&lt;/li&gt;
      &lt;li&gt;Code templates enhance code quality by reducing the variance in an agent’s output when it repeats the same activity.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
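
&lt;p&gt;The Data Model → Parser → Helpers → Template → Output pattern can be sketched in a few lines. This is a minimal illustration, not the skill’s implementation: the register model, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;macro_name&lt;/code&gt; helper and the header layout are assumptions invented here, and Python’s standard &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;string.Template&lt;/code&gt; stands in for whatever template engine a real generator would use.&lt;/p&gt;

```python
import json
from string import Template

# Data model: a hypothetical register map, invented for this example.
MODEL_JSON = """
{"peripheral": "UART0",
 "registers": [{"name": "DATA", "offset": 0},
               {"name": "STATUS", "offset": 4}]}
"""


def parse_model(text):
    """Parser: load the data model and reject an empty register list."""
    model = json.loads(text)
    if not model["registers"]:
        raise ValueError("model has no registers")
    return model


def macro_name(peripheral, register):
    """Helper: derive a C macro name from model fields."""
    return f"{peripheral}_{register}_OFFSET".upper()


# Template: one line of output per register.
LINE = Template("#define $macro 0x${offset}u")


def generate(model):
    """Output: render the header text. Never edit this output by hand."""
    lines = ["/* GENERATED FILE - DO NOT EDIT */"]
    for reg in model["registers"]:
        lines.append(LINE.substitute(
            macro=macro_name(model["peripheral"], reg["name"]),
            offset=format(reg["offset"], "02X"),
        ))
    return "\n".join(lines)


header = generate(parse_model(MODEL_JSON))
print(header)
```

&lt;p&gt;A change to a register goes into the model and the header is regenerated, so every generated file stays consistent with the single data model.&lt;/p&gt;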

&lt;hr /&gt;

&lt;h2 id=&quot;system-modelling&quot;&gt;System Modelling&lt;/h2&gt;

&lt;p&gt;These projects focus on system-level development, using system models rather than pure software.&lt;/p&gt;

&lt;h3 id=&quot;system-modelling-language-translation-and-query-model&quot;&gt;System Modelling Language Translation and Query Model&lt;/h3&gt;

&lt;p&gt;This is the first project where I’ve applied the &lt;strong&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/tree/light-v-structure/skills/light-v-structure&quot;&gt;claude-skills/light-v-structure&lt;/a&gt;&lt;/strong&gt; end-to-end and generated all code via an agent (mostly Claude Opus 4.6 and 4.7).&lt;/p&gt;

&lt;!-- Reference: ../../pan-sys/ --&gt;

&lt;p&gt;This is a personal project to create a common interface to various System Modelling Languages - like &lt;a href=&quot;https://pandoc.org/&quot;&gt;Pandoc&lt;/a&gt; translates between document formats, but for machine-readable system models.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Scoped to system topology - what components exist, how they connect, and their properties - not behaviour (state machines, control flow, simulation).&lt;/li&gt;
  &lt;li&gt;The idea is to create formal system models as a single source of truth across implementation &amp;amp; verification.&lt;/li&gt;
  &lt;li&gt;These formal models should be editable by agents, humans, or both. Concurrent editing should be possible.&lt;/li&gt;
  &lt;li&gt;The editing should be traceable via an immutable &lt;a href=&quot;https://en.wikipedia.org/wiki/Deductive_database&quot;&gt;fact model&lt;/a&gt; (an append-only graph of facts, never updated in place; each correction is a new fact referencing the ones it supersedes, preserving full history and provenance)&lt;/li&gt;
&lt;/ul&gt;
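
&lt;p&gt;The append-only idea in the last bullet can be sketched as follows. This is an illustration under assumptions, not the project’s fact store: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Fact&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FactStore&lt;/code&gt; names, the field layout and the content-hash ids are all hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Fact:
    """An immutable fact; `supersedes` holds ids of the facts it corrects."""
    subject: str
    attribute: str
    value: str
    supersedes: tuple = ()

    @property
    def fact_id(self):
        # Content-derived id, so identical facts get identical ids.
        raw = repr((self.subject, self.attribute, self.value, self.supersedes))
        return hashlib.sha256(raw.encode()).hexdigest()[:12]


class FactStore:
    """Append-only: facts are added, never updated or deleted."""

    def __init__(self):
        self._facts = []

    def append(self, fact):
        self._facts.append(fact)
        return fact.fact_id

    def history(self):
        """Every fact ever recorded, in insertion order."""
        return tuple(self._facts)

    def current(self):
        """Facts that no other fact has superseded."""
        dead = {fid for f in self._facts for fid in f.supersedes}
        return [f for f in self._facts if f.fact_id not in dead]


store = FactStore()
old = store.append(Fact("UART0", "base_address", "0x40000000"))
store.append(Fact("UART0", "base_address", "0x40001000", supersedes=(old,)))

print([f.value for f in store.current()])
print(len(store.history()))
```

&lt;p&gt;The correction does not overwrite the first fact: the current view shows only the new address, while the history still holds both facts and the link between them.&lt;/p&gt;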

&lt;p&gt;Initial Targets focus on embedded systems:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Register Description: &lt;a href=&quot;https://www.keil.com/pack/doc/CMSIS/SVD/html/index.html&quot;&gt;CMSIS-SVD&lt;/a&gt; (System View Description), &lt;a href=&quot;https://www.accellera.org/downloads/standards/systemrdl&quot;&gt;SystemRDL&lt;/a&gt; (Register Description Language)&lt;/li&gt;
  &lt;li&gt;Hardware Modelling: &lt;a href=&quot;https://en.wikipedia.org/wiki/VHDL&quot;&gt;VHDL&lt;/a&gt; (VHSIC Hardware Description Language), structural subset&lt;/li&gt;
  &lt;li&gt;System Modelling: &lt;a href=&quot;https://en.wikipedia.org/wiki/Architecture_Analysis_%26_Design_Language&quot;&gt;AADL&lt;/a&gt; (Architecture Analysis &amp;amp; Design Language)&lt;/li&gt;
  &lt;li&gt;General Modelling: &lt;a href=&quot;https://github.com/Systems-Modeling/SysML-v2-Release&quot;&gt;SysMLv2&lt;/a&gt; (Systems Modeling Language v2)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Implementation:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Architecture and design via: &lt;strong&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/tree/light-v-structure/skills/light-v-structure&quot;&gt;claude-skills/light-v-structure&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;All code &amp;amp; tests written autonomously by the agent, in a combination of C++ and Python.&lt;/li&gt;
  &lt;li&gt;Testing grounded against human-owned use cases and third-party reference tools (existing parsers, round-trip conversion).&lt;/li&gt;
  &lt;li&gt;Model storage via an in-memory immutable fact store (looking for a better solution here)&lt;/li&gt;
  &lt;li&gt;Fact type system inspired by &lt;a href=&quot;https://en.wikipedia.org/wiki/Meta-Object_Facility&quot;&gt;OMG Meta Object Facility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Model query interface based on &lt;a href=&quot;https://en.wikipedia.org/wiki/Datalog&quot;&gt;datalog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
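
&lt;p&gt;To show what a Datalog-style query over a topology looks like, here is a small Python stand-in that evaluates two recursive rules bottom-up until no new facts appear. It is a sketch only: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;connected&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reachable&lt;/code&gt; relations and the component names are invented, and the project’s actual query interface is not shown here.&lt;/p&gt;

```python
# Base facts: hypothetical topology connections, invented for this example.
CONNECTED = {
    ("cpu", "bus"),
    ("bus", "uart0"),
    ("bus", "spi0"),
    ("uart0", "pin_tx"),
}


def reachable(connected):
    """Naive bottom-up evaluation of the two Datalog rules:

    reachable(X, Y) :- connected(X, Y).
    reachable(X, Z) :- reachable(X, Y), connected(Y, Z).
    """
    derived = set(connected)
    while True:
        # Join the derived relation with the base relation on the shared node.
        new = {
            (x, z)
            for (x, y1) in derived
            for (y2, z) in connected
            if y1 == y2
        } - derived
        if not new:
            # Fixpoint: another pass would add nothing.
            return derived
        derived |= new


result = reachable(CONNECTED)
print(sorted(dst for src, dst in result if src == "cpu"))
```

&lt;p&gt;The query “what can the CPU reach?” is answered from the connection facts alone, with no traversal code specific to any one modelling language.&lt;/p&gt;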

&lt;p&gt;What worked and what did not:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;First-order implementation worked: the agent implemented the parsers, generators, and their tests completely and correctly. Correctness was determined using third-party parsers to verify syntax, or round-trip conversion to verify equivalence.&lt;/li&gt;
  &lt;li&gt;Most target languages have no C++ parsers or libraries - the domain is dominated by Java. Some pushing was needed to get the agent to realize that implementing a new parser from open-source grammars was low-cost; otherwise it would create incomplete implementations, or claim features were impossible because no C++ version was available.&lt;/li&gt;
  &lt;li&gt;Much pushing was needed to get the agent to use consistent testing and code generation according to common guidelines.&lt;/li&gt;
  &lt;li&gt;Second-order implementation was less successful: the agent could not successfully generalize the implementations to architectural concepts. It preferred spaghetti to architecture. The architecture uses an explicit set of abstractions across languages (the MOF layers M1, M2, M3); when asked to design classes to represent those generalizations, the agent created concrete, inflexible designs that overfit the use cases. I needed to give detailed instructions at this level - but the agent could still produce all the code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Summary in Spec-Driven Development (SDD) terms - using the &lt;a href=&quot;https://www.rushis.com/spec-first-spec-anchored-spec-as-truth-the-three-levels-of-spec-driven-development/&quot;&gt;three levels of specification rigour&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The implementation in the first-order (that is from design docs) was successfully completed with the &lt;strong&gt;Spec-as-Source&lt;/strong&gt; model of development - the spec generates the code, the human never edits it.&lt;/li&gt;
  &lt;li&gt;The implementation in the second-order (that is from architecture docs) was only completed with the &lt;strong&gt;Spec-Anchored&lt;/strong&gt; model of development, although the agent was responsible for code editing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;time-model-for-distributed-simulation&quot;&gt;Time Model for Distributed Simulation&lt;/h3&gt;

&lt;!-- Reference: ../../sim-time-model/ --&gt;

&lt;p&gt;This is a personal project, worked on as an exemplar for the systems-need-formal-models workflow: time is a first-order dimension that software can’t absorb, so a simulator that must be correct in the time domain needs a formal model, not just libraries.&lt;/p&gt;

&lt;p&gt;The key idea is to create a logical model, formally specified, of time in distributed simulation:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Logical model of how time, as a first-order system property, is communicated within a system constrained by real time and bandwidth.&lt;/li&gt;
  &lt;li&gt;Models real, virtual, and sampled time domains; nodes and channels on a timestamped network; using parameters and constraints to derive synchronisation.&lt;/li&gt;
  &lt;li&gt;Core idea: all time is independent, but across the system it can be constrained by a model aware of the system’s topology.&lt;/li&gt;
  &lt;li&gt;Plan to model using SysMLv2 (or other language? TBD)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I expect an ordinary simulator (a program that computes an approximation of a real-world system) will become easier to write with agentic development. But writing a simulator with well-defined behaviour in the time domain and full observability will stay hard without a formal model to constrain it. Unlike most software, a simulator must deal with system attributes such as time, and with the physical and logical layers of a system. For example, time is not well managed by software as software consumes time in its own operation, so it cannot separate the concern of time from compute.&lt;/p&gt;

&lt;h3 id=&quot;logical-model-of-knowledge-understanding&quot;&gt;Logical Model of Knowledge Understanding&lt;/h3&gt;

&lt;!-- Reference: ../../bended-thevenin-ground-plane/ --&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;/assets/dimensions_of_understanding.html&quot;&gt;Dimensions of Understanding&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is an informal model authored entirely by agents. It is the start of an attempt to build a logical architecture for understanding the knowledge structure common to humans and AI. The premise is that a model of understanding can exist independent of any implementation. If it can, several things follow: we could define interfaces to understanding, test and characterize implementations against them, interface to other systems below the text layer (via formal models), and partition models along those interfaces.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;further-reading&quot;&gt;Further reading&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Software development lifecycle (Light-V)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/joelparkerhenderson/architecture-decision-record/&quot;&gt;Architecture Decision Records&lt;/a&gt; - supporting (the decision-record practice the V uses)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents&quot;&gt;Demystifying evals for AI agents&lt;/a&gt; - supporting (how the skills are characterized)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Spec-Driven Development&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2602.00180&quot;&gt;Spec-Driven Development: From Code to Contract in the Age of AI Coding Assistants&lt;/a&gt; - arXiv, lightweight overview&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/github/spec-kit&quot;&gt;GitHub Spec Kit&lt;/a&gt; - reference SDD toolkit and CLI&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://addyosmani.com/blog/good-spec/&quot;&gt;How to write a good spec for AI agents&lt;/a&gt; - Addy Osmani&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.augmentcode.com/guides/what-is-spec-driven-development&quot;&gt;What Is Spec-Driven Development?&lt;/a&gt; - Augment Code guide&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://isoform.ai/blog/the-limits-of-spec-driven-development&quot;&gt;The Limits of Spec-Driven Development&lt;/a&gt; - Isoform, on SDD’s failure modes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Agentic software engineering &amp;amp; feedback loops&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.emergentmind.com/topics/llm-driven-feedback-loops&quot;&gt;LLM-driven feedback loops&lt;/a&gt; - EmergentMind overview&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2601.09822&quot;&gt;LLM-Based Agentic Systems for Software Engineering&lt;/a&gt; - survey (arXiv)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Formal modelling with AI&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2509.23130&quot;&gt;SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems&lt;/a&gt; - arXiv&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2412.06512&quot;&gt;The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap&lt;/a&gt; - arXiv&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2506.21608&quot;&gt;SysTemp: A Multi-Agent System for Template-Based Generation of SysML v2&lt;/a&gt; - arXiv&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;System Modelling&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://eclipse.dev/modeling/emf/&quot;&gt;Eclipse EMF / Ecore modelling framework&lt;/a&gt; - the traditional approach: highly coupled to Eclipse, in-memory graph&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/html/2505.12540&quot;&gt;The Platonic Representation Hypothesis&lt;/a&gt; - Alternate ML-based translation, the hedge against rule-based&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://jinaga.com/&quot;&gt;Jinaga - immutable fact architecture&lt;/a&gt; - an append-only fact paradigm for local first development&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.datomic.com/datomic-overview.html#information-model&quot;&gt;Datomic Information Model&lt;/a&gt; - immutable facts queried with Datalog&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.categoricaldata.net/&quot;&gt;Categorical Query Language (CQL)&lt;/a&gt; - a functorial alternative to a fact graph&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Simulation Time modelling&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://dvcon-proceedings.org/wp-content/uploads/SysML-v2-An-Overview-with-SysMD-Demonstration.pdf&quot;&gt;SysML v2: an overview&lt;/a&gt; - Intro to SysMLv2&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.accellera.org/activities/working-groups/fss-working-group&quot;&gt;Accellera Federated Simulation Standard&lt;/a&gt; - a standard for federating simulators with a TBD time model&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://dcp-standard.org/&quot;&gt;Distributed Co-Simulation Protocol (DCP)&lt;/a&gt; - a vendor-neutral co-simulation standard with two implicit time models&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Vector_clock&quot;&gt;Vector clocks&lt;/a&gt; - A well established logical model of distributed time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Knowledge understanding (Dimensions)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Semantic_differential&quot;&gt;Osgood’s semantic differential&lt;/a&gt; - Lower dimensions&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1810.04805&quot;&gt;BERT: Deep Bidirectional Transformers&lt;/a&gt; - Middle dimensions&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://dl.acm.org/doi/10.1145/3442188.3445922&quot;&gt;On the Dangers of Stochastic Parrots&lt;/a&gt; - Anti-thesis - The opposite of what I’m trying to build.&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Fri, 26 Jun 2026 00:00:00 +0000</pubDate>
        <link>https://www.shincbm.com/agentic-code/2026/06/26/building-a-workflow.html</link>
        <guid isPermaLink="true">https://www.shincbm.com/agentic-code/2026/06/26/building-a-workflow.html</guid>
        
        <category>agentic-coding</category>
        
        <category>claude-code</category>
        
        <category>systems-engineering</category>
        
        <category>mbse</category>
        
        <category>embedded</category>
        
        <category>llm-theory</category>
        
        
        <category>agentic-code</category>
        
      </item>
      
    
      
    
      
    
      
    
      
    
      
      <item>
        <title>Claude Beyond Code: Intent Expression and Agent Skills</title>
        <description>&lt;p&gt;I feel like productive developers have often resisted documentation: it interrupts the flow of getting &lt;strong&gt;code&lt;/strong&gt; out of your head and into the editor. Interestingly, my recent experience of agentic coding is flipping this. When an agent writes the &lt;strong&gt;code&lt;/strong&gt;, the bottleneck moves upstream to: “how clearly can I express the &lt;strong&gt;idea&lt;/strong&gt;?” Documentation isn’t the interruption anymore, it’s leverage over the agent.&lt;/p&gt;

&lt;p&gt;This post introduces a set of skills for agentic document elicitation and review. The skills are reusable instruction files for &lt;a href=&quot;https://docs.anthropic.com/en/docs/claude-code&quot;&gt;Claude Code&lt;/a&gt;, covering document creation and review. First, I’d like to present why I’ve created them.&lt;/p&gt;

&lt;h2 id=&quot;the-leverage-shift&quot;&gt;The leverage shift&lt;/h2&gt;

&lt;p&gt;The traditional developer’s fulcrum has been &lt;em&gt;mental clarity&lt;/em&gt;: keep the system &lt;strong&gt;idea&lt;/strong&gt; in your head, design and refine in place. The lever moves well-formed &lt;strong&gt;code&lt;/strong&gt; from brain to editor at speed.&lt;/p&gt;

&lt;p&gt;In that model &lt;strong&gt;documentation was an interruption&lt;/strong&gt;, at best a necessary evil. Interrupting that flow to &lt;em&gt;“document it”&lt;/em&gt; was disruptive. You did it because collaboration required it, not because it helped you think.&lt;/p&gt;

&lt;p&gt;But there is a trap here. A productive developer who tries to slot an agent into their existing &lt;em&gt;mental-clarity&lt;/em&gt; workflow may actually &lt;a href=&quot;https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/&quot;&gt;lose productivity&lt;/a&gt;. New leverage doesn’t come from keeping the &lt;strong&gt;idea&lt;/strong&gt; in your head and using the agent as a faster typist. It comes from changing the workflow. Going back to documentation, this is a way to share the &lt;strong&gt;idea&lt;/strong&gt; directly.&lt;/p&gt;

&lt;p&gt;In the new model, the agentic developer’s fulcrum is &lt;em&gt;agentic clarity&lt;/em&gt;: externalize the &lt;strong&gt;idea&lt;/strong&gt; so completely that the lever moves &lt;strong&gt;code&lt;/strong&gt; from agent to codebase without pulling a human back into the loop.&lt;/p&gt;

&lt;p&gt;Here, &lt;strong&gt;documentation can be an enabler&lt;/strong&gt;, or at least that’s the argument I’ll make. When an agent handles the &lt;strong&gt;code&lt;/strong&gt;, human language becomes the input, not the afterthought.&lt;/p&gt;

&lt;p&gt;Plan mode in agentic coding tools is one path to &lt;em&gt;agentic clarity&lt;/em&gt;, but plans are one-off. A living document in the workspace gives both agents and humans a shared, versioned reference. Iterative prompting works for contained tasks, but anything that persists (an API consumed by multiple agents, a design revisited across sprints) benefits from a document under configuration management rather than a conversation that evaporates. The development lifecycle sees the same &lt;strong&gt;ideas&lt;/strong&gt; surface repeatedly.&lt;/p&gt;

&lt;h2 id=&quot;a-structured-review-skill&quot;&gt;A Structured Review Skill&lt;/h2&gt;

&lt;p&gt;Firstly, this approach relies on &lt;a href=&quot;https://www.writethedocs.org/guide/docs-as-code/&quot;&gt;docs as code&lt;/a&gt;: maintaining documentation as an artifact alongside source code. You can port the skills to a different workflow, but in my opinion that reduces leverage and adds friction.&lt;/p&gt;

&lt;p&gt;At every stage of the development lifecycle, ask: what is the source of information? Is it the developer’s novel &lt;strong&gt;idea&lt;/strong&gt;, or can the agent infer it from standard approaches? With three skills I form a pipeline to capture the relevant information and filter the noise:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Skill&lt;/th&gt;
      &lt;th&gt;Action&lt;/th&gt;
      &lt;th&gt;Information&lt;/th&gt;
      &lt;th&gt;Noise&lt;/th&gt;
      &lt;th&gt;Refinement&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/blob/main/skills/flesh-out/SKILL.md&quot;&gt;flesh-out&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;Generate&lt;/td&gt;
      &lt;td&gt;Expand&lt;/td&gt;
      &lt;td&gt;Adds (controlled)&lt;/td&gt;
      &lt;td&gt;Raw ore → shaped material&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/blob/main/skills/review-steps/SKILL.md&quot;&gt;review-steps&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;Polish&lt;/td&gt;
      &lt;td&gt;Preserve&lt;/td&gt;
      &lt;td&gt;Reduce&lt;/td&gt;
      &lt;td&gt;Remove impurities&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/blob/main/skills/strong-edit/SKILL.md&quot;&gt;strong-edit&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;Critique&lt;/td&gt;
      &lt;td&gt;Enhance&lt;/td&gt;
      &lt;td&gt;Reduce&lt;/td&gt;
      &lt;td&gt;Stress-test the structure&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;A typical pipeline runs &lt;strong&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/blob/main/skills/flesh-out/SKILL.md&quot;&gt;flesh-out&lt;/a&gt;&lt;/strong&gt; → &lt;strong&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/blob/main/skills/review-steps/SKILL.md&quot;&gt;review-steps&lt;/a&gt;&lt;/strong&gt; → &lt;strong&gt;&lt;a href=&quot;https://github.com/nakane1chome/claude-skills/blob/main/skills/strong-edit/SKILL.md&quot;&gt;strong-edit&lt;/a&gt;&lt;/strong&gt;, but the skills can be applied in any order; each one detects what the document needs. All three are broken into numbered stages with developer checkpoints. The agent stops after each stage and waits for approval before proceeding.&lt;/p&gt;

&lt;pre class=&quot;mermaid&quot;&gt;
graph LR
    raw[Raw notes] --&amp;gt; FO

    subgraph FO [&quot;/flesh-out&quot;]
        fo0(Stage 0&lt;br /&gt;Extract ideas) --&amp;gt; fo1(Stage 1&lt;br /&gt;Research)
        fo1 --&amp;gt; fo2(Stage 2&lt;br /&gt;Structure)
        fo2 --&amp;gt; fo3(Stage 3&lt;br /&gt;Polish)
    end

    FO --&amp;gt; draft[Structured draft]
    draft --&amp;gt; RS

    subgraph RS [&quot;/review-steps&quot;]
        rs0(Stage 0&lt;br /&gt;Language) --&amp;gt; rs1(Stage 1&lt;br /&gt;Clarity)
        rs1 --&amp;gt; rs2(Stage 2&lt;br /&gt;Structure)
        rs2 --&amp;gt; rs3(Stage 3&lt;br /&gt;Consistency)
        rs3 --&amp;gt; rs4(Stage 4&lt;br /&gt;Best practice)
        rs4 --&amp;gt; rs5(Stage 5&lt;br /&gt;Tidy up)
        rs5 --&amp;gt; rs6(Stage 6&lt;br /&gt;Verify links)
    end

    RS --&amp;gt; reviewed[Reviewed draft]
    reviewed --&amp;gt; SE

    subgraph SE [&quot;/strong-edit&quot;]
        se0(Stage 0&lt;br /&gt;Core argument) --&amp;gt; se1(Stage 1&lt;br /&gt;Structure)
        se1 --&amp;gt; se2(Stage 2&lt;br /&gt;Relevance)
        se2 --&amp;gt; se3(Stage 3&lt;br /&gt;Challenge)
        se3 --&amp;gt; se4(Stage 4&lt;br /&gt;Readability)
        se4 --&amp;gt; se5(Stage 5&lt;br /&gt;Edits)
    end

    SE --&amp;gt; final[Final document]

    style raw fill:#f9f,stroke:#333
    style final fill:#9f9,stroke:#333
&lt;/pre&gt;
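
&lt;p&gt;The checkpoint behavior can be sketched in a few lines of Python. This is only an illustration of the control flow, not how the skills work internally (they are SKILL.md instruction files, not Python). The stage names follow the flesh-out part of the diagram; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_stages&lt;/code&gt;, the stand-in stage functions, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;approve&lt;/code&gt; callback are hypothetical names introduced here.&lt;/p&gt;

```python
from typing import Callable, List, Tuple

# A stage is a name plus a function that turns one draft into the next.
Stage = Tuple[str, Callable[[str], str]]


def run_stages(doc: str, stages: List[Stage],
               approve: Callable[[str, str], bool]) -> Tuple[str, List[str]]:
    """Apply stages in order, stopping at the first one the developer rejects.

    Returns the last approved document and the names of the approved stages.
    """
    approved: List[str] = []
    for name, transform in stages:
        candidate = transform(doc)
        # Checkpoint: the developer sees the result and decides whether to proceed.
        if not approve(name, candidate):
            break
        doc = candidate
        approved.append(name)
    return doc, approved


# Hypothetical stand-ins for the four flesh-out stages in the diagram.
FLESH_OUT: List[Stage] = [
    ("extract ideas", lambda d: d.strip()),
    ("research", lambda d: d + "\n[references]"),
    ("structure", lambda d: "# Draft\n" + d),
    ("polish", lambda d: d.replace("  ", " ")),
]
```

&lt;p&gt;Passing an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;approve&lt;/code&gt; callback that prompts on the terminal gives the stop-and-wait behavior. Rejecting a stage leaves the last approved text intact.&lt;/p&gt;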

&lt;h2 id=&quot;writing-and-editing-in-stages&quot;&gt;Writing and editing in stages&lt;/h2&gt;

&lt;p&gt;What I’m trying to do is find the perfect division of labor. The human brings the problem and the engineering; the agent brings industry techniques, frameworks, and best practices. That allows you to focus your time on what is not in the model’s training data and leverage the agent for everything else.&lt;/p&gt;

&lt;p&gt;To do that, on top of splitting editing into three skills, each skill is broken into a series of stages.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;flesh-out: from raw notes to structured content&lt;/strong&gt;: Takes a skeleton of ideas (bullet points, stream-of-consciousness notes, half-formed thoughts) and expands them into a structured document. The agent first confirms it understands the developer’s intent, then researches, structures, polishes, and tidies up. The risk here is meaning distortion. Raw notes are ambiguous, and assumptions compound. Stage 0 (extract core ideas, developer confirms) exists specifically to catch misunderstandings before they propagate.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;review-steps: polish and verify&lt;/strong&gt;: Takes a structured draft and improves it: language consistency, conceptual clarity, structural compliance, comparison against industry best practice, and link verification. The agent handles mechanical checks and research; the developer holds final authority on judgment calls. Stages 0-4 move progressively from minor editing to conceptual validation. Stage 5 tidies up references; Stage 6 verifies every URL resolves and that agent-sourced references support the claims made.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;strong-edit: challenge the content&lt;/strong&gt;: Takes a complete draft and critiques it: structure, relevance, argument strength, readability. Stages 0-4 are critique only; no edits are made to the document. The agent identifies weaknesses; the developer decides what matters. Edits happen in Stage 5 after the critique is agreed.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;try-it-out&quot;&gt;Try it out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Clone &lt;a href=&quot;https://github.com/nakane1chome/claude-skills&quot;&gt;https://github.com/nakane1chome/claude-skills&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Install to your home dir: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;./install.sh&lt;/code&gt; and follow the prompts&lt;/li&gt;
  &lt;li&gt;Already have a completed document? Try &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/review-steps&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/strong-edit&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Starting from scratch? Write a quick and dirty memo and try &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/flesh-out&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;The collection includes other skills; try &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/review-skill&lt;/code&gt; on one of your Claude Code skill files to review its structure and coverage&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;caveats--risks&quot;&gt;Caveats &amp;amp; Risks&lt;/h2&gt;

&lt;p&gt;It’s not perfect yet; some of the issues with the flow are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;False sense of completeness&lt;/strong&gt;: A large language model (LLM) will confidently proclaim your work to be exceptional and ready to publish. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strong-edit&lt;/code&gt; skill exists partly to counter this. Its philosophy section puts it directly: “Challenge the author’s assumptions. If something seems unclear, it may be unclear, or wrong. Don’t fix; question.”&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Hallucinations and trust&lt;/strong&gt;: The staged approach constrains this risk but does not eliminate it (see &lt;a href=&quot;#this-post&quot;&gt;This post&lt;/a&gt; for a concrete example from this very post). At every step, the source of information (human idea or agent inference) should be identifiable. The staged process keeps the human engaged rather than rubber-stamping, but link and fact verification remain the developer’s job. &lt;a href=&quot;https://stackoverflow.co/company/press/archive/stack-overflow-2025-developer-survey/&quot;&gt;Apparently developer trust in AI accuracy remains low&lt;/a&gt; (only 43% in recent surveys). For me the key here is checkpoints and grounding, which staged editing provides.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Sensitivity to generated documents&lt;/strong&gt;: Some people will reject anything they believe was written by AI, regardless of how it was actually produced. These skills make the human the source of novel information and the agent an editor, but that distinction may not matter to everyone.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Speed&lt;/strong&gt;: It’s deliberately supervised, which can be slow.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;final-words&quot;&gt;Final Words&lt;/h2&gt;

&lt;p&gt;Raw ideas are ungrounded and cause confusion. Ground the &lt;strong&gt;idea&lt;/strong&gt; first. Use the agent to find and refine concrete ideas into written language before starting work.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;afterword-observations-from-practice&quot;&gt;Afterword: Observations from practice&lt;/h2&gt;

&lt;h3 id=&quot;this-post&quot;&gt;This post&lt;/h3&gt;

&lt;p&gt;It started as 727 words of raw notes with partial structure. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/flesh-out&lt;/code&gt; expanded that to 2279 words, tripling the length. The structure held but the agent added padding and hedging.&lt;/p&gt;

&lt;p&gt;Human editing cut about 12% (2279 to 2007 words), removing agent-generated filler. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/review-steps&lt;/code&gt; added the METR productivity trap reference, the developer trust data, and the afterword observations, bringing the word count back up to 2257.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/strong-edit&lt;/code&gt; sharpened the opening to lead with the thesis, tightened the leverage shift sentences, and replaced false binary framing. Final word count: ~1930.&lt;/p&gt;

&lt;p&gt;Then a link turned up 404. During &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/review-steps&lt;/code&gt;, the agent had added a reference with a fabricated URL. The domain didn’t exist. The link survived human editing and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/strong-edit&lt;/code&gt; because at every stage the reviewer was evaluating structure, argument, and voice — nobody was clicking links. Stage 6 (verify links) was added to the review-steps skill as a direct result of this failure.&lt;/p&gt;
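
&lt;p&gt;The mechanical half of that check is easy to sketch. The following is an illustration under assumptions, not the skill’s implementation: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;extract_urls&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;head_status&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;find_dead_links&lt;/code&gt; are hypothetical names, the input is assumed to be Markdown or plain text, and the default checker issues a plain HTTP HEAD request with the Python standard library. It only catches URLs that fail to resolve; whether a reference supports the claim made still needs a human read.&lt;/p&gt;

```python
import re
import urllib.error
import urllib.request
from typing import Callable, List

# Stop at whitespace, quotes, and closing brackets so Markdown links parse cleanly.
URL_PATTERN = re.compile(r"https?://[^\s\"')\]]+")


def extract_urls(text: str) -> List[str]:
    """Return the unique URLs in text, in order of first appearance."""
    seen: List[str] = []
    for url in URL_PATTERN.findall(text):
        url = url.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url, or 0 if the host cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        return 0


def find_dead_links(text: str,
                    fetch_status: Callable[[str], int] = head_status) -> List[str]:
    """List URLs whose status is outside the 2xx/3xx range."""
    return [u for u in extract_urls(text)
            if fetch_status(u) not in range(200, 400)]
```

&lt;p&gt;A fabricated domain shows up here as status 0 and a missing page as 404. Some servers reject HEAD requests, so a flagged link is a prompt to look, not proof that the link is dead.&lt;/p&gt;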

&lt;p&gt;Rough attribution of the final post:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Human&lt;/th&gt;
      &lt;th&gt;Agent&lt;/th&gt;
      &lt;th&gt;Co-created&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Ideas and arguments&lt;/td&gt;
      &lt;td&gt;~95%&lt;/td&gt;
      &lt;td&gt;~5%&lt;/td&gt;
      &lt;td&gt;—&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Research and references&lt;/td&gt;
      &lt;td&gt;~50%&lt;/td&gt;
      &lt;td&gt;~50%&lt;/td&gt;
      &lt;td&gt;—&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Prose (the actual sentences)&lt;/td&gt;
      &lt;td&gt;~5%&lt;/td&gt;
      &lt;td&gt;~60%&lt;/td&gt;
      &lt;td&gt;~35%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Editorial decisions&lt;/td&gt;
      &lt;td&gt;100%&lt;/td&gt;
      &lt;td&gt;0%&lt;/td&gt;
      &lt;td&gt;—&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;in-general&quot;&gt;In general&lt;/h3&gt;

&lt;p&gt;Applied mostly to design and architecture documents, working with &lt;a href=&quot;https://docs.anthropic.com/en/docs/claude-code&quot;&gt;Claude Code&lt;/a&gt; (Opus 4.5) and skills-as-files:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Scaffolding precision&lt;/strong&gt;: Claude sticks precisely to the staged workflow when actions are defined as skills. This was not the case when just adding project guidelines to CLAUDE.md or &lt;a href=&quot;https://agents.md/&quot;&gt;AGENTS.md&lt;/a&gt;; in both cases the agent did not consistently apply the scaffold.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Mode detection&lt;/strong&gt;: The agent identifies when a review is not appropriate — e.g., when a largely complete piece would benefit more from strong-edit than review-steps.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Term sensitivity&lt;/strong&gt;: “Review” signals preservation; “flesh-out” signals generation; “strong-edit” signals critique. The difference in vocabulary shapes the agent’s behavior significantly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;standout-steps&quot;&gt;Standout steps&lt;/h3&gt;

&lt;h4 id=&quot;review-vs-industry-best-practice&quot;&gt;Review vs industry best practice&lt;/h4&gt;

&lt;p&gt;Stage 4 of review-steps sends the agent to search for relevant frameworks and approaches in the domain, then compares the document against what it finds. This is not a substitute for expert review; it’s an automated search with a structured summary. But you won’t be caught off-guard by an obvious gap that a five-minute search would have revealed.&lt;/p&gt;

&lt;p&gt;This stage tends to pull in a lot of information. I ask the agent to generate a separate report and keep it out of the document under review. Without that constraint, the agent folds unnecessary details from the research into the document.&lt;/p&gt;

&lt;h4 id=&quot;stage-0-extract-idea-flesh-out-and-core-argument-strong-edit&quot;&gt;Stage 0: Extract Idea (flesh-out) and Core Argument (strong-edit)&lt;/h4&gt;

&lt;p&gt;As a rough guide, the agent gets around 70% of these stages correct and then asks questions to clarify the remainder. It produces a concise summary of what it understands and what needs clarification, and working through that summary is often where the insight comes from.&lt;/p&gt;

&lt;h4 id=&quot;flesh-out-catching-a-wrong-assumption&quot;&gt;Flesh-out catching a wrong assumption&lt;/h4&gt;

&lt;p&gt;On a separate project, I wrote 169 words of raw notes for a CPU model design. Key assumption: “Only the kernel uses the ISA directly in supervisor space.”&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/flesh-out&lt;/code&gt; expanded that into a 1962-word design document (11.6x). On a second pass, the agent researched the actual source tree and found 21+ user-space assembly files across 10 library modules and 3 applications. The assumption was wrong: the ISA boundary was a language boundary, not a privilege boundary.&lt;/p&gt;

&lt;p&gt;This changed the design. The context diagram was rewritten from a two-path flow to a 2×2 matrix (kernel/user × assembler/Pascal). A fundamental misunderstanding was caught before it shaped the implementation.&lt;/p&gt;

&lt;h2 id=&quot;see-also&quot;&gt;See also&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Parallel approach&lt;/strong&gt;: &lt;a href=&quot;https://github.com/robertguss/claude-skills&quot;&gt;Robert Guss’s “Book Factory”&lt;/a&gt;, found during the research editing stage, takes a similar pipeline approach for nonfiction book production, replicating traditional publishing infrastructure with Claude Code skills.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Idea dictation&lt;/strong&gt;: &lt;a href=&quot;https://wispr.ai&quot;&gt;Wispr&lt;/a&gt; (covered in &lt;a href=&quot;https://se-radio.net/2026/01/se-radio-703-sahaj-garg-on-low-latency-ai/&quot;&gt;SE Radio 703&lt;/a&gt;) is working on something more like &lt;em&gt;idea&lt;/em&gt; dictation, using LLMs to edit raw verbal input into structured text.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Product-engineering handoff&lt;/strong&gt;: This post covers a single individual-contributor (IC) workflow. &lt;a href=&quot;https://www.bicameral-ai.com/blog/introducing-bicameral&quot;&gt;Bicameral AI&lt;/a&gt; focuses on the product-engineering handoff, surfacing how proposed features impact existing architectures and detecting requirement gaps before reaching the coding stage.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;future-extension&quot;&gt;Future Extension&lt;/h2&gt;

&lt;p&gt;The review process produces a specification with a known complexity profile. That profile could select the agent: a high-end model for architectural work, a lightweight model for boilerplate. Model selection as a downstream output of document review, not a separate decision.&lt;/p&gt;
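
&lt;p&gt;As a rough sketch of what that could look like: nothing below exists in the skills today. The profile fields, the selection rule, and the tier names are all hypothetical, chosen only to show model selection as a function of review output.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexityProfile:
    """Hypothetical summary a review pass could emit for a specification."""
    open_questions: int      # items the review left unresolved
    interfaces_touched: int  # external interfaces the change affects
    novel_design: bool       # True if the idea is not a standard pattern


def select_model_tier(profile: ComplexityProfile) -> str:
    """Map a profile to an illustrative tier name (the rule is arbitrary)."""
    needs_judgment = (
        profile.novel_design
        or profile.open_questions > 0
        or profile.interfaces_touched > 0
    )
    # Architectural work goes to the high-end tier, boilerplate to the light one.
    return "high-end" if needs_judgment else "lightweight"
```

&lt;p&gt;The point of the design is that the profile is a by-product of the review, so the choice of model needs no extra input from the developer.&lt;/p&gt;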

&lt;h2 id=&quot;review-reports&quot;&gt;Review reports&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/agentic-code/2026/02/15/claude-skill-document-review-stage4-report.html&quot;&gt;Stage 4 Research Report: Review vs Industry Best Practice&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/agentic-code/2026/02/15/claude-skill-document-review-strong-edit-report.html&quot;&gt;Strong-Edit Report&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Sun, 15 Feb 2026 00:00:00 +0000</pubDate>
        <link>https://www.shincbm.com/agentic-code/2026/02/15/claude-skill-document-review.html</link>
        <guid isPermaLink="true">https://www.shincbm.com/agentic-code/2026/02/15/claude-skill-document-review.html</guid>
        
        <category>claude-code</category>
        
        <category>claude-skills</category>
        
        <category>agentic-coding</category>
        
        <category>documentation</category>
        
        <category>docs-as-code</category>
        
        
        <category>agentic-code</category>
        
      </item>
      
    
      
    
      
    
      
      <item>
        <title>OpenCL Learning Exercise — Image Transform</title>
        <description>&lt;p&gt;This is an old tool written to play with &lt;a href=&quot;https://www.khronos.org/opencl/&quot;&gt;OpenCL&lt;/a&gt; (Open Computing Language) on x86_64 and aarch64
machines.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/nakane1chome/opencl-learn/tree/master&quot;&gt;https://github.com/nakane1chome/opencl-learn/tree/master&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ImageXform is a command-line tool for learning and experimenting with
OpenCL by applying GPU-accelerated kernels to images. It provides a
visual, hands-on way to understand OpenCL programming by
transforming PNG or JPG images through custom or example kernels.&lt;/p&gt;

&lt;p&gt;The tool handles all the OpenCL and image conversion boilerplate
(device selection, context creation, buffer management) so the kernel
can be launched and results inspected.&lt;/p&gt;

&lt;p&gt;It is designed for learning OpenCL concepts like parallel processing, memory
management, and kernel optimization through immediate visual
feedback.&lt;/p&gt;

&lt;p&gt;The code and build system have been slightly modernized, and a few bugs have been fixed.&lt;/p&gt;

&lt;h2 id=&quot;example-usage&quot;&gt;Example Usage&lt;/h2&gt;

&lt;p&gt;Transform an image to grayscale:&lt;/p&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Build the project (one-time setup)&lt;/span&gt;
make build-amd64

&lt;span class=&quot;c&quot;&gt;# Run grayscale conversion&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;build.amd64/imgxform
./imagexform &lt;span class=&quot;nt&quot;&gt;-p&lt;/span&gt; grayscale.cl &lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; test_in.png &lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt; output_gray.png

&lt;span class=&quot;c&quot;&gt;# The tool will:&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# 1. Load test_in.png&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# 2. Compile grayscale.cl kernel&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# 3. Execute it on your OpenCL device (GPU/CPU)&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# 4. Save the result to output_gray.png&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can substitute any &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.cl&lt;/code&gt; kernel file to experiment with different transformations like edge detection (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;edge_3x3.cl&lt;/code&gt;) or create your own custom kernels.&lt;/p&gt;
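
&lt;p&gt;To make the per-pixel model concrete, here is a plain-Python sketch of what a grayscale kernel computes. It does not reproduce &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;grayscale.cl&lt;/code&gt; from the repository: the Rec. 601 luma weights, the RGBA pixel layout, and the function names are assumptions chosen for illustration. In OpenCL, each pixel would be handled by its own work-item rather than by a loop.&lt;/p&gt;

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]  # (red, green, blue, alpha), each 0..255


def gray_pixel(pixel: Pixel) -> Pixel:
    """Body of the 'kernel': convert one RGBA pixel to gray, keeping alpha."""
    r, g, b, a = pixel
    # Rec. 601 luma weights (an assumption; the repository kernel may differ).
    luma = int(round(0.299 * r + 0.587 * g + 0.114 * b))
    return (luma, luma, luma, a)


def grayscale(image: List[List[Pixel]]) -> List[List[Pixel]]:
    """Apply gray_pixel to every pixel; on a GPU each call is one work-item."""
    return [[gray_pixel(p) for p in row] for row in image]
```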

</description>
        <pubDate>Sun, 11 Jan 2026 00:00:00 +0000</pubDate>
        <link>https://www.shincbm.com/restore_old_code/2026/01/11/opencl-test-prog.html</link>
        <guid isPermaLink="true">https://www.shincbm.com/restore_old_code/2026/01/11/opencl-test-prog.html</guid>
        
        <category>opencl</category>
        
        <category>C++</category>
        
        <category>gpu</category>
        
        
        <category>restore_old_code</category>
        
      </item>
      
    
      
    
  </channel>
</rss>
