Claude Beyond Code: Mastering Documentation with Review & Editing Skills
Productive developers have always resisted documentation: it interrupts the flow of getting code out of your head and into the editor. Agentic coding flips this. When an agent writes the code, the bottleneck moves upstream: how clearly can you express the idea? Documentation isn’t the interruption anymore. It’s the leverage.
This post introduces a set of skills for agentic document elicitation and review. The skills are reusable instruction files for Claude Code, covering document creation and review. First, why you’d want them.
The leverage shift
The traditional developer’s fulcrum is mental clarity: keep the system idea in your head, design and refine in place. The lever moves well-formed code from brain to editor at speed.
Documentation was a blocker, at best a necessary evil. Interrupting that flow to “document it” was disruptive. You did it because collaboration required it, not because it helped you think.
There is a trap here. A productive developer who tries to slot an agent into their existing mental-clarity workflow (keeping the idea in their head and using the agent as a faster typist) will likely lose productivity. The leverage doesn’t come from adding an agent to the old workflow. It comes from changing the workflow. The loss comes from keeping the idea private while outsourcing only typing. The gain comes from making the idea inspectable.
The agentic developer’s fulcrum is agentic clarity: externalize the idea so completely that the lever moves code from agent to codebase without pulling a human back into the loop.
Documentation is now an enabler, or at least that’s the argument I’ll make. When an agent handles the code, human language becomes the input, not the afterthought. Iterative prompting works for contained tasks, but anything that persists (an API consumed by multiple agents, a design revisited across sprints) benefits from a document under configuration management rather than a conversation that evaporates.
Plan mode in agentic coding tools is one path, but plans are one-off. The development lifecycle needs the same ideas surfaced repeatedly. A living document in the workspace gives both agents and humans a shared, versioned reference.
A Structured Review Skill
This approach relies on docs as code: maintaining documentation as an artifact alongside source code. You can port the skills to a different workflow, but in my opinion that reduces leverage and adds friction.
At every stage of the development lifecycle, ask: what is the source of information? Is it the developer’s novel idea, or can the agent infer it from standard approaches? With three skills I form a pipeline to capture the relevant information and filter the noise:
| Skill | Action | Information | Noise | Refinement |
|---|---|---|---|---|
| flesh-out | Generate | Expand | Adds (controlled) | Raw ore → shaped material |
| review-steps | Polish | Preserve | Reduce | Remove impurities |
| strong-edit | Critique | Enhance | Reduce | Stress-test the structure |
A typical pipeline runs flesh-out → review-steps → strong-edit, but the skills can be applied in any order; each one detects what the document needs. All three are broken into numbered stages with developer checkpoints. The agent stops after each stage and waits for approval before proceeding.
```mermaid
graph LR
    raw[Raw notes] --> FO
    subgraph FO ["/flesh-out"]
        fo0("Stage 0<br/>Extract ideas") --> fo1("Stage 1<br/>Research")
        fo1 --> fo2("Stage 2<br/>Structure")
        fo2 --> fo3("Stage 3<br/>Polish")
    end
    FO --> draft[Structured draft]
    draft --> RS
    subgraph RS ["/review-steps"]
        rs0("Stage 0<br/>Language") --> rs1("Stage 1<br/>Clarity")
        rs1 --> rs2("Stage 2<br/>Structure")
        rs2 --> rs3("Stage 3<br/>Consistency")
        rs3 --> rs4("Stage 4<br/>Best practice")
        rs4 --> rs5("Stage 5<br/>Tidy up")
        rs5 --> rs6("Stage 6<br/>Verify links")
    end
    RS --> reviewed[Reviewed draft]
    reviewed --> SE
    subgraph SE ["/strong-edit"]
        se0("Stage 0<br/>Core argument") --> se1("Stage 1<br/>Structure")
        se1 --> se2("Stage 2<br/>Relevance")
        se2 --> se3("Stage 3<br/>Challenge")
        se3 --> se4("Stage 4<br/>Readability")
        se4 --> se5("Stage 5<br/>Edits")
    end
    SE --> final[Final document]
    style raw fill:#f9f,stroke:#333
    style final fill:#9f9,stroke:#333
```
Writing and editing in stages
The division of labor is consistent: the human brings the problem and the engineering; the agent brings industry techniques, frameworks, and best practices. Focus your time on what is not in the model’s training data. Leverage the agent for everything else.
- flesh-out (from raw notes to structured content): Takes a skeleton of ideas (bullet points, stream-of-consciousness notes, half-formed thoughts) and expands them into a structured document. The agent first confirms it understands the developer's intent, then researches, structures, polishes, and tidies up. The risk here is meaning distortion. Raw notes are ambiguous, and assumptions compound. Stage 0 (extract core ideas, developer confirms) exists specifically to catch misunderstandings before they propagate.
- review-steps (polish and verify): Takes a structured draft and improves it: language consistency, conceptual clarity, structural compliance, comparison against industry best practice, and link verification. The agent handles mechanical checks and research; the developer holds final authority on judgment calls. Stages 0-4 move progressively from minor editing to conceptual validation. Stage 5 tidies up references; Stage 6 verifies that every URL resolves and that agent-sourced references support the claims made.
- strong-edit (challenge the content): Takes a complete draft and critiques it: structure, relevance, argument strength, readability. Stages 0-4 are critique only; no edits are made to the document. The agent identifies weaknesses; the developer decides what matters. Edits happen in Stage 5, after the critique is agreed.
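To make the staged structure concrete, here is a minimal sketch of roughly what one of these skill files looks like. The frontmatter fields and the stage wording below are illustrative rather than quoted from the repository; the point is the pattern of numbered stages, each ending in an explicit stop-and-confirm checkpoint.

```markdown
---
name: review-steps
description: Staged review of a structured draft kept under version control
---

# Review steps

Work through the stages in order. After each stage, summarize what changed
and why, then STOP and wait for the developer's approval before continuing.

## Stage 0: Language
Fix spelling, grammar, and inconsistent terminology. Do not change meaning.

## Stage 1: Clarity
Flag sentences whose intent is ambiguous. Ask the developer rather than guess.

<!-- Stages 2-5: structure, consistency, best practice, tidy up -->

## Stage 6: Verify links
Fetch every URL in the document. Report any that do not resolve, and any
agent-sourced reference that does not support the claim attached to it.
```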
Try it out
- Clone https://github.com/nakane1chome/claude-skills
- Install to your home dir: `./install.sh` and follow the prompts
- Already have a completed document? Try `/review-steps` or `/strong-edit`
- Starting from scratch? Write a quick and dirty memo and try `/flesh-out`
- The collection includes other skills; try `/review-skill` on one of your Claude Code skill files to review its structure and coverage
Caveats & Risks
- False sense of completeness: A large language model (LLM) will confidently proclaim your work to be exceptional and ready to publish. The `strong-edit` skill exists partly to counter this; its philosophy section puts it directly: "Challenge the author's assumptions. If something seems unclear, it may be unclear, or wrong. Don't fix; question."
- Hallucinations and trust: Developer trust in AI accuracy remains low (only 43% in recent surveys), and with reason. The staged approach constrains this risk but does not eliminate it (see the afterword for a concrete example from this very post). At every step, the source of information (human idea or agent inference) should be identifiable. The staged process keeps the human engaged rather than rubber-stamping, but link and fact verification remain the developer's job.
- Sensitivity to generated documents: Some people will reject anything they believe was written by AI, regardless of how it was actually produced. These skills make the human the source of novel information and the agent an editor, but that distinction may not matter to everyone.
Final Words
Japanese business culture has a term for meetings where people talk past each other without a shared grounding: 空中戦 (kūchūsen, “air battle”). Opinions fly overhead like fighter jets; there’s a lot of noise, nobody lands a hit, and the meeting ends with no decisions. The antonym is 地上戦 (chijōsen, “ground battle”); discussion grounded in visible, concrete materials where participants work through specifics together.
Raw ideas create kūchūsen. An air battle with an agent is only going to slow you down. Ground the idea first, and let the agent do what it’s good at.
Afterword: Observations from practice
This post
It started as 727 words of raw notes with partial structure. /flesh-out expanded that to 2279 words, tripling the length. The structure held but the agent added padding and hedging.
Human editing cut about 12% (2279 to 2007 words), removing agent-generated filler. /review-steps added the METR productivity trap reference, the developer trust data, and the afterword observations, bringing the word count back up to 2257.
/strong-edit sharpened the opening to lead with the thesis, tightened the leverage shift sentences, and replaced false binary framing. Final word count: ~1930.
Then a link turned up 404. During /review-steps, the agent had added a reference with a fabricated URL. The domain didn’t exist. The link survived human editing and /strong-edit because at every stage the reviewer was evaluating structure, argument, and voice — nobody was clicking links. Stage 6 (verify links) was added to the review-steps skill as a direct result of this failure.
Rough attribution of the final post:
| | Human | Agent | Co-created |
|---|---|---|---|
| Ideas and arguments | ~95% | ~5% | — |
| Research and references | ~50% | ~50% | — |
| Prose (the actual sentences) | ~5% | ~60% | ~35% |
| Editorial decisions | 100% | 0% | — |
In general
I have applied these skills mostly to design and architecture documents, working in Claude Code with Opus 4.5 and skills-as-files:
- Scaffolding precision: Claude sticks precisely to the staged workflow when actions are defined as skills. This was not the case when just adding project guidelines to CLAUDE.md or AGENTS.md; in both cases the agent did not consistently apply the scaffold.
- Mode detection: The agent identifies when a review is not appropriate — e.g., when a largely complete piece would benefit more from strong-edit than review-steps.
- Term sensitivity: “Review” signals preservation; “flesh-out” signals generation; “strong-edit” signals critique. The difference in vocabulary shapes the agent’s behavior significantly.
Standout steps
Review vs industry best practice
Stage 4 of review-steps sends the agent to search for relevant frameworks and approaches in the domain, then compares the document against what it finds. Not a substitute for expert review; it’s an automated search with a structured summary. But you won’t be caught off-guard by an obvious gap that a five-minute search would have revealed.
This stage tends to pull in a lot of information. I ask the agent to generate a separate report and keep it out of the document under review. Without that constraint, the agent folds unnecessary details from the research into the document.
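One way that constraint can be phrased inside the skill; again, the wording and the report file name are illustrative, not quoted from the actual skill:

```markdown
## Stage 4: Best practice
Search for frameworks and standard approaches relevant to the document's
domain. Write what you find to a separate report (e.g. best-practice-review.md)
next to the document. Do NOT fold research material into the document under
review; propose a change only where the document contradicts or omits an
established practice.
```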
Stage 0: Extract Idea (flesh-out) and Core Argument (strong-edit)
As a rough guide, the agent gets about 70% of the intent right in these stages and asks questions to clarify the remainder. It produces a concise summary of what it understands and what still needs clarification, and working through that summary is where much of the insight comes from.
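For illustration only, a hypothetical Stage 0 checkpoint summary; the content here is invented, not actual skill output:

```markdown
<!-- hypothetical Stage 0 output, invented for illustration -->
## What I understand
- The document proposes a staged review pipeline for design docs kept in git.
- The developer supplies the novel ideas; the agent supplies standard practice.

## What needs clarification
1. Does the pipeline also cover README files, or only design and architecture documents?
2. Is a developer checkpoint a hard stop, or can later stages be pre-approved?
```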
Flesh-out catching a wrong assumption
On a separate project, I wrote 169 words of raw notes for a CPU model design. Key assumption: “Only the kernel uses the ISA directly in supervisor space.”
/flesh-out expanded to a 1962-word design document (11.6x). On a second pass, the agent researched the actual source tree and found 21+ user-space assembly files across 10 library modules and 3 applications. The assumption was wrong: the ISA boundary was a language boundary, not a privilege boundary.
This changed the design. The context diagram was rewritten from a two-path flow to a 2×2 matrix (kernel/user × assembler/Pascal). A fundamental misunderstanding was caught before it shaped the implementation.
See also
- Diataxis: The Diataxis framework classifies documentation into four forms: tutorials, how-to guides, reference, and explanation. The skills pipeline here is about content quality regardless of type; Diataxis is about choosing the right type. They’re complementary.
- Parallel approach: Robert Guss’s “Book Factory” takes a similar pipeline approach for nonfiction book production, replicating traditional publishing infrastructure with Claude Code skills.
- Idea dictation: Wispr (covered in SE Radio 703) is working on something more like idea dictation, using LLMs to edit raw verbal input into structured text.
- Product-engineering handoff: This post covers a single IC workflow. Bicameral AI focuses on the product-engineering handoff, surfacing how proposed features impact existing architectures and detecting requirement gaps before reaching the coding stage.
Future Extension
The review process produces a specification with a known complexity profile. That profile could select the agent: a high-end model for architectural work, a lightweight model for boilerplate. Model selection as a downstream output of document review, not a separate decision.
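Purely as a hypothetical sketch of the idea: the review could emit a complexity profile into the specification's frontmatter, and a wrapper around the coding agent could read it to pick a model. None of these fields exist in the current skills.

```markdown
---
title: CPU model redesign
complexity:
  architecture: high    # novel design decisions -> frontier model
  boilerplate: low      # routine glue code -> lightweight model
suggested-model: opus   # written by the review stage, read by a wrapper script
---
```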