The Problem With Building Things
Every developer knows the feeling. You open a new project, stare at a blank directory, and start making decisions. Framework here, folder structure there, a service layer because it feels right. Six weeks later you have something that works but that no one — including you — can fully explain. The shape emerged from a series of local decisions that made sense at the time.
Forge was built to fix this. Not by enforcing a rigid structure, but by making structure visible — declaration graphs, manifest nodes, parity checks, contract enforcement. The codebase as a machine with known tolerances.
But there was an irony we kept running into: Forge itself was subject to the same drift it was designed to prevent. New features got added, MCP tools got exposed before their underlying CLI commands were audited, and commands accumulated technical debt while the graph stayed nominally clean.
This post is about the session where we addressed that — and what we built along the way.
The Insight: Blueprint Before Bricks
The trigger was a practical constraint: the Cursor token budget was running low, and there was still a workout app to build. The question was whether you actually need Cursor to slot files if the spec is precise enough that an AI can generate them one at a time.
The answer was no. And that answer unlocked everything.
If a node spec is complete enough to generate from, it’s also complete enough to verify, document, and reason about. The bottleneck was never file creation — it was specification clarity. Cursor (or any agentic tool) was spending half its budget going to the grocery store before it could cook. What was needed was a system that delivered exactly the right ingredients, pre-measured, to a prepared kitchen.
This is the mise en place principle applied to software. A chef whose station is already set just executes. The cognitive load of “what do I need?” is separated from “produce the thing.”
What Got Built
Stack Gloves
The first artifact: .forge/gloves/flutter_supabase.glove.yaml. A machine-readable schema encoding every architectural convention for the Flutter/Supabase stack, not as prose documentation but as enforceable contracts.
contracts:
  supabase_raw_map:
    rule: SupabaseService only touches raw Map, never typed models
    verifiable: true
    verify_command: "forge verify --contract supabase_raw_map"
  screen_router_only:
    rule: screens contain no business logic, only routing to variants
    adr: ADR-019
    verifiable: true
The key distinction: contracts are either verifiable: true (enforced by tooling) or guidance (labelled, human-read, ignored by tools). Nothing lives between these. A rule that sometimes applies is not a rule.
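A minimal Python sketch of the two-bucket rule, assuming the glove has already been parsed into plain dicts. The `guidance: true` label, `classify_contracts`, and `AmbiguousContractError` are hypothetical names for illustration, not Forge's actual API. A contract that is both or neither is rejected outright, which is the "nothing lives between these" rule expressed as code.

```python
from typing import Any


class AmbiguousContractError(ValueError):
    """A contract that is neither clearly tool-enforced nor clearly guidance."""


def classify_contracts(contracts: dict[str, dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Split glove contracts into (verifiable, guidance) name lists.

    Assumption: `verifiable: true` marks a tool-enforced contract and a
    hypothetical `guidance: true` label marks a human-only one. Having both
    flags or neither is rejected; a non-boolean value counts as unset.
    """
    verifiable: list[str] = []
    guidance: list[str] = []
    for name, spec in contracts.items():
        is_verifiable = spec.get("verifiable") is True
        is_guidance = spec.get("guidance") is True
        if is_verifiable == is_guidance:
            # The "sometimes applies" middle ground is not allowed.
            raise AmbiguousContractError(
                f"contract {name!r} must be exactly one of verifiable or guidance"
            )
        (verifiable if is_verifiable else guidance).append(name)
    return verifiable, guidance


glove_contracts = {
    "supabase_raw_map": {"verifiable": True, "verify_command": "forge verify --contract supabase_raw_map"},
    "screen_router_only": {"adr": "ADR-019", "verifiable": True},
}
print(classify_contracts(glove_contracts))  # (['supabase_raw_map', 'screen_router_only'], [])
```

Raising instead of defaulting is the design choice here: an unlabelled contract stops the tool rather than being silently skipped.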
Node Manifests and Generation Manifests
Every node that will exist gets a spec before any code is written:
node: WorkoutService
type: service
domain: WORKOUT
stack_glove: flutter_supabase
interface:
  methods:
    - "fetchExercises() → List<Exercise>"
    - "saveSession(session: WorkoutSession) → void"
contracts: [supabase_raw_map, structured_logging]
outputs:
  - lib/workout/services/workout_service.dart
  - test/workout/services/workout_service_test.dart
wiring:
  - file: lib/workout/workout_module.dart
    pattern: singleton_registration
From this spec, a generation manifest is derived — the “cut sheet” handed to the CNC machine. The AI reads the cut sheet, generates exactly two files, touches exactly one wiring file. It never opens the rest of the codebase.
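One way to picture that derivation, as a hypothetical Python sketch: `CutSheet`, `derive_cut_sheet`, and `out_of_scope` are illustrative names, and the spec is assumed to be parsed into a dict using the field names from the WorkoutService example. The allowed paths are the outputs plus the wiring files, so any other file the AI touches gets flagged.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CutSheet:
    """Hypothetical generation manifest derived from one node spec."""

    node: str
    generate: tuple[str, ...]   # files the AI creates from scratch
    wire: tuple[str, ...]       # existing files it may edit to register the node
    contracts: tuple[str, ...]  # glove contracts the generated code must satisfy

    def allowed_paths(self) -> set[str]:
        return set(self.generate) | set(self.wire)


def derive_cut_sheet(spec: dict[str, Any]) -> CutSheet:
    """Copy the execution-relevant fields out of a parsed node spec."""
    return CutSheet(
        node=spec["node"],
        generate=tuple(spec.get("outputs", [])),
        wire=tuple(entry["file"] for entry in spec.get("wiring", [])),
        contracts=tuple(spec.get("contracts", [])),
    )


def out_of_scope(sheet: CutSheet, touched: set[str]) -> set[str]:
    """Return any touched paths the cut sheet does not authorise."""
    return touched - sheet.allowed_paths()


WORKOUT_SERVICE_SPEC = {
    "node": "WorkoutService",
    "contracts": ["supabase_raw_map", "structured_logging"],
    "outputs": [
        "lib/workout/services/workout_service.dart",
        "test/workout/services/workout_service_test.dart",
    ],
    "wiring": [{"file": "lib/workout/workout_module.dart", "pattern": "singleton_registration"}],
}

sheet = derive_cut_sheet(WORKOUT_SERVICE_SPEC)
print(len(sheet.generate), len(sheet.wire))  # 2 1
print(out_of_scope(sheet, {"lib/workout/workout_module.dart", "lib/main.dart"}))  # {'lib/main.dart'}
```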
Context Packets
The planning/execution separation formalised into a first-class artifact. A context packet is assembled in a planning session and consumed in an execution session:
type: generate
node: WorkoutService
domain: WORKOUT
freshness:
  assembled_at: 2026-04-25T10:00:00Z
  graph_hash: abc123
  expires_after: 72h
review:
  assembled_by: claude-planning-session
  reviewed_by: null   # BLOCKS execution until signed
If reviewed_by is null, forge verify --packet blocks execution. The human must read the packet and sign off. The act of filling in the field is the review.
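A minimal Python sketch of the gate, assuming the packet is parsed into a dict with timestamps kept as ISO-8601 strings. The post only states that a null `reviewed_by` blocks execution; the expiry and graph-hash checks are assumptions about how the `freshness` fields might be used, and `packet_problems` and `parse_duration` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Any


def parse_duration(text: str) -> timedelta:
    """Parse durations like '72h' or '30m' (an assumed format)."""
    match = re.fullmatch(r"(\d+)([hm])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount = int(match.group(1))
    return timedelta(hours=amount) if match.group(2) == "h" else timedelta(minutes=amount)


def packet_problems(packet: dict[str, Any], current_graph_hash: str, now: datetime) -> list[str]:
    """Return reasons to block execution; an empty list means the packet may run."""
    problems: list[str] = []

    # Stated in the post: an unsigned packet never executes.
    if not packet.get("review", {}).get("reviewed_by"):
        problems.append("not reviewed: reviewed_by is empty")

    # Assumed: the freshness block bounds how stale a packet may be.
    fresh = packet["freshness"]
    assembled_at = datetime.fromisoformat(fresh["assembled_at"].replace("Z", "+00:00"))
    if now > assembled_at + parse_duration(fresh["expires_after"]):
        problems.append("expired")
    if fresh["graph_hash"] != current_graph_hash:
        problems.append("stale: graph changed since assembly")
    return problems


packet = {
    "type": "generate",
    "node": "WorkoutService",
    "freshness": {"assembled_at": "2026-04-25T10:00:00Z", "graph_hash": "abc123", "expires_after": "72h"},
    "review": {"assembled_by": "claude-planning-session", "reviewed_by": None},
}
print(packet_problems(packet, "abc123", datetime(2026, 4, 26, tzinfo=timezone.utc)))
# ['not reviewed: reviewed_by is empty']
```

Returning every problem rather than stopping at the first lets the reviewer see the whole picture in one pass.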
Cross-Graph Diff: Mechanical Progress Tracking
The insight that killed status tracking entirely. Progress is just:
nodes_planned - nodes_present = remaining work
nodes_present - nodes_connected = wiring gaps
nodes_connected ∩ violations = quality gaps
forge verify --diff plan --domain WORKOUT
WORKOUT (16 planned, 3 present, 13 remaining)
✓ Exercise model
✓ WorkoutSession model
✓ WorkoutService service
✗ SessionService service ← next
✗ WorkoutController controller
...
No status log. No “mark as complete.” The file existing and being mapped is the completion signal. The diff closing is the receipt.
The chef glances at the board, sees what’s left, proceeds.
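The three formulas reduce to set operations. A minimal Python sketch, assuming node identity is a plain name and that the present, connected, and violation sets have already been read from the manifest graph. `cross_graph_diff`, `ProgressReport`, and `render_plan` are hypothetical names, and the rendering only approximates the `--diff plan` output shown earlier.

```python
from dataclasses import dataclass


@dataclass
class ProgressReport:
    remaining: list[str]     # nodes_planned - nodes_present
    wiring_gaps: list[str]   # nodes_present - nodes_connected
    quality_gaps: list[str]  # nodes_connected ∩ violations


def cross_graph_diff(
    planned: list[str], present: set[str], connected: set[str], violations: set[str]
) -> ProgressReport:
    """Compute the three gaps as set operations over planned nodes, kept in plan order."""
    return ProgressReport(
        remaining=[n for n in planned if n not in present],
        wiring_gaps=[n for n in planned if n in present and n not in connected],
        quality_gaps=[n for n in planned if n in connected and n in violations],
    )


def render_plan(domain: str, planned: list[str], report: ProgressReport) -> str:
    """Approximate the plan board: ✓ present, ✗ missing, first ✗ marked as next."""
    todo = set(report.remaining)
    done = len(planned) - len(todo)
    lines = [f"{domain} ({len(planned)} planned, {done} present, {len(todo)} remaining)"]
    marked_next = False
    for node in planned:
        if node in todo:
            lines.append(f"  ✗ {node}" + ("" if marked_next else "  ← next"))
            marked_next = True
        else:
            lines.append(f"  ✓ {node}")
    return "\n".join(lines)


planned = ["Exercise", "WorkoutSession", "WorkoutService", "SessionService", "WorkoutController"]
report = cross_graph_diff(
    planned,
    present={"Exercise", "WorkoutSession", "WorkoutService"},
    connected={"Exercise", "WorkoutSession"},
    violations={"WorkoutSession"},
)
print(render_plan("WORKOUT", planned, report))
```

Nothing here is stored between runs: every number is recomputed from the graph, which is what makes a separate status log unnecessary.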
Forgefit
The first real consumer of the scaffold system. A workout tracking app with a full app manifest, 44 planned nodes across 7 domains, a complete connection graph, and an earn events registry — all before a single line of Flutter exists.
app:
  name: Forgefit
  stack: flutter_supabase
earn_events_registry:
  - {event: WORKOUT_COMPLETED, sparks: 50}
  - {event: PERSONAL_RECORD, sparks: 100}
  - {event: STREAK_7_DAY, sparks: 150}
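One plausible payoff of declaring the registry up front is that code can treat it as the only source of rewards. A hypothetical Python sketch of that idea (Forgefit itself would be Dart; `build_registry` and `sparks_for` are illustrative names): duplicate events are rejected when the registry is indexed, and awarding sparks for an undeclared event is an error rather than a silent zero.

```python
EARN_EVENTS_REGISTRY = [
    {"event": "WORKOUT_COMPLETED", "sparks": 50},
    {"event": "PERSONAL_RECORD", "sparks": 100},
    {"event": "STREAK_7_DAY", "sparks": 150},
]


def build_registry(entries: list[dict]) -> dict[str, int]:
    """Index the registry by event name; a duplicate event is a planning error."""
    registry: dict[str, int] = {}
    for entry in entries:
        name = entry["event"]
        if name in registry:
            raise ValueError(f"duplicate earn event: {name}")
        registry[name] = entry["sparks"]
    return registry


def sparks_for(events: list[str], registry: dict[str, int]) -> int:
    """Total the sparks for a list of events; undeclared events raise instead of scoring zero."""
    unknown = [e for e in events if e not in registry]
    if unknown:
        raise KeyError(f"unregistered earn events: {unknown}")
    return sum(registry[e] for e in events)


registry = build_registry(EARN_EVENTS_REGISTRY)
print(sparks_for(["WORKOUT_COMPLETED", "PERSONAL_RECORD"], registry))  # 150
```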
When Cursor tokens refresh, the kitchen is already set. First move:
forge verify --diff plan --domain WORKOUT
The first ✗ in the output is the first packet to assemble.
The Recursive Part
The scaffold system was itself built using the scaffold system’s principles. Node specs before code. Connections verified before generation. Each new file registered in the manifest before the next was touched.
This is now P-001 in docs/PRINCIPLES.md:
Any sufficiently important Forge capability must be buildable using Forge itself. If you can’t spec it, scaffold it, verify it, and map it — the tooling is missing something.
The corollary that matters most in practice:
If automated tooling is unavailable — token limits, broken environment, no Cursor — a human should be able to execute any Forge workflow step-by-step using only a node manifest, a stack glove, and an AI that can generate one file at a time. If this breaks down, the spec is incomplete.
The CLI/MCP Audit
Alongside the scaffold system, the session included a full audit of Forge’s CLI and MCP surface — something that had been postponed across several phases.
The principle: don’t expose a stale command as an MCP tool. That just gives the agent a faster way to get wrong answers.
The audit covered 27 commands. Findings:
- 3 commands confirmed deprecated with hardcoded notices (context, dashboard, health) — marked in the canonical surface doc
- 8 commands confirmed clean and MCP-exposed correctly
- 3 commands (verify, audit, scaffold) marked for refactor before further exposure — raw YAML paths, fat command surfaces, mixed ManifestParser usage
- 7 human-facing commands queued for uplift — response_tier wiring, structured output, ManifestParser parity
- 4 new modules (verify/diff, verify/preflight, verify/packet_validator, scaffold/spec_mode) registered in the manifest graph
The verify command’s new modes landed cleanly:
forge verify --diff plan # what's left to build
forge verify --diff wiring # what's built but unwired
forge verify --diff contracts # what's wired but wrong
forge verify --packet <path> # is this packet safe to execute
forge verify --preflight # is the plan coherent before generation
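The post describes `--preflight` only as checking that the plan is coherent. The Python sketch below shows three checks such a pass could plausibly run; the checks themselves, the `preflight_problems` name, and the (source, target) connection format are all assumptions, not Forge's documented behaviour.

```python
from collections import Counter
from typing import Any


def preflight_problems(
    nodes: list[dict[str, Any]],
    connections: list[tuple[str, str]],
    glove_contracts: set[str],
) -> list[str]:
    """Return plan-coherence problems found before any file is generated."""
    problems: list[str] = []
    names = {node["node"] for node in nodes}

    # 1. Every connection must join two planned nodes.
    for source, target in connections:
        for end in (source, target):
            if end not in names:
                problems.append(f"connection {source} -> {target} references unplanned node {end}")

    # 2. No two nodes may claim the same output file.
    owners = Counter(path for node in nodes for path in node.get("outputs", []))
    problems.extend(f"output {path} claimed by {n} nodes" for path, n in owners.items() if n > 1)

    # 3. Every contract a node cites must be defined in the stack glove.
    for node in nodes:
        for contract in node.get("contracts", []):
            if contract not in glove_contracts:
                problems.append(f"{node['node']} cites unknown contract {contract}")
    return problems


plan_nodes = [
    {"node": "WorkoutService", "outputs": ["lib/workout/services/workout_service.dart"], "contracts": ["supabase_raw_map"]},
    {"node": "SessionService", "outputs": ["lib/workout/services/workout_service.dart"], "contracts": ["supabase_raw_map"]},
]
print(preflight_problems(plan_nodes, [("WorkoutService", "WorkoutController")], {"supabase_raw_map"}))
```

Each check is cheap and runs on specs alone, which is the point of doing it before any generation.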
Post-audit: 262 tests passing, one fixture gap in the express scaffold test (glove file not seeded in temp project — a one-line fix, not a logic error).
What This Changes
Before this session, building a new app with Forge meant: orient → read context → understand structure → start building → drift into inconsistency.
After: describe app → global manifest → domain decomposition → preflight verify → generate node by node → slot files → map after each → diff to see what’s left → repeat until diff is empty.
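The generate, map, and diff loop can be sketched in a few lines of Python. `scan_present` and `generate_and_map` are hypothetical callbacks standing in for "read which nodes are mapped" and "generate, slot, and map one node". The guard turns a node that never appears into an error, in the spirit of the rule that a breakdown means the spec is incomplete.

```python
from typing import Callable


def build_domain(
    planned: list[str],
    scan_present: Callable[[], set[str]],
    generate_and_map: Callable[[str], None],
) -> list[str]:
    """Build planned nodes one at a time until the plan diff is empty.

    Both callbacks are hypothetical: `scan_present` reads which nodes are mapped
    in the graph, and `generate_and_map` generates, slots, and maps one node.
    Returns the nodes built, in order.
    """
    built: list[str] = []
    while True:
        present = scan_present()
        remaining = [n for n in planned if n not in present]  # diff against the plan
        if not remaining:
            return built  # empty diff: the domain is done
        node = remaining[0]  # the first ✗ is the next packet
        generate_and_map(node)
        if node not in scan_present():
            # Mapping is the completion signal; without it the loop would never end.
            raise RuntimeError(f"{node} is still unmapped; the spec may be incomplete")
        built.append(node)


graph = {"Exercise"}  # stand-in for the mapped manifest graph
print(build_domain(["Exercise", "WorkoutService", "SessionService"], lambda: set(graph), graph.add))
# ['WorkoutService', 'SessionService']
```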
The grocery store is permanently closed during cooking hours.
The next build will prove whether the kitchen actually works.