Field Manual / Zero → Production, Solo

The One-Person Army Field Manual

How one engineer, with an AI agent team, builds a startup from zero to production — then maintains, scales, and extends it.

I don't out-type a team. I out-architect one. The leverage isn't the model — it's the spec I write, the context I feed it, and the verification I design around it. That's the senior craft I already had; the agents just made it the whole job. My practice runs as one closed flywheel across the product arc: user signal becomes a spec, the spec becomes an agent-accelerated build, the build ships only when a check I can run proves it, and the instrumented outcome becomes the next signal.

The thing that substitutes for the team I don't have is the harness — the layered structure of memory, tools, permissions, hooks, and observability that wraps the model. Memory replaces the onboarding doc. Tools replace the DevOps integrations. Permissions replace the security reviewer. Hooks replace the CI gate. Observability replaces the SRE. I maintain it with the same discipline as application code, and on this very repo that means a real CLAUDE.md, AGENTS.md, and .claude/rules/ I tune as the work moves.

This practice is honest about its limits, and that honesty is a feature. Judgment doesn't delegate. Direct customer contact doesn't delegate. Managed services and human-in-the-loop checkpoints aren't concessions I make to the model; they're load-bearing parts of the architecture. I automate the operational loop and stay close to the customer loop, because the operator who does both outperforms the slower team and the judgment-free vibe-coder alike.

Zero to self-operating.

( The arc )

Zero
Zero → Validate
Before any feature code exists, lay the harness memory, fan out parallel read-only discovery to scope the problem cheaply, and narrow the build to the single riskiest assumption that could kill the product. At this phase my irreplaceable contribution is the spec, not the code — the agent is the spec's execution surface.
1. Harness memory before features
  I write the project constitution first — CLAUDE.md / AGENTS.md with architecture conventions, coding standards, deployment procedure, agent-permission boundaries, and the non-obvious gotchas. That's the onboarding doc a team gets for free; solo, it's the only thing that keeps agent output coherent across context resets, because most agent failures are context failures, not model failures. I prune it ruthlessly so the important instructions don't drown in a kitchen-sink of noise, and I check it into git so the harness evolves with the codebase.
  Harness engineering
  AI agentic engineering
  Senior SWE craft
2. Fan-out discovery before feature code exists
  Before committing to a build I run discovery in read-only mode: a lead decomposes the question and a handful of parallel subagents explore the problem space (library surveys, codebase reconnaissance, API and integration constraints, alternative architectures), each returning a condensed summary I synthesize into a recommendation. This is the discovery work a team would split across several people, and it's exactly the read-heavy work where multiple fresh contexts beat one. No writes happen here, so the single-writer law is never touched; the output is decision-quality input to the spec. The mechanics live on my /workflows page as fan-out-and-synthesize.
  Multi-agent orchestration
  AI agentic engineering
  Product engineering
3. Interview-to-spec to surface the unknowns
  I run an interview pass that probes implementation, UI/UX, edge cases, and tradeoffs until the hard parts are covered, then write a spec that names the files and interfaces in play and what is explicitly out of scope. The interview routinely surfaces unknowns that would otherwise be discovered expensively mid-build. The spec becomes the shared source of truth that survives context resets and agent handoffs — the artifact that lets one person hold product coherence without a PM or an architect in the room.
  AI agentic engineering
  Senior SWE craft
  Product engineering
4. Scope the MVP to the single riskiest assumption
  Spec quality is the binding constraint, not model capability — so I subtract first. I find the one hypothesis that, if wrong, kills the product, and build only enough to falsify it against a real-user threshold instead of a vanity launch. Cheap agent-generated code biases toward building more; the discipline is to build less. Every task queued past the riskiest-assumption boundary is premature work that multiplies maintenance and instrumentation noise later.
  Product engineering
  Senior SWE craft
  AI agentic engineering
5. Choose managed services to delete whole domains
  I default to managed and serverless platforms — hosting, database and auth, payments, compute — to transfer the infrastructure ownership surface to vendors for a fixed subscription. This isn't cost-cutting; it deletes the second full-time job of being a systems administrator and frees bandwidth for product and agent orchestration. At zero-to-one I deliberately over-index on managed services and bring infrastructure in-house only when a concrete measured constraint forces it. Every self-hosted service is a future on-call incident with no on-call team.
  DevOps/SRE
  Product engineering
  Senior SWE craft
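The fan-out discovery step in item 2 can be sketched as follows. This is a minimal illustration, not the actual orchestration: `explore` is a hypothetical stand-in for a read-only subagent, and `Finding`, `fan_out`, and `run_discovery` are names invented for the example. The point it shows is the shape: parallel explorers, condensed summaries, one synthesis step in the lead.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    topic: str
    summary: str  # condensed summary, not the raw exploration transcript


async def explore(topic: str) -> Finding:
    """Hypothetical read-only subagent. A real one would call an agent
    runtime with write tools disabled; this stub only returns a summary."""
    await asyncio.sleep(0)  # stand-in for the agent's own work
    return Finding(topic=topic, summary=f"condensed notes on {topic}")


async def fan_out(question: str, topics: list[str]) -> str:
    # Each explorer gets one narrow topic and runs concurrently.
    findings = await asyncio.gather(*(explore(t) for t in topics))
    # Synthesis happens once, in the lead, over summaries only.
    lines = [f"- {f.topic}: {f.summary}" for f in findings]
    return f"Question: {question}\n" + "\n".join(lines)


def run_discovery(question: str, topics: list[str]) -> str:
    return asyncio.run(fan_out(question, topics))
```

Because nothing in this path writes to the repository, adding more explorers changes cost and coverage but cannot produce conflicting edits.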
One
One → Build & Ship
Turn the spec into shipped code: partition work into core (human-supervised) and non-core (agent-delegated), parallelize only where the work is genuinely independent, and keep every write single-threaded so additional agents add intelligence, not conflicting actions. Senior craft here is decomposition into parallelizable subtasks with unambiguous acceptance criteria.
1. Explore → Plan → Implement → Commit
  I separate reading from writing. Plan mode is read-only — understand current state, produce an editable plan — then execution runs against the plan with a runnable check in the same prompt, then commit. I skip the plan only for one-sentence changes. This is how a solo operator avoids the most expensive failure mode — code that compiles but solves the wrong problem — without a second engineer to catch it.
  AI agentic engineering
  Harness engineering
  Senior SWE craft
2. Core / non-core partition for delegation
  I label every task. Core work — domain logic, auth boundaries, payment flows, onchain writes — gets AI as a supervised co-pilot with human review of every diff. Non-core work — scaffolding, boilerplate, standardized infrastructure — gets fully delegated, gated only by CI. When rework shows up at the boundary, that's the signal a non-core task was actually core and needs reclassifying. On Limitless I held the line hard at the core edge: market-creation flows touched smart contracts and Gnosis Safe multisig, so those got human eyes on every step.
  Senior SWE craft
  AI agentic engineering
  Multi-agent orchestration
3. Single-threaded writes, fan-out intelligence
  The durable law of my multi-agent work: code-writing stays on one agent because conflicting concurrent decisions produce inconsistent implementations, while read-heavy exploration and review fan out to parallel agents whose fresh context is an asset. I run parallel workspaces through Multica AI and Conductor, but the writer is always one agent at a time. Concurrent writers produce the canonical failure where a subagent without full context builds the wrong thing into the wrong place.
  Multi-agent orchestration
  AI agentic engineering
  Harness engineering
4. Tool / ACI design as a first-class surface
  When I build MCP servers and agent tooling I treat tool descriptions as public API contracts — namespace by service and resource, return semantic identifiers not opaque UUIDs, give actionable error guidance, and add explicit when-to-use disambiguation between similar tools. Bad descriptions silently degrade agents in ways that look like model failures, so this is real leverage. On the AltLayer MCP platform I lived this: architecting ~37 focused protocol MCP servers — each with a small, cohesive per-protocol tool surface — with guided tool-calling via schema-level examples and workflow hints, so agents could reliably select the right tool without fabricating arguments.
  AI agentic engineering
  Harness engineering
  Senior SWE craft
5. Flag-first feature lifecycle
  The model is that every feature — human-written or agent-generated — ships behind a flag from the first commit, with agents prompted to generate flag-gated code as a constraint. The flag is a kill switch, a rollout control, and an experiment-assignment mechanism at once. It's the interface between the agentic build loop and the product experiment loop, and it's a growth motion one person can run with just a flag system and an analytics tool.
  Product engineering
  DevOps/SRE
  AI agentic engineering
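The flag-first idea in item 5 (one flag acting as kill switch, rollout control, and experiment assignment) can be sketched like this. It is an illustration under assumptions, not a specific vendor's API: `Flag`, `bucket`, `is_on`, and `variant` are hypothetical names, and a hosted flag service would normally replace the hashing.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    key: str
    enabled: bool = True      # kill switch: False turns the feature off for everyone
    rollout_percent: int = 0  # rollout control: 0..100


def bucket(flag_key: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in 0..99."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_on(flag: Flag, user_id: str) -> bool:
    if not flag.enabled:
        return False
    return bucket(flag.key, user_id) < flag.rollout_percent


def variant(flag: Flag, user_id: str) -> str:
    """Experiment assignment: the rollout check doubles as the arm label."""
    return "treatment" if is_on(flag, user_id) else "control"
```

Hashing the flag key together with the user id keeps each user's assignment stable across sessions while keeping different flags uncorrelated with each other.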
Production
Production → Verify & Observe
Close the loop between code generation and reliable operation: give every task a check the agent can run, enforce quality mechanically through hooks rather than advisory instructions, and stand up day-zero observability so the elevated defect risk in agent-assisted code surfaces as alerts, not as user-reported failures.
1. Give the agent a check it can run
  I always hand the agent a runnable verification in the same prompt — test suite, build exit code, linter, or a live Playwright / Synpress E2E run — and escalate from in-prompt checks to Stop hooks to fresh-context adversarial review. Without a runnable check, 'looks done' is the only signal and I become the verification loop. I scope the adversarial reviewer to gaps that affect correctness or stated requirements, not a blanket 'find gaps,' which manufactures false positives even on sound work. The pattern mechanics are on /workflows as adversarial verification.
  AI agentic engineering
  Harness engineering
  Senior SWE craft
2. Hook-based mechanical enforcement
  I shift quality enforcement off prompt instructions an agent can rationalize around and onto hooks that operate outside the model's context window: block destructive ops and .env writes, run lint and type-check after every edit, block session end until tests pass, and role-fence tool access so orchestrators get Task not Edit and implementors get Edit not recursive Task. A hook that blocks an operation is a guarantee; a line in CLAUDE.md competes with the whole context and loses under pressure. Hooks are my CI gate and security reviewer, and they're configured in this repo's own .claude setup.
  Harness engineering
  DevOps/SRE
  Senior SWE craft
3. Cross-model adversarial verification
  I assign generation and verification to different models so they don't share correlated blind spots and echo-validate each other's errors. Claude Code is my primary coding agent; Codex is the independent second opinion and review pass. Same-model self-review is a performance of quality, not quality — a different model lineage adds real epistemic diversity. This producer/verifier split is a daily part of how I work, not a special-occasion step.
  Multi-agent orchestration
  AI agentic engineering
  Senior SWE craft
4. SLOs and error budgets as the velocity circuit breaker
  The discipline is to pick one user-critical path, instrument it, set the SLO by rounding down observed performance over a baseline window, and track an error budget over a rolling window. While budget remains, you ship; when it exhausts, reliability work preempts features. 100% reliability is the wrong target — the marginal nines cost disproportionate effort for negligible user value. For an operator generating code at high velocity, the error budget is the circuit breaker that stops output from outpacing system stability.
  DevOps/SRE
  Senior SWE craft
  AI agentic engineering
5. Instrument activation and retention from day zero
  Alongside the first production deploy, the two things to instrument first are activation (first experience of core value) and week-one retention — everything else is downstream. Wire vendor-neutral instrumentation once so the backend stays a cost decision, not a lock-in, and point the same conventions at the agent harness itself so the most failure-prone layer is observable. A harness that ships fast but ships blind is a liability.
  Product engineering
  DevOps/SRE
  Harness engineering
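The SLO and error-budget arithmetic from item 4 can be worked through in a few lines. This is a sketch under stated assumptions: the SLO is request-based, it is rounded down to tenths of a percent, and the function names are invented for the example.

```python
def slo_from_baseline(good: int, total: int) -> int:
    """Round the observed success ratio down to tenths of a percent.

    Returned as an integer per-mille (998 means 99.8%) to avoid float rounding.
    """
    if total <= 0:
        raise ValueError("baseline window has no requests")
    return good * 1000 // total


def error_budget(slo_permille: int, total: int) -> int:
    """Failed requests the rolling window can absorb before the SLO is breached."""
    return total * (1000 - slo_permille) // 1000


def can_ship_features(slo_permille: int, total: int, failed: int) -> bool:
    """Circuit breaker: features ship only while budget remains."""
    return failed < error_budget(slo_permille, total)
```

For example, a baseline of 99,870 good requests out of 100,000 rounds down to 99.8%. Over a rolling window of 50,000 requests that allows 100 failures; once the hundredth failure lands, `can_ship_features` returns False and reliability work takes priority.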
Scale
Scale → Govern cost, velocity, memory
Sustain quality while controlling token budget and protecting reliability: route tasks to the right model tier, let the error budget govern release velocity, externalize memory to survive context rot, and run the experiment loop relentlessly with the expectation that most experiments fail.
1. Model-tier routing / plan-execute split
  I route by reasoning depth instead of defaulting to the strongest model: a stronger model for orchestration and architectural decisions, a mid-tier model for parallel implementation workers and verification passes, a cheap fast model for high-frequency classification and extraction. Splitting plan (expensive, architectural) from execute (cheap, mechanical) is what makes multi-agent orchestration affordable at volume. My daily stack reflects this — Claude Code on the Max plan for the heavy work, OpenCode Go on a lightweight plan for quick tasks.
  Multi-agent orchestration
  AI agentic engineering
  DevOps/SRE
2. DORA four-keys as a personal dashboard
  The move is to track deployment frequency, change lead time, change failure rate, and recovery time as a personal scoreboard from CI/CD and GitHub events. Deploy-on-merge alone puts a solo operator in the elite band on frequency and lead time; the actionable part is the inverse — when change-failure rate or recovery time drifts, you stop shipping features and invest in test coverage or runbook quality. Speed and stability aren't a tradeoff, and the error budget plus a DORA view together are how one person runs the SRE function.
  DevOps/SRE
  Senior SWE craft
  Product engineering
3. Externalized memory for long-horizon continuity
  I treat the context window as a finite resource: compact near saturation while preserving architectural decisions, use structured note files (a progress log, a decisions log) for long-horizon tasks where file-based memory beats compaction, and isolate research in subagents that return condensed summaries. I gate writes deliberately — agents commit verified findings at checkpoints, not raw intermediate output — so one agent's hallucination never becomes another's authoritative input. Obsidian and Notion sit underneath this as my durable knowledge base across sessions.
  Harness engineering
  Multi-agent orchestration
  AI agentic engineering
4. Hypothesis-driven experiment loop
  The loop is a continuous series of experiments, each with a defined anatomy (what changes, why, baseline, metric, expected direction), plus a license to run 'dumb' experiments, because the minority of wins pays for all the failures. Each agent-built variant carries its hypothesis in context so it can validate its own output against intent. The defensible moat is never the stack, which AI makes cheap to build; it's the speed and quality of the user-signal-to-shipped-improvement loop.
  Product engineering
  AI agentic engineering
  Multi-agent orchestration
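The four-keys scoreboard from item 2 reduces to a small calculation over deploy records. The sketch below assumes those records have already been assembled from CI/CD and GitHub events; the `Deploy` shape and the `four_keys` function are hypothetical, and medians are one reasonable aggregation choice among several.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Deploy:
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # production deploy
    failed: bool = False                    # caused a user-visible failure or rollback
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_keys(deploys: list[Deploy], window_days: int) -> dict:
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deploy and a positive window")
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "median_recovery_time": median(recoveries) if recoveries else None,
    }
```

The two stability keys are the ones to alert on: a rising `change_failure_rate` or `median_recovery_time` is the signal to pause feature work.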
Maintain & Extend
Maintain & Extend → Self-operating
Turn maintenance into an agentic loop where observability surfaces regressions, agents propose fixes, and I approve — while protecting the artifacts (decision records, postmortems, specs) that are the only mitigation for the solo operator's single-point-of-failure risk, and keeping strategic and customer-facing judgment firmly human.
1. Decision records as persistent agent context
  I write a lightweight decision record for every non-obvious architectural choice — context, decision, alternatives, consequences — stored in the repo where agents read it as persistent context. It's a half-hour that saves many hours when a future agent session, a contractor, or my own context-reset self asks why. For a solo engineer who is the single point of failure for domain knowledge, externalizing decisions is the continuity mitigation; folding the summaries into CLAUDE.md means each new session inherits institutional memory without a re-brief.
  Senior SWE craft
  Harness engineering
  AI agentic engineering
2. Blameless postmortems that edit the system
  On any user-visible incident, rollback, or post-deploy defect I ask 'why did the system allow this?' not 'who erred?' Solo, the system is spec quality, CI gate coverage, context artifacts, and delegation decisions — so the output is concrete: update CLAUDE.md, add a test case, add a CI gate, write a decision record, or reclassify a non-core task as core. You can't fix people but you can fix the systems that support good choices, and each incident makes the harness measurably smarter.
  DevOps/SRE
  Harness engineering
  Senior SWE craft
3. Agentic operations pointed inward
  The same multi-agent and MCP infrastructure I use to build products, I point inward at the production system. I've run this for real: an orchestrator with sub-agents reviewing security-audit findings across AltLLM and AltClaw, converting findings into epics and tasks, fixing them incrementally with automated and manual tests, and cross-verifying on local Docker Compose stacks. Runbooks have to be checklists with copy-pasteable commands, not architecture manuals, so an agent can execute them — while novel failure modes stay human-owned.
  DevOps/SRE
  Multi-agent orchestration
  AI agentic engineering
4. Automate the operational loop, protect the customer loop
  I automate the operational loop with headless agents — scheduled jobs, scraping, triggers. Hermes Agent runs CRON jobs and job-search scraping with a Telegram entry point so I can fire workflows on the go. What I keep human is core feature development, pricing, strategic positioning, and direct customer contact. Not everything that can be automated should be; the operator who automates execution but stays close to the customer signal outperforms both the slower team and the judgment-free vibe-coder.
  Product engineering
  AI agentic engineering
  DevOps/SRE
5. Never weaken the tests; hold coverage gates
  I treat CI gates — coverage on business-critical paths, a passing integration suite, lint and type-check — as the QA team that never sleeps, and I enforce the harness rule that agents never remove or weaken tests and never try to one-shot a whole app. Playwright and Synpress carry the end-to-end and wallet-signing flows. On a maintain-and-extend arc this is the regression net that makes it safe to let an agent extend a codebase I haven't touched in weeks.
  Senior SWE craft
  Harness engineering
  DevOps/SRE
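The "never weaken the tests" rule in item 5 can be enforced mechanically by comparing a change against its base branch. This is a minimal sketch: `Snapshot` and `gate` are hypothetical names, the 80% floor is an assumed example value rather than a figure from this manual, and collecting test names and coverage is left to whatever test runner is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    test_names: frozenset[str]
    coverage_percent: float  # coverage on business-critical paths


def gate(base: Snapshot, head: Snapshot, min_coverage: float = 80.0) -> list[str]:
    """Return the list of violations; an empty list means the change may merge."""
    problems = []
    removed = sorted(base.test_names - head.test_names)
    if removed:
        problems.append("tests removed: " + ", ".join(removed))
    if head.coverage_percent < base.coverage_percent:
        problems.append(
            f"coverage dropped: {base.coverage_percent} -> {head.coverage_percent}"
        )
    if head.coverage_percent < min_coverage:
        problems.append(f"coverage below floor of {min_coverage}")
    return problems
```

Renamed tests show up as removals in this sketch, which is deliberate: a rename is cheap to approve by hand, while a silently deleted test is not.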

Six laws I operate by.

( Principles )

The harness is the team.
Every layer substitutes for a human role: memory for the onboarding doc, tools for DevOps integrations, permissions for the security reviewer, hooks for the CI gate, observability for the SRE. An agent is model plus harness, and reliability is determined less by the model than by the harness around it — so building well-designed systems around agents is the meta-skill, not writing clever prompts.
Spec quality is the binding constraint, not the model.
The irreplaceable solo contribution is the spec — decomposing a problem into parallelizable subtasks with unambiguous acceptance criteria. Undocumented contracts and ambiguous intent are the most frequent source of rework, because agent failures are primarily context failures. The multiplier accrues to engineers who already have the domain depth to write complete specs; it amplifies craft, it doesn't replace it.
Writes stay single-threaded; intelligence fans out.
Code-writing stays on one agent because conflicting concurrent decisions produce inconsistent implementations. Read-heavy research, exploration, and review fan out to parallel agents whose fresh context is an asset. This one law governs every phase — parallel discovery at zero, a single writer at one, fresh-context review in production.
Mechanical enforcement compounds; advice degrades.
A hook that blocks an operation is guaranteed; an instruction in a context file competes with everything else and can be overridden under pressure. So I move the rules an agent repeatedly violates into deterministic hooks and prune the rest on breakage like code. Verification gates the system runs automatically are worth more than any quantity of good intentions.
Verify cross-model, from a fresh context.
Agents on the same model share blind spots and confidently converge on the same wrong answer, so verification comes from a different model or at minimum a fresh context with no shared history. This is simultaneously my code-review function, my security-review function, and my correctness gate — and it's the structural reason a Claude-Code-generates / Codex-verifies pairing earns its keep.
Context is managed by the system, not by me.
Context rot degrades output well before the window fills — it's a production failure mode, not a model limitation. So compaction, just-in-time retrieval, subagent isolation, and file-based memory get engineered into the harness, which frees me to make product decisions instead of doing token hygiene across multi-hour, multi-week work.

Real tools, real layers.

( The stack )

Memory & context [CLAUDE.md / AGENTS.md · .claude/rules/ · Obsidian · Notion · Excalidraw]: The institutional-memory substitute, where architecture, conventions, gotchas, and decisions are codified so agent output stays coherent across context resets. This repo runs a real CLAUDE.md / AGENTS.md / .claude/rules/ setup I maintain like code; Obsidian is my second brain and daily notes, Notion holds project docs and tracking, Excalidraw is where architecture gets sketched before it gets specced.
Agentic coding & orchestration [Claude Code · Codex · OpenCode Go · Multica AI · Conductor · Git worktrees]: Claude Code (Max plan) is my primary coding agent; Codex (Plus plan) is the cross-model verifier and second opinion; OpenCode Go handles quick lightweight tasks. Multica AI and Conductor run parallel agent workspaces so fan-out research and isolated writer sessions don't collide. The dynamic-workflow patterns I reach for live on /workflows.
Tools & permissions [MCP servers · Figma MCP · settings.json allow/deny · Role-based tool fencing]: MCP tool descriptions treated as public API contracts. I've built real ones (~37 protocol MCP servers on the AltLayer platform serving ~176 MCP tools across EVM + Solana, with READ operations free and WRITE operations behind explicit user confirmation). Filesystem access scoped to the repo; orchestrators get Task not Edit, implementors get Edit not recursive Task, so runaway recursion can't start.
Hooks & verification [PreToolUse / PostToolUse hooks · Playwright · Synpress · Fresh-context reviewer subagent]: Mechanical enforcement outside the context window: hooks block dangerous ops and run lint / type-check / tests; Playwright and Synpress close the loop on UI, API, and wallet-signing behavior; a fresh-context (ideally different-model) reviewer catches what the original context can't see.
Managed infrastructure [Vercel · GitHub Actions · Docker / Kubernetes / GCP · PostgreSQL / Prisma · Privy / Web3Auth / CDP wallets]: Each managed service deletes an engineering subdomain for a fixed subscription: a fraction of equivalent headcount, and more importantly the elimination of the second full-time job of being a sysadmin. These are platforms I've shipped on in production; I resist self-hosting until a specific measured constraint forces it.
Observability & SRE [OpenTelemetry · Error tracking · Uptime monitoring · DORA dashboard from CI/GitHub events]: The entire SRE function for one person, instrumented from day one. Activation and week-one retention go in first; the same conventions point at the agent harness itself so its most failure-prone layer is observable. AI triage handles known failure modes from runbooks; I own the novel ones.
Product & growth [Product analytics + feature flags · Hermes Agent (CRON, scraping, Telegram triggers) · Linear / issue tracking · Resend]: One flag system plus one analytics platform is the full growth-engineering motion solo; flags are the interface between the agentic build loop and the product experiment loop. Hermes Agent runs the scheduled operational loop (jobs, scraping, on-the-go triggers) so the execution layer runs without me babysitting it.
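The role-based tool fencing listed under Tools & permissions amounts to a deny-by-default lookup. The sketch below is illustrative only: `Task` and `Edit` are the tool names used in the text, while `Read`, `Grep`, `Bash`, the `reviewer` role, and the `ROLE_TOOLS` table are assumptions made for the example, not the contents of any real settings file.

```python
# Tools each role may call. Anything not listed is denied.
ROLE_TOOLS = {
    "orchestrator": frozenset({"Read", "Grep", "Task"}),         # delegates, never edits
    "implementor": frozenset({"Read", "Grep", "Edit", "Bash"}),  # edits, never spawns agents
    "reviewer": frozenset({"Read", "Grep"}),                     # read-only
}


def allowed(role: str, tool: str) -> bool:
    """Deny by default: unknown roles and unlisted tools are both rejected."""
    return tool in ROLE_TOOLS.get(role, frozenset())
```

Keeping `Task` out of the implementor set is what prevents recursive delegation, and keeping `Edit` out of the orchestrator set is what keeps writes on a single agent.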

Where the model breaks.

( Anti-patterns )

Single point of failure — the one-person hostage situation
I am the single point of failure for the whole operation: illness, burnout, or an emergency halts everything. A product where I'm the only entity that understands the system isn't a one-person company, it's a one-person hostage situation. The mitigation is documentation-as-resilience (decision records, CLAUDE.md, runbooks), so an agent or a contractor can resume from documented state instead of my memory. Skipping these under deadline pressure is the most expensive debt a solo operator can take on.
Same-model echo chamber
Stacking more agents of the same model doesn't add reliability — it adds an echo chamber. They share failure modes and converge confidently on the same wrong answer with no internal friction, so 'add a reviewer agent' on the same model buys a performance of review, not review. The fix is cross-model verification or, at minimum, fresh-context isolation; outsourcing the verification model to a second vendor is a legitimate move, not a compromise.
Premature multi-agent proliferation
Splitting work into many agents too early pays a coordination tax — more prompts, traces, and approval surfaces — without proportional gains, because the bottleneck is communication, not coding ability. I start with one agent and add specialists only when they materially improve capability isolation, policy isolation, prompt clarity, or trace legibility. Most workflows that feel like they need ten agents need three-to-five with sharper contracts and a stronger synthesis prompt.
Velocity outrunning stability
Speed without controls makes delivery less reliable, not more — AI adoption without small batches, automated testing, and fast feedback is associated in the research with a measurable stability decline. Shipping agent-generated code without day-one error tracking means discovering bugs through user-reported failures instead of alerts. The error budget and mandatory coverage gates are the circuit breakers; without them the generation rate silently outpaces what the system can absorb.
Scope creep under cheap generation
Because agent-generated code is cheap to produce, the ease of dispatch biases me toward building more than the riskiest-assumption hypothesis requires — and cheap to produce is not cheap to maintain. Shipping 3x the scope 3x faster buys 3x the maintenance burden and 3x the instrumentation noise. The discipline is subtraction first: scope every dispatch to the minimum surface that tests the assumption, and roll back any output that exceeds the spec.
Where solo + AI breaks — and what I hand off
Judgment does not delegate. AI is strong on known failure modes with runbooks; humans stay essential for novel failures and architectural calls. Direct customer contact is strategically irreplaceable at zero-to-one — there's a well-documented case of a founder shutting down a technically successful AI support bot after two weeks because the feedback signal it automated away was the most valuable intelligence in the business. The model is not a substitute for craft — it amplifies craft that already exists. I automate the operational loop and keep the customer and strategic loops firmly human.
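The scope-creep guard described above (roll back any output that exceeds the spec) can be made checkable if the spec lists the paths a dispatch is allowed to touch. A minimal sketch, assuming the spec carries glob patterns and the changed-file list comes from the diff; `out_of_scope` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files: list[str], allowed_globs: list[str]) -> list[str]:
    """Return changed files that match none of the spec's allowed patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_globs)
    ]
```

A non-empty result is the trigger: either the spec was incomplete and gets amended, or the extra files get reverted before review.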

The harness is the real product

( Deep dive )

The instinct of an engineer dropped into agentic development is to optimize the prompt — to find the magic phrasing that makes the model behave. That's the wrong altitude. The reliability of an AI-augmented practice is determined less by the model and more by the harness around it: the layered structure of memory, tools, permissions, hooks, and observability that constrains what the agent knows and what it can and can't do. Prompt engineering shapes what an agent tries; context engineering shapes what it knows; harness engineering shapes what it can do — and only the last one comes with mechanical enforcement.

That distinction matters because enforcement compounds and advice degrades. A hook that blocks a destructive operation is a guarantee; an instruction in a context file saying 'never do that' competes against the whole context window and loses under pressure. So the lesson generalizes across the arc: a Stop hook running the test suite is a mandatory CI gate, an error budget is a release-velocity circuit breaker, a coverage gate that never weakens is a QA team. The rules I find an agent repeatedly bending become deterministic hooks; the rest I prune on breakage like code.
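A hook of the kind described above can be sketched as a small script. The assumptions are stated loudly because they concern a real tool: the sketch assumes the hook receives the pending tool call as JSON on stdin with `tool_name` and `tool_input` fields, that file tools pass a `file_path` and the shell tool passes a `command`, and that exiting with code 2 blocks the call. Verify each of these against the current hook documentation before relying on it. The patterns are deliberately narrow examples, not a complete deny list.

```python
import json
import re
import sys

# Example patterns only; a real deny list would be longer.
DESTRUCTIVE = [
    re.compile(r"\brm\s+-(rf|fr)\b"),
    re.compile(r"\bgit\s+push\s+(--force|-f)\b"),
]


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message). Exit code 2 is assumed to mean 'block'."""
    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input", {})
    if tool in {"Write", "Edit"}:
        name = tool_input.get("file_path", "").rsplit("/", 1)[-1]
        if name == ".env" or name.startswith(".env."):
            return 2, f"blocked: writes to {name} are not allowed"
    if tool == "Bash":
        command = tool_input.get("command", "")
        if any(p.search(command) for p in DESTRUCTIVE):
            return 2, "blocked: destructive command"
    return 0, ""


def main() -> int:
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)  # surfaced to the agent as the reason
    return code

# When installed as a hook script, end the file with: sys.exit(main())
```

The decision logic lives in a pure function so the hook itself can be unit-tested like any other code in the repo.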

For a solo operator this reframes the entire job. Each harness layer substitutes for a human role — memory for the onboarding doc, tools for the DevOps team's integrations, permissions for the security reviewer, hooks for the CI gate, observability for the SRE. The harness is the org chart of the one-person company, and I maintain it with the discipline of application code: checked into git, pruned on breakage, evolved as the product changes. On this very portfolio repo that's a concrete CLAUDE.md, AGENTS.md, and .claude/rules/ setup. The meta-skill of senior craft, in this era, is building reliable systems around agents — not writing code, and certainly not writing clever prompts.

AI agents as my team

( Deep dive )

A one-person army deploys a parallel workforce without paying salaries — but only if it respects the laws that govern multi-agent systems. The foundational one: writes stay single-threaded, intelligence fans out. Concurrent agents writing code produce conflicting implicit decisions; the canonical failure is a subagent without full context building the wrong thing into the wrong place. Read-heavy work is the opposite — parallel agents exploring a problem space or reviewing a diff from fresh context deliver quality a single agent can't, because the fresh context is an asset, not a liability.

That resolves into a per-phase rhythm. At zero I fan out discovery agents to scope the problem and synthesize the best elements into the spec — read-only work that breaks no write law. At one, a lead decomposes into worker contracts, each with an explicit objective, output format, tools, and exclusions, while a single implementing agent owns all the writes. In production a fresh-context reviewer — ideally a different model, to dodge correlated blind spots — grades the work. At scale, model-tier routing and a plan/execute split keep the token budget governable. I run these as parallel workspaces through Multica AI and Conductor; the specific dynamic-workflow patterns I reach for are catalogued on my /workflows page.

The honest caveat that keeps this from becoming cargo-cult orchestration: collaboration has a tax. Interagent misalignment is a leading source of multi-agent failure, because the bottleneck is communication, not coding ability. So I earn parallelism only with genuine task independence and well-specified contracts — start with one agent, add specialists only when they materially improve capability isolation or trace legibility. Most workflows that feel like they need ten agents need three-to-five with sharper contracts and a stronger synthesis prompt. The orchestration skill is engineering-manager work, and the human's contribution shifts decisively from implementation to specification.
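The worker contract described above (explicit objective, output format, tools, and exclusions, with a single agent owning all writes) can be captured as a small data structure with a validation step. This is a sketch: `WorkerContract`, `validate`, and the choice of `Edit` and `Write` as the write tools are assumptions for illustration.

```python
from dataclasses import dataclass

WRITE_TOOLS = frozenset({"Edit", "Write"})  # assumed names for write-capable tools


@dataclass(frozen=True)
class WorkerContract:
    objective: str
    output_format: str
    tools: frozenset[str]
    exclusions: tuple[str, ...] = ()  # what the worker must not touch or decide

    @property
    def is_writer(self) -> bool:
        return bool(self.tools & WRITE_TOOLS)


def validate(contracts: list[WorkerContract]) -> None:
    """Reject underspecified contracts and any plan with more than one writer."""
    for c in contracts:
        if not c.objective or not c.output_format:
            raise ValueError("every contract needs an objective and an output format")
    writers = [c.objective for c in contracts if c.is_writer]
    if len(writers) > 1:
        raise ValueError("more than one writer: " + "; ".join(writers))
```

Running `validate` before dispatch turns the single-writer law from a convention into a precondition.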

Where solo + AI breaks — and what I hand off

( Deep dive )

The model has hard boundaries, and pretending otherwise is how a one-person operation becomes a one-person liability. The boundaries fall into three buckets: what cannot be delegated to agents, what must be delegated to managed services, and what compounds into silent failure.

First, judgment does not delegate. AI excels at known failure modes with runbooks; humans stay essential for novel failures and architectural decisions. Direct customer contact is strategically irreplaceable — there's a well-documented case of a solo founder deliberately shutting down a technically successful AI support bot after two weeks, not because the technology failed, but because the feedback signal it was automating away was the most valuable intelligence in the zero-to-one arc. So I automate the operational loop and protect the customer and strategic loops. The model amplifies craft that already exists; it does not manufacture it.

Second, infrastructure gets outsourced, not built. Every novel framework or self-hosted service is a future on-call incident with no on-call team. The right move at zero-to-one is to over-index on managed platforms — hosting, database and auth, payments, compute — each deleting a subdomain for a fixed subscription that's a fraction of equivalent headcount, and to bring infrastructure in-house only when a concrete measured constraint forces it. The hidden cost of self-hosting is never the bill; it's the cognitive load of a second full-time job as systems administrator.

Third, the failure modes compound silently. Same-model teams echo-validate errors, which is why verification comes from a different model or a fresh context. Premature multi-agent proliferation pays a coordination tax without the parallelism benefit. AI adoption without controls is associated with a measurable stability decline (the DORA 2024 report puts it at −7.2%), discoverable only through observability you build before shipping, not after the first incident. And the single-point-of-failure risk is mitigated only by relentless documentation-as-resilience: decision records, runbooks, and a CLAUDE.md that let an agent or a contractor resume from documented state. The solo operators who survive treat these boundaries as load-bearing parts of the architecture, not as concessions.
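The echo-chamber rule can be made mechanical when choosing a reviewer. The sketch below is one possible selection policy under stated assumptions: `Agent`, `pick_reviewer`, and the model-family labels are hypothetical, and "fresh context" is reduced to a boolean for the sake of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    model_family: str         # hypothetical label, e.g. "family-a"
    saw_build_context: bool   # True if the agent took part in the implementation


def pick_reviewer(implementer: Agent, candidates: list[Agent]) -> Agent:
    """Prefer a fresh-context agent from a different model family.

    Falls back to a fresh-context agent of the same family, and raises
    if no fresh-context candidate exists at all.
    """
    fresh = [a for a in candidates
             if not a.saw_build_context and a.name != implementer.name]
    if not fresh:
        raise ValueError("no fresh-context reviewer available")
    cross_model = [a for a in fresh if a.model_family != implementer.model_family]
    return cross_model[0] if cross_model else fresh[0]


implementer = Agent("builder", "family-a", saw_build_context=True)
pool = [
    implementer,
    Agent("same-family-fresh", "family-a", saw_build_context=False),
    Agent("other-family-fresh", "family-b", saw_build_context=False),
]
reviewer = pick_reviewer(implementer, pool)
```

The ordering encodes the preference stated above: a different model first, a fresh context as the floor, and no review at all from an agent that shares the implementer's context.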

How this shows up in my work.

( Grounding )

MCP server platform (AltLayer)
Tool/ACI design as a first-class surface, made real: I built ~37 protocol MCP servers (~176 MCP tools across EVM + Solana) integrated into AltLLM's multi-model LLM platform, architected with guided tool-calling via schema-level examples and workflow hints so agents could select tools reliably. WRITE operations ran behind explicit user confirmation (HITL), CDP wallet flows were integrated and later moved to server wallets, and the system was migrated to a stateless Hono API. A ManusAI-style chat UI kept every tool call visible to the user.
AltLLM & AltClaw security-audit orchestration (AltLayer)
Agentic operations pointed inward: an orchestrator with sub-agents reviewing security-audit findings, converting them into epics and tasks, and fixing them incrementally with automated and manual tests — cross-verified on local Docker Compose stacks across worktree branches. This is the maintain-and-extend loop running for real.
Limitless Exchange (Limitless Labs)
The core/non-core boundary held under pressure: market-creation flows touched smart contracts and Gnosis Safe multisig, so they got human review on every step — the human-in-the-loop gate wherever money or onchain state is on the line. Four codebases (API, contracts, indexer, frontend) owned end-to-end as the founding fullstack engineer.
Rumour (AltLayer)
Frontend lead from early iterations to production launch with strict pixel/Minecraft design, PWA, wallet-signing flows, Hyperliquid perps, Tencent Chat, and OKX/LiFi bridging — plus internal admin tooling where automated bots scanned X/Twitter trends to create rumours. Playwright + Synpress carried the runnable E2E and wallet-signing checks.
Atlantis World — metaverse + ~$1M ETH NFT sale
End-to-end ownership before it had a name: 13 of 18 DeFi/Web3 protocol integrations shipped as in-game NPCs, backend sale automation with merkle-tree signature verification behind the community raise, IPFS asset pipeline, and CI/CD — taking ambiguous requirements from concept to ship, frontend to smart contract to operational glue.
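The human-in-the-loop pattern from the MCP server platform entry above (read tools run freely, write tools wait for explicit confirmation, and each tool carries a schema-level example) can be sketched generically. This is not the AltLayer implementation: `Tool`, `call_tool`, the tool names, and the argument shapes are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    is_write: bool                  # write tools change money or onchain state
    example_args: dict              # schema-level example surfaced to the agent
    handler: Callable[[dict], str]


def call_tool(tool: Tool, args: dict, confirm: Callable[[str], bool]) -> str:
    """Run read tools directly; run write tools only after the user confirms."""
    if tool.is_write and not confirm(f"Run {tool.name} with {args}?"):
        return "cancelled: user declined"
    return tool.handler(args)


sent: list[dict] = []  # stands in for real side effects


def send_transfer(args: dict) -> str:
    sent.append(args)
    return "sent"


balance = Tool("get_balance", "Read a wallet balance.", is_write=False,
               example_args={"address": "0xabc"},
               handler=lambda a: f"balance requested for {a['address']}")
transfer = Tool("send_transfer", "Send tokens to an address.", is_write=True,
                example_args={"to": "0xabc", "amount": "1"},
                handler=send_transfer)

read_result = call_tool(balance, {"address": "0xabc"}, confirm=lambda prompt: False)
declined = call_tool(transfer, {"to": "0xabc", "amount": "1"}, confirm=lambda prompt: False)
approved = call_tool(transfer, {"to": "0xabc", "amount": "1"}, confirm=lambda prompt: True)
```

Placing the gate in the dispatcher rather than in each handler means a new write tool is confirmed by default, as long as it is flagged correctly.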

Sources.

( References )

Effective context engineering for AI agents
Anthropic Engineering
System-prompt optimization, just-in-time retrieval, compaction, structured note-taking, and sub-agent architectures as finite-resource management.
How we built our multi-agent research system
Anthropic Engineering
Orchestrator-worker with a lead plus parallel subagents; synchronous wave execution; tool-description quality as a critical failure point.
Don't Build Multi-Agents
Cognition
Multi-agent systems are fragile under shared context and conflicting decisions; the Flappy Bird subtask-misinterpretation example; single-threaded linear agents.
Multi-Agents: What's Actually Working
Cognition
Fresh-context code review catches an average of ~2 bugs per PR (58% severe); writes must stay single-threaded; additional agents contribute intelligence, not actions.
Deterministic AI Orchestration: A Platform Architecture for Autonomous Development
Praetorian
Hook-based enforcement (PreToolUse, PostToolUse, SubagentStop) outside the context window; role-based tool fencing; thin-agent / fat-platform principle.
Best Practices for Claude Code
Anthropic / Claude Code Docs
Four-phase Explore-Plan-Implement-Commit workflow; CLAUDE.md design guidelines; skills/hooks/subagents harness infrastructure; context lifecycle management.
DORA 2024 Accelerate State of DevOps Report
DORA / Google
AI adoption without robust controls is associated with decreases in throughput (−1.5%) and stability (−7.2%); small batches, automated testing, and fast feedback as countermeasures.
Postmortem Culture: Learning from Failure
Google SRE
Blameless postmortem methodology — you can't fix people but you can fix systems; systemic vs. individual causation.
Embracing Risk
Google SRE
100% reliability is worse than ~99.9%; the error-budget formula; the cost of reliability is non-linear.
What is Product for Engineers?
PostHog
The product engineer as a mindset: talks directly to users, works autonomously, owns the outcome rather than the ticket.
One Developer Is All You Need: A Case Study of an AI-Augmented One-Person Squad in a Brownfield Enterprise
arXiv
Core/non-core partition heuristic and specification-driven development; reports 5.4x throughput, 90% code-acceptance, and mandatory 92.8%+ coverage in a different team's brownfield study.
Solo founders are using AI to do the work of entire teams, but going it alone has limits
Fortune
Maor Shlomo / Base44; AI agent workflows for feedback triage, UX crawling, and QA — and the deliberate shutdown of a technically successful customer-support bot.

