cjjutba

From idea to shipped: my AI-native engineering workflow

Typing was never the bottleneck. I ship fast by compressing the loop between an idea and verified, reviewed code — with spec-driven planning, parallel agents, and verification gates — not by generating more code faster.

12 min read · Updated May 28, 2026

When people hear "AI-native developer," they picture an autocomplete on steroids — code appearing faster than you can read it. That misses the point entirely. The speed I get from working with Claude Code has almost nothing to do with how fast characters land in a file. Typing was never the bottleneck. The bottleneck is the loop: idea → design → implementation → the moment you actually *know* it works. Generating more code faster, with no change to that loop, just gives you more unverified code to debug later.

So the thing I optimize is the loop, not the typing. Concretely: I run a brainstorm → spec → plan → execute → verify cycle on every non-trivial piece of work. AI compresses each stage — it drafts the spec, breaks the plan into tasks, writes tests and implementation, runs the checks — but I stay the decision-maker at every gate. I review, I reject, I redirect. There is no autonomy here that I haven't signed off on; the AI doesn't ship anything I haven't verified. This post is the honest version of how that works, with the receipts from this very site as evidence.

1. The loop I actually run

The whole workflow is one cycle, run deliberately, at a granularity that matches the work. Small fix: I might run it in my head in thirty seconds. A new feature or a whole site: it becomes durable, written-down artifacts I can hand to an agent and audit afterwards. The five stages are always the same.

The brainstorm → spec → plan → execute → verify loop
Each arrow is a review gate. AI compresses the work inside each stage; I decide whether it advances to the next.
  1. Brainstorm — interrogate the idea before any code exists. What are we actually building, for whom, and what's explicitly out of scope?
  2. Spec — write down the design: goals, success criteria, non-goals, the shape of the solution. A durable document, not a chat message.
  3. Plan — turn the spec into a numbered, task-by-task implementation plan with a testing strategy and conventions baked in.
  4. Execute — work the plan one task at a time: test first where it earns its keep, implement, commit. Small, reviewable steps.
  5. Verify — run the tests, the build, the linter, the app; review the diff. No claim of "done" without evidence.

1.1 Brainstorm first: design before code

The most expensive bug is the feature you built correctly but shouldn't have built at all. So the loop starts before any file is touched, with a brainstorm that exists to surface the real intent and kill ambiguity. What problem are we solving? Who is it for? What does "good" look like, measurably? And — the question that saves the most time — what are we *not* doing?

This is where AI is genuinely useful in a way that has nothing to do with code generation. An agent that pushes back, asks the questions I'd skip, and forces me to name non-goals out loud is worth more at this stage than one that writes a thousand lines. A spec exists because design decided in conversation evaporates; design written down can be reviewed, disagreed with, and held to. The portfolio you're reading started exactly here — as a design document, not a folder of files.

Non-goals (v1): blog/CMS, multi-language, analytics dashboards, e-commerce, auth.

docs/superpowers/specs/2026-05-29-cjjutba-portfolio-design.md

That line is from the real spec for this site. Naming those non-goals up front is what kept v1 from sprawling — and ironically, the blog that line excludes shipped later, deliberately, as its own scoped piece of work with its own spec. Scope is a decision you make on purpose, not a thing that happens to you.

1.2 Spec and plan: durable artifacts, not chat history

Once the design is settled, it becomes two committed files: a spec (what and why) and a plan (how, step by step). These live in the repo, in version control, alongside the code they produced. They are not throwaway prompts — they're the source of truth an agent executes against and that I audit against afterwards. This site's docs/superpowers/ directory holds the real ones:

  • docs/superpowers/specs/2026-05-29-cjjutba-portfolio-design.md — the design spec: goals, success criteria, information architecture, positioning, and the honesty constraints (no fabricated metrics, no fake testimonials).
  • docs/superpowers/plans/2026-05-29-cjjutba-portfolio.md — the implementation plan: phased, task-by-task, with a testing strategy and a conventions section.
  • docs/superpowers/specs/2026-05-29-blog-posts-toc-island-design.md and its matching plan — the blog and floating table-of-contents you're using right now, scoped and planned separately from the main site.

Writing the plan is itself a forcing function. If a task can't be stated crisply with a clear way to verify it, the design isn't done yet — and that's far cheaper to discover in a markdown file than three hundred lines into the implementation. The plan also carries the boring-but-load-bearing conventions up front: package manager, commit style, import aliases, and the rule that the build and linter must pass after every phase. Decisions made once, in writing, instead of re-litigated per file.

Why write it down at all?

A spec and plan in the repo give an agent a stable target across a long session — context that survives compaction and restarts.

They give me an audit trail: I can diff what was built against what was agreed, and catch silent scope drift.

And they're honest documentation for whoever owns the code next. The artifacts ship with the product.

1.3 Execute task by task: TDD and small commits

Execution is deliberately unglamorous: work the plan one task at a time. Where logic has real behavior worth pinning down — a validation schema, a slug builder, a reading-time estimate — I write the test first, watch it fail, then implement until it's green. Where the work is purely visual, I don't write brittle snapshot tests; I run the app and look at it. The discipline is matching the verification to the work, not testing for the sake of a coverage number.

A plan task is small and self-contained, with its own verification step. Here's an example of the shape and granularity a task takes — note that the verification command is part of the task, not an afterthought:

Example: one task from a plan (condensed)
### Task 3: Reading-time estimate

- [ ] Step 1: Write the failing test
- [ ] Step 2: Run test to verify it fails
      Run: pnpm test src/lib/__tests__/reading-time.test.ts
- [ ] Step 3: Implement until green
      Run: pnpm test src/lib/__tests__/reading-time.test.ts
      Expected: PASS.
- [ ] Step 4: Commit
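To make the task above concrete, here is a minimal sketch of the kind of function such a task could pin down. The name `readingTimeMinutes`, the 200 words-per-minute default, and the round-up rule are all assumptions for illustration; the site's real helper may differ.

```typescript
// Hypothetical helper: the name, the 200 wpm default, and the rounding rule
// are assumptions for illustration, not the site's actual implementation.
const DEFAULT_WORDS_PER_MINUTE = 200;

function readingTimeMinutes(
  text: string,
  wordsPerMinute: number = DEFAULT_WORDS_PER_MINUTE,
): number {
  if (wordsPerMinute <= 0) {
    throw new RangeError("wordsPerMinute must be positive");
  }
  // Split on runs of whitespace and drop the empty string an empty input yields.
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  // Round up, and never report less than one minute, even for very short text.
  return Math.max(1, Math.ceil(words.length / wordsPerMinute));
}

console.log(readingTimeMinutes("Typing was never the bottleneck.")); // 1
```

The edge cases (empty input, a word count just over a minute boundary) are the behavior a test-first step would fix in place before the implementation exists.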

Each completed task becomes one commit, in Conventional Commits form, scoped to a single concern. That's not aesthetic; it's what makes the history reviewable and reversible. This site was built in over 60 small commits at the time of writing, overwhelmingly feat: and fix:, with docs:, test:, style:, and copy: for the rest. The cadence looks like this:

git log --oneline (excerpt)
feat(blog): [slug] route + per-post OG image, index cards, sitemap
fix(toc): focus sheet on open, hide island behind modal, ARIA cleanup
feat(toc): floating island + bottom-slide modal + scroll-spy
feat(blog): structured BlogPost content model + index helpers
test: about + dark-mode popover/modal screenshots, about smoke + axe
fix(contact): check Resend send error (no more silent success) + log failures

A small commit is a small blast radius. If a change is wrong, I revert one thing, not a day's work. If a reviewer (me, or a client's engineer later) wants to understand a decision, the commit message and its diff tell the story. This is the part of "moving fast" that people underweight: speed compounds when mistakes stay cheap.
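The commit convention described above is regular enough to check mechanically. The following is a rough sketch of such a check; the allowed types come from the list in this post, while `parseCommitSubject` and its pattern are hypothetical and not tooling this repo is known to use.

```typescript
// Allowed types are the ones named in this post; the parser is an
// illustrative assumption, not the site's real tooling.
const ALLOWED_TYPES = ["feat", "fix", "docs", "test", "style", "copy"] as const;
type CommitType = (typeof ALLOWED_TYPES)[number];

interface ParsedCommit {
  type: CommitType;
  scope: string | undefined;
  subject: string;
}

// Shape: type, optional "(scope)", a colon and a space, then a non-empty subject.
const COMMIT_PATTERN = /^([a-z]+)(?:\(([^()]+)\))?: (.+)$/;

function parseCommitSubject(line: string): ParsedCommit | null {
  const match = COMMIT_PATTERN.exec(line);
  if (match === null) return null;
  const type = match[1] ?? "";
  const subject = match[3] ?? "";
  // Reject types outside the agreed list.
  if ((ALLOWED_TYPES as readonly string[]).indexOf(type) === -1) return null;
  return { type: type as CommitType, scope: match[2], subject };
}

const parsed = parseCommitSubject(
  "fix(contact): check Resend send error (no more silent success) + log failures",
);
console.log(parsed?.type, parsed?.scope); // fix contact
```

A check like this could run in a commit hook, though whether that is worth the friction on a solo project is a judgment call.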

2. Parallelism without chaos

Once the loop is tight, the obvious next lever is doing more than one thing at once. AI makes that tempting — you can spin up several agents and watch them all work. But naive parallelism is how you get merge hell, half-finished features stepping on each other, and a verification step that can't tell you what actually changed. The trick is isolation and independence: parallel work only pays off when the streams genuinely don't touch.

2.1 Worktrees: isolation by default

When two pieces of work could collide, I give each its own git worktree — a separate working directory backed by the same repository, on its own branch. No stashing, no "wait, which branch am I on," no half-applied changes bleeding across features. Each worktree is a clean room: its own files, its own running dev server if needed, its own test runs.

isolating a feature in its own worktree
# spin up an isolated workspace for a feature branch
git worktree add ../cjjutba-blog -b feat/blog

# work, test, and commit there without touching the main checkout
# when it's merged and done:
git worktree remove ../cjjutba-blog

The payoff is psychological as much as technical: because each stream is physically isolated, I can let an agent run inside one worktree without worrying it'll corrupt work in progress somewhere else. Isolation is what makes parallelism *safe* rather than just *fast*.

2.2 Subagents: fan-out for independent tasks

Inside a single piece of work, the same idea applies at the task level. When a plan has several tasks with no shared state and no ordering between them, I fan them out to subagents — one agent per task, each with a focused brief — instead of grinding through them serially. This blog itself was built that way: the four posts are independent data files, so they were authored in parallel, each agent grounding its own post against its own source repo, then handed back for review and a single registration step.

The constraint that makes this work is the same one from worktrees: independence. Subagent fan-out shines for genuinely parallel work — separate files, separate concerns, no task waiting on another's output. The moment tasks share state or have to happen in order, fan-out stops helping and starts manufacturing conflicts. Then I run them sequentially, in the order the plan specifies, on purpose.

When parallelism helps vs. hurts

Helps: independent tasks with no shared files or state — e.g. four separate content files, or a feature and an unrelated bug fix in different directories. Isolate each in a worktree or a subagent and merge when each is independently green.

Hurts: tasks that touch the same module, depend on each other's output, or need a shared decision made first. Parallelizing those just moves the work into a painful merge and muddies what the verification step is actually telling you.

Rule of thumb: if you can't describe two tasks without the word "after," they're not parallel. Sequence them.
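The helps/hurts distinction above can be written down as a small predicate. This is only a sketch: the `Task` shape, the `canRunInParallel` name, and the file paths are hypothetical, and it checks direct dependencies only, not transitive ones.

```typescript
// Hypothetical task description; field names are assumptions for illustration.
interface Task {
  id: string;
  files: string[]; // paths the task is expected to touch
  dependsOn: string[]; // ids of tasks whose output this one needs
}

// Two tasks are safe to fan out only if neither needs the other's output
// and they touch no file in common.
function canRunInParallel(a: Task, b: Task): boolean {
  const dependsEitherWay =
    a.dependsOn.indexOf(b.id) !== -1 || b.dependsOn.indexOf(a.id) !== -1;
  const sharesFiles = a.files.some((f) => b.files.indexOf(f) !== -1);
  return !dependsEitherWay && !sharesFiles;
}

// Illustrative paths only; they are not taken from the real repo.
const postA: Task = { id: "post-a", files: ["content/posts/a.ts"], dependsOn: [] };
const postB: Task = { id: "post-b", files: ["content/posts/b.ts"], dependsOn: [] };
const register: Task = {
  id: "register",
  files: ["content/posts/index.ts"],
  dependsOn: ["post-a", "post-b"],
};

console.log(canRunInParallel(postA, postB)); // true
console.log(canRunInParallel(postA, register)); // false
```

The second result is the "after" test in code form: registration happens after the posts exist, so it is sequenced rather than fanned out.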

3. Verification is the actual job

Here's the part that separates an AI-native workflow that ships reliable software from one that ships fast-looking liabilities. Generating code is the easy 80%. The job — the thing a client is actually paying for — is *knowing* the code does what it's supposed to do. Verification isn't a phase at the end; it's the point of the whole exercise. Everything before it is in service of being able to make a claim you can back up.

3.1 Evidence before claims

My standing rule, applied to myself and to every agent I run, is evidence before assertions. Nothing is "done," "fixed," or "passing" until I've run the command and read the output. Not "this should work" — the green test, the clean build, the rendered page. On this site that means a concrete, layered set of gates:

| Gate | What it proves | How it runs here |
| --- | --- | --- |
| Unit tests | Logic behaves as specified | Vitest: schema validation, slug/TOC builders, reading-time, content invariants |
| Build | It actually compiles and type-checks | pnpm build must pass after every phase |
| Lint | No obvious correctness or style regressions | pnpm lint as a phase gate |
| E2E + a11y | Real pages render and are accessible | Playwright smoke tests + an automated axe accessibility pass |
| Diff review | The change matches the agreed plan | I read every diff before it's committed |
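The automated gates in the table can be thought of as an ordered list that stops at the first failure. Below is a minimal sketch of that idea, assuming a hypothetical `runGates` helper and an injected `run` function that returns a command's exit code. The E2E command is left out because its exact invocation isn't stated here, and diff review is deliberately absent because it is not automatable.

```typescript
interface Gate {
  name: string;
  command: string;
}

interface GateResult {
  name: string;
  passed: boolean;
}

// `run` executes a command and returns its exit code. It is injected so the
// sketch does not depend on any particular process API.
function runGates(gates: Gate[], run: (command: string) => number): GateResult[] {
  const results: GateResult[] = [];
  for (const gate of gates) {
    const passed = run(gate.command) === 0;
    results.push({ name: gate.name, passed });
    if (!passed) break; // stop at the first failure; later gates stay unrun
  }
  return results;
}

// "Done" requires that every gate actually ran and every one passed.
function allGatesPassed(gates: Gate[], results: GateResult[]): boolean {
  return results.length === gates.length && results.every((r) => r.passed);
}

const gates: Gate[] = [
  { name: "unit tests", command: "pnpm test" },
  { name: "build", command: "pnpm build" },
  { name: "lint", command: "pnpm lint" },
];

// A stand-in runner for demonstration: pretend the build fails.
const fakeRun = (command: string): number => (command === "pnpm build" ? 1 : 0);
const results = runGates(gates, fakeRun);
console.log(results.map((r) => `${r.name}: ${r.passed ? "pass" : "FAIL"}`).join(", "));
console.log(allGatesPassed(gates, results)); // false
```

In real use, `run` might wrap a process-spawning call and return its exit status. The point of `allGatesPassed` is that an unrun gate counts as a failure, not as a pass.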

Tests catch the regressions; review catches the things tests can't. An agent can write a passing test for the wrong behavior, or solve the stated problem while quietly breaking an unstated assumption. So the human review of the diff against the spec is not optional — it's where I confirm the code does the *intended* thing, not merely *a* thing. That's the gate AI does not get to skip.

Where AI coding fails silently

It declares victory without running anything — "this should fix it" is not evidence, and an unrun test is just a hopeful comment.

It writes a test that asserts the buggy behavior, so the suite is green and wrong at the same time.

It silences errors to make output look clean — swallowing a failure path so a call *appears* to succeed. (This exact bug shipped to this repo once and was caught in review: the contact form reported success even when the email send errored. The fix — fix(contact): check Resend send error (no more silent success) + log failures — is in the history.)

It confidently invents APIs, config keys, or version behavior that don't exist in the actual dependency. Grounding every claim against the real repo and the real docs is the only defense.

Every one of those failure modes is invisible if you trust the narration instead of the output. The whole discipline reduces to a single habit: don't believe it works because something said it works — believe it because you watched it work.
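The silent-success bug mentioned above has a simple general shape: a call reports failure through a returned value, and the caller never looks at it. Here is a sketch of the corrected pattern as a pure function. The `SendResult` shape, `toFormResponse`, and the messages are assumptions for illustration, not the site's code or any email SDK's documented API.

```typescript
// Assumed result shape: the send call reports failure in an `error` field
// instead of throwing. Check the real SDK's docs before relying on this.
interface SendResult {
  data: { id: string } | null;
  error: { message: string } | null;
}

interface FormResponse {
  ok: boolean;
  message: string;
}

// Map a send result to what the form reports. The logger is injectable so
// the failure path can be observed in a test.
function toFormResponse(
  result: SendResult,
  log: (message: string) => void = console.error,
): FormResponse {
  if (result.error !== null) {
    // Log the failure and surface it, rather than reporting success.
    log(`contact form send failed: ${result.error.message}`);
    return { ok: false, message: "Message could not be sent." };
  }
  return { ok: true, message: "Message sent." };
}

const failed = toFormResponse({ data: null, error: { message: "rate limited" } });
console.log(failed.ok); // false
```

The buggy version is the same function without the `if`: it always returns the success branch, and every test that only exercises the happy path stays green.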

4. What this means if you're hiring me

The reason any of this matters to you, if you're considering working with me, is that it resolves a tension you've probably been told is unresolvable: speed and rigor are not a trade-off here. The usual choice is a fast contractor who leaves you a fragile mess, or a careful shop that takes six months and a five-figure invoice to ship a v1. The loop above is how I deliver agency-quality output on a solo timeline — fast *because* of the rigor, not despite it.

Where the speed actually comes from is specific, not magic: AI compresses the time inside each stage of the loop — drafting the spec, breaking down the plan, writing tests and implementation, running the checks. I stay the bottleneck on judgment, which is the part you want a human owning anyway. The net effect is a cycle that turns in weeks, not months — you see working, verified software early and often, and iterate against real feedback instead of waiting for a big-bang reveal.

What you actually get

A scoped spec and plan up front, so we agree on what "done" means before I build it.

Working software early and often, in small reviewable increments — not a six-month black box.

Tested, linted, accessible code that you own outright: no lock-in, no untanglable mess handed off.

Honest verification — every claim of "done" backed by a result you can see, not a status I narrated.

I'd rather show you the number than assert it, so here's the honest version of the headline metric, pending a real figure from a recent build:

{{TODO: real weeks-to-ship number from a recent build}}

to be replaced with a concrete, real timeline from a recent project

Until that placeholder is filled with a real number, treat "weeks, not months" as the qualitative claim it is — and judge it the way I'd want you to judge any engineer's claims: by the evidence. This site is some of it. If that's the way you want your software built, let's talk.
