
Harness Engineering


Most teams using AI for code generation do the same thing. They write a long instruction, requirements, an example, maybe some constraints, and fire it off. Sometimes the output is good. More often it's 80% right but wrong in ways that are hard to spot and slow to fix. The team spends two days cleaning up what was supposed to save them three.

The issue isn't capability. It's that a prompt asks the AI to understand the problem, plan the approach, write the code, and verify the result all at once. You'd never hand a junior developer a requirements doc and say "come back when it's done." You'd talk through the problem first, sketch an approach, review it together, then let them build.

I've been building what I call skills: small, repeatable workflows that each handle one stage of development. Less like prompts, more like job descriptions. Each one defines the role, the inputs, what good output looks like, and where to hand off. Individually they're nothing special. Chained together, they change how the work gets done.

Seven stages of building a feature

Let me make this concrete. Say a product manager walks over and asks for a notification system. Users should get an email when someone comments on their post. Sounds simple. It never is.

Here's how the workflow runs when you break it into stages.

Stage 1: Research — read before you write

Before anything else, the AI reads. It explores the existing codebase — how does the app currently send emails? Is there already a notification model? What does the comment creation flow look like? Are there existing patterns for background jobs?

What comes out is a short findings document: "The app uses SendGrid for transactional emails via a shared EmailService class. Comments are created through a CommentController that fires a CommentCreated event. There's an existing notifications table but it's only used for in-app alerts, not email. Background jobs run through Sidekiq."

This takes two minutes. It replaces the half-day a developer spends grepping through the codebase, reading old PRs, and piecing together how things connect.

Stage 2: Brainstorm — ask before you assume

"Send an email when someone comments" sounds straightforward until you start asking questions. What if someone gets 30 comments in ten minutes, do they get 30 emails? What about replies to their own post, should they get notified of their own comment? What if the commenter deletes the comment before the email goes out? What if the user has turned off email notifications?

Rather than letting the AI guess at these decisions, a brainstorming step walks through them one at a time. Each answer gets captured in a structured decision document: batch notifications into a 5-minute digest, exclude self-comments, check comment still exists before sending, respect the existing email_preferences column.
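As a sketch, the decision document for this feature might look something like the following. The structure and wording are illustrative, not a prescribed format:

```markdown
## Decisions: comment email notifications

- Batching: aggregate into a 5-minute digest per recipient
- Self-comments: excluded; nobody is notified of their own comment
- Deleted comments: re-check existence at send time, skip if gone
- Preferences: honour the existing email_preferences column
```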

The output isn't code. It's clarity about what the code needs to do. A developer could read this document and build the feature without any AI involvement; that's how you know the thinking is solid.

Stage 3: Plan — define the work before doing it

The brainstorm decisions feed into a planning step that produces a concrete implementation plan. Not vague intentions: specific files to create, which existing classes to modify, what the database migration looks like, what tests to write and what they should assert.

The plan might specify: add a NotificationDigestJob that runs every 5 minutes via Sidekiq, query undelivered comment notifications grouped by recipient, render them through the existing EmailService with a new comment_digest template, and add a check against email_preferences before sending. Four test cases: a single comment triggers an email, multiple comments get batched, self-comments are excluded, and deleted comments are skipped.

It also calls out what's explicitly out of scope: no push notifications, no notification preferences UI, no real-time WebSocket alerts. These might come later. They're not this ticket.
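To make the batching rules concrete, here's a toy, in-memory sketch of the filtering and grouping the plan describes. The field names (recipient, author, comment_exists, recipient_wants_email) are illustrative stand-ins, not a real schema:

```python
from collections import defaultdict

def build_digests(notifications):
    """Group pending comment notifications into one digest per recipient,
    applying the decisions from the brainstorm step."""
    digests = defaultdict(list)
    for n in notifications:
        if n["author"] == n["recipient"]:   # exclude self-comments
            continue
        if not n["comment_exists"]:         # comment deleted before send time
            continue
        if not n["recipient_wants_email"]:  # respect email preferences
            continue
        digests[n["recipient"]].append(n)
    return dict(digests)
```

Each of the four test cases in the plan maps directly onto one branch of this logic, which is the point: the code is a transcription of decisions already made, not a place where decisions get made.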

Stage 4: Review — challenge the plan before building from it

A review step reads the plan with sceptical eyes. Are there unstated assumptions? Missing edge cases? Has scope quietly expanded?

Say the review catches that the plan assumes every user has an email address on file, but the app supports OAuth sign-up where email is optional. A thirty-second catch that would have been a production bug report two weeks after launch: "Users who signed up with GitHub aren't getting notifications."

The plan gets updated to handle the missing-email case. Only then do we build.

Stage 5: Build — generate code from a clear brief

Now the AI writes code. By this point it has the codebase patterns from the research step, explicit decisions about every edge case, a reviewed implementation plan, and clear scope boundaries.

The difference is obvious. Instead of generating a sprawling notification system and hoping, it's executing a specific brief. The digest job batches exactly as decided. Self-comments are filtered using the logic agreed in the brainstorm. The email_preferences check uses the existing column the research step found. The test cases assert the four scenarios identified in the plan. Nothing is invented. Nothing is assumed.

Stage 6: Code review — automated quality gate

A code review step runs automatically on the generated output. Convention violations. Missing error handling. A Sidekiq job that doesn't handle the case where the user gets deleted between queueing and execution. The kind of issues humans skim past at 4pm on a Friday.

This isn't a replacement for human review. It's a safety net that catches the mechanical stuff so the human reviewer can focus on whether the batching logic actually makes sense in practice.

Stage 7: Capture — remember what you learned

After the feature ships, a capture step records what went well and what tripped the team up. The missing-email edge case from OAuth sign-ups becomes a documented pattern — "always check whether the field exists, not just whether the model exists." The batching approach becomes a reference for the next time a feature needs to aggregate events before acting.

The next feature benefits from everything learned building this one. The workflow improves because the team is paying attention, and the AI holds onto what would otherwise get forgotten between sprints.

The sequence matters

Here's the full chain:

```
Research → Brainstorm → Plan  → Review  → Build → Code Review → Capture
    ↓          ↓          ↓        ↓        ↓          ↓           ↓
Findings   Decisions    Impl.  Checked    Code     Verified    Lessons
   doc        doc       plan     plan                 code      logged
```

Each stage produces a visible artifact that the next stage consumes. Research feeds brainstorming. Brainstorming feeds the plan. The review improves the plan. The plan guides the build. Code review validates the build. Capture feeds the next research step. Nothing gets thrown over a wall. Every handoff is explicit.
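The handoff discipline can be sketched in a few lines. This is a toy illustration, with stand-in lambdas where a real pipeline would invoke the model with each skill file; the stage names match the post, everything else is hypothetical:

```python
def run_pipeline(ticket, stages):
    """Run each stage against the previous stage's artifact, keeping a
    visible trail so a bad result can be traced back to its stage."""
    artifact = {"ticket": ticket}
    trail = []  # one artifact snapshot per stage
    for name, stage in stages:
        artifact = stage(artifact)
        trail.append((name, dict(artifact)))
    return artifact, trail

# Illustrative stand-ins for the first four stages.
stages = [
    ("research",   lambda a: {**a, "findings": "SendGrid via EmailService; Sidekiq jobs"}),
    ("brainstorm", lambda a: {**a, "decisions": "5-minute digest; exclude self-comments"}),
    ("plan",       lambda a: {**a, "plan": "NotificationDigestJob + four test cases"}),
    ("review",     lambda a: {**a, "plan_checked": True}),
]

final, trail = run_pipeline("comment email notifications", stages)
```

The trail is what makes the tracing-back described below possible: every stage's output survives as a value you can inspect, rather than vanishing into a chat history.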

When the one-big-prompt approach produces bad code, you're staring at the output trying to figure out whether the problem is the requirements, the approach, or the implementation. Usually all three. You tweak the prompt, regenerate, get a different flavour of wrong.

With stages, you trace back. Bad code? Check the plan. Bad plan? Check the brainstorm decisions. Bad decisions? Check whether the research surfaced the right context. You debug the process, not the AI.

This structure also scales to multi-agent setups — one instance handles research through review, another handles build through code review. The plan document becomes the handoff contract between them. The builder can't cut corners because it didn't write the plan.

Where to start

Pick your most repetitive development task. The one where you think "we've done this before." Write down the steps a senior developer follows when they do it well. Not the code, the thinking. What do they read first? What questions do they ask? What do they check before writing anything?

Those steps are your skills. Each one is a markdown file that describes the role, the inputs, what good output looks like, and where to hand off next. You don't need a framework; a clear description and some discipline get you most of the way there. The skills get sharper with every build.
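For example, a skill file for the research stage might look something like this. The section headings are one possible shape, not a standard:

```markdown
# Skill: Research

Role: explore the codebase before any code is written.

Inputs: the feature request, plus read access to the repository.

Good output: a one-page findings document naming the relevant
services, models, events, and background-job patterns, with a
file path for each claim.

Hand off to: Brainstorm, which turns open questions into decisions.
```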

Erik Cavan
