flowchart LR
A["🔤 Autocomplete\nToken prediction\nin the editor"] --> B["📝 Inline Suggestions\nComplete entire\nfunctions"] --> C["💬 Chat Interfaces\nDescribe features\nin natural language"] --> D["🤖 Autonomous Agents\nClone repos, plan, execute,\ntest, submit PRs"]
style A fill:#1a1a2e,stroke:#4a4a8a,color:#fff
style B fill:#16213e,stroke:#4a4a8a,color:#fff
style C fill:#0f3460,stroke:#4a4a8a,color:#fff
style D fill:#533483,stroke:#7a5ab5,color:#fff
The New SDLC With Vibe Coding
From ad-hoc prompting to Agentic Engineering. This paper traces the spectrum from casual vibe coding to disciplined agentic engineering, examines how the developer’s role shifts from writing code to exercising judgment, and lays out what it takes to adopt AI tools in ways that produce software you can depend on.
Authors: Addy Osmani, Shubham Saboo, and Sokratis Kartakis — May 2026
The most profound shift in software engineering isn’t a new language, framework, or cloud service. It’s the transition from writing code to expressing intent, and trusting intelligent systems to translate that intent into working software.
Introduction
For most of computing history, programming has been an act of translation: understand the problem in human terms, design a solution in abstract terms, then render it in syntax a machine can execute. Each step introduces friction. That friction is now collapsing. Software engineering is undergoing its most significant transformation since the introduction of high-level programming languages. For decades, the developer’s primary interface with the machine has been syntax: curly braces, semicolons, type annotations, and the precise grammar of programming languages. That era is ending.
A new paradigm has arrived in which developers express what they want to build rather than how to build it. The machine handles implementation. The human provides intent, architecture, and judgment. As of early 2026, 85% of professional developers regularly use AI coding agents, 51% use them daily, and an estimated 41% of all new code is AI-generated.
This shift began with autocomplete — simple token prediction in the editor. Then came inline code suggestions that could complete entire functions. Next, chat-based interfaces allowed developers to describe features in natural language and receive working implementations. Now, fully autonomous agents can clone repositories, plan multi-file changes, execute them in sandboxed environments, run tests, and submit pull requests — all without a human typing a single line of code.
The spectrum ranges from casual vibe coding, where a developer prompts an AI and accepts whatever comes back, to disciplined agentic engineering, where AI acts as a powerful implementation engine within carefully designed systems of constraints, tests, and feedback loops, with humans retaining oversight over architecture, correctness, and quality.
Telling a CTO that your team is vibe coding their payment processing system will — and should — raise alarm bells. Telling that same CTO that your team practices agentic engineering, with AI handling implementation under human-designed constraints while test coverage ensures correctness, is a fundamentally different conversation.
Why this paper, why now
New tools, capabilities, and paradigms emerge weekly. Engineering teams need a framework for making sense of this landscape — not a snapshot that will be outdated in months, but a set of principles and mental models that will remain useful as the specific tools evolve.
Who this paper is for
This paper is for software engineers, engineering managers, architects, and technical leaders who want to understand how AI is reshaping the SDLC and adopt these new capabilities without sacrificing the discipline that production software demands.
The Shift from Syntax to Intent
AI Agents: A Quick Refresher
An AI agent is a software system that perceives a goal, plans steps to reach it, takes actions through tools, observes the results, and iterates until the goal is met or it hits a stopping condition. Where a chatbot produces a response and waits for the next prompt, an agent runs its own loop. You give it a goal at the top, then it decides what to do next at each step.
flowchart TD
G["🎯 Goal"] --> P["👁️ Perceive\nRead current context"]
P --> PL["🧠 Plan\nDecide next step"]
PL --> A["⚡ Act\nTool call / message"]
A --> O["🔍 Observe\nCapture results"]
O --> D{Goal\nmet?}
D -- No --> PL
D -- Yes --> R["✅ Result"]
style G fill:#1a1a2e,stroke:#4a4a8a,color:#fff
style P fill:#16213e,stroke:#4a4a8a,color:#fff
style PL fill:#0f3460,stroke:#4a4a8a,color:#fff
style A fill:#533483,stroke:#7a5ab5,color:#fff
style O fill:#0f3460,stroke:#4a4a8a,color:#fff
style D fill:#1a1a2e,stroke:#e94560,color:#fff
style R fill:#1a472a,stroke:#2d6a4f,color:#fff
Every agent is built from five parts:
- The model — the reasoning engine. It reads the current context, decides what should happen next, and produces the next thought, tool call, or message.
- Tools — connect the model to the world: APIs, code execution, databases, and other agents it can delegate to.
- Memory — the state. Allows the agent to recall past interactions, retrieve project-specific rules, and retain context across sessions.
- Orchestration — the code that runs the loop. Assembles context for each model call, dispatches tool calls, captures results, and decides whether to continue.
- Deployment — what turns the prototype into a service: hosting, identity, observability, and production infrastructure.
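The loop and the first four parts can be sketched in a few lines. This is a minimal illustration, not any framework's API: `Agent`, `Step`, `scripted_model`, and the `lookup` tool are hypothetical names, and the model is a hard-coded stub standing in for an LLM call. Deployment is left out because it concerns hosting rather than the loop itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """One decision from the model: call a tool, or finish with an answer."""
    tool: Optional[str] = None
    argument: str = ""
    answer: Optional[str] = None


@dataclass
class Agent:
    model: Callable[[str, list[str]], Step]          # reasoning engine (stubbed below)
    tools: dict[str, Callable[[str], str]]           # connections to the world
    memory: list[str] = field(default_factory=list)  # observations kept across steps
    max_steps: int = 5                               # stopping condition

    def run(self, goal: str) -> str:
        # Orchestration: assemble context, dispatch tool calls,
        # capture results, and decide whether to continue.
        for _ in range(self.max_steps):
            step = self.model(goal, self.memory)            # perceive + plan
            if step.answer is not None:                     # goal met
                return step.answer
            result = self.tools[step.tool](step.argument)   # act
            self.memory.append(f"{step.tool}({step.argument}) -> {result}")  # observe
        return "stopped: step limit reached"


def scripted_model(goal: str, memory: list[str]) -> Step:
    """Stand-in for an LLM: look something up once, then answer."""
    if not memory:
        return Step(tool="lookup", argument=goal)
    return Step(answer=f"done after {len(memory)} tool call(s)")


agent = Agent(model=scripted_model, tools={"lookup": lambda q: f"notes about {q}"})
result = agent.run("refund policy")
print(result)
```

The `max_steps` limit is the stopping condition mentioned above: without it, a model that never declares the goal met would loop indefinitely.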
What is Vibe Coding?
In February 2025, Andrej Karpathy described an approach where you “fully give in to the vibes, embrace exponentials, and forget that the code even exists.” In this mode, a developer describes what they want in natural language, accepts the AI’s output, and when something breaks, copies the error message back into the prompt and asks the AI to fix it.
The term went viral because it captured something real: many developers were already working this way but hadn’t had language for it. By early 2026, Karpathy himself acknowledged that the original framing was too narrow, introducing the term “agentic engineering” to describe the more disciplined end of the spectrum.
The Spectrum: Vibe Coding to Agentic Engineering
Rather than treating vibe coding and agentic engineering as a binary, it is more useful to think of them as endpoints on a spectrum. The key differentiator is not whether you use AI — it’s how much structure, verification, and human judgment surrounds the AI’s output.
flowchart LR
subgraph VC["🌊 Vibe Coding"]
V1["Casual NL prompts"]
V2["'Does it seem to work?'"]
V3["Copy-paste errors to AI"]
V4["Minimal code understanding"]
end
subgraph SAC["🔧 Structured AI-Assisted Coding"]
S1["Detailed prompts with examples"]
S2["Manual testing & spot-checking"]
S3["Developer diagnoses root cause"]
S4["Selective review of critical paths"]
end
subgraph AE["⚙️ Agentic Engineering"]
A1["Formal specs, architecture docs"]
A2["Automated test suites, CI/CD gates"]
A3["Agents self-diagnose within bounds"]
A4["Comprehensive architecture review"]
end
VC --> SAC --> AE
style VC fill:#3d1a1a,stroke:#8a4a4a,color:#fff
style SAC fill:#1a2a3d,stroke:#4a6a8a,color:#fff
style AE fill:#1a3d1a,stroke:#4a8a4a,color:#fff
| Dimension | Vibe Coding | Structured AI-Assisted | Agentic Engineering |
|---|---|---|---|
| Intent spec | Casual NL prompts | Detailed prompts with examples | Formal specs, architecture docs, memory files |
| Verification | “Does it seem to work?” | Manual testing, spot-checking | Automated test suites, CI/CD gates, LM judges |
| Codebase understanding | Minimal; may not read generated code | Selective review of critical paths | Comprehensive review of architecture |
| Error handling | Copy-paste error messages to AI | Developer diagnoses root cause | Agents self-diagnose within defined bounds |
| Appropriate scope | Prototypes, scripts, hackathons | Features in established codebases | Production systems, team-scale development |
| Risk profile | High; acceptable for disposable code | Moderate; human judgment at key checkpoints | Low; systematic verification at every stage |
The right position on this spectrum depends on the stakes. A weekend prototype can be pure vibe coding. A production API handling financial transactions demands agentic engineering. Most real work falls somewhere in between, and the skill is knowing where to draw the line for each task.
The single biggest differentiator between the two ends is how outputs get verified. In vibe coding, verification is optional. In agentic engineering, two mechanisms work together:
- Tests verify the deterministic parts: a function given this input produces that output.
- Evals verify the non-deterministic parts: did the agent take the right trajectory, choose the right tools, and produce a response that meets the quality bar.
Without both, the practice remains vibe coding, regardless of how sophisticated the prompts are.
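To make the distinction concrete, here is a minimal sketch of one test and one eval side by side. Everything in it is illustrative: `apply_discount` is a made-up function under test, and the tool names (`read_file`, `edit_file`, `run_tests`, `git_push`) and rubric checks are hypothetical, not taken from any particular agent framework.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Deterministic code under test: integer discount, rounded down."""
    return price_cents - (price_cents * percent) // 100


def test_apply_discount() -> None:
    # Test: a given input must always produce exactly this output.
    assert apply_discount(1000, 15) == 850
    assert apply_discount(999, 0) == 999


def eval_trajectory(tool_calls: list[str], response: str) -> dict[str, bool]:
    """Eval: score one recorded agent run against a small rubric."""
    last_edit = max(
        (i for i, call in enumerate(tool_calls) if call == "edit_file"), default=-1
    )
    return {
        "read_first": tool_calls[:1] == ["read_file"],
        "tests_after_last_edit": "run_tests" in tool_calls[last_edit + 1:],
        "no_forbidden_tools": "git_push" not in tool_calls,
        "reports_test_status": "tests" in response.lower(),
    }


test_apply_discount()

scores = eval_trajectory(
    tool_calls=["read_file", "edit_file", "run_tests"],
    response="Fixed the rounding bug; all tests pass.",
)
pass_rate = sum(scores.values()) / len(scores)
print(scores, pass_rate)
```

The test has a single right answer. The eval produces a score per rubric item, which is what lets a team track trajectory quality over many runs rather than judging a single output.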
Context Engineering: The Real Skill
As the field has matured, a key insight has emerged: the quality of AI-generated code depends less on the cleverness of your prompts and more on the quality of the context provided. This realization has given rise to context engineering — the practice of providing AI agents with rich, structured information about your codebase, architecture, conventions, and intent.
Developers must consider six primary types of context:
- Instructions — the agent’s core role, goals, and operational boundaries.
- Knowledge — retrieved documents, architectural diagrams, and domain-specific data.
- Memory — short-term session logs (what just happened) and long-term persistent state (what the project is).
- Examples — few-shot behavioral demonstrations and codebase reference patterns.
- Tools — precise definitions of the APIs, scripts, and external services the agent can invoke.
- Guardrails — hard constraints, formatting rules, and safety validations.
Static vs. Dynamic Context
flowchart TB
subgraph Static["📌 Static Context (Always Loaded)"]
direction LR
SI["System instructions"]
RF["Rule files\n(AGENTS.md, CLAUDE.md)"]
GM["Global memory"]
PD["Persona definitions"]
end
subgraph Dynamic["⚡ Dynamic Context (On Demand)"]
direction LR
SK["Skill instructions\n(task matching)"]
TR["Tool results\n(execution)"]
RAG["RAG documents\n(fetched)"]
WH["Windowed session\nhistory"]
end
Agent["🤖 Agent"] --> Static
Agent --> Dynamic
style Static fill:#1a2a3d,stroke:#4a6a8a,color:#fff
style Dynamic fill:#1a3d1a,stroke:#4a8a4a,color:#fff
style Agent fill:#533483,stroke:#7a5ab5,color:#fff
- Static context is always loaded: system instructions, rule files, global memory, persona definitions. It defines who the agent is and how it behaves. It is expensive because every token is present in every interaction.
- Dynamic context is loaded on demand: skill instructions triggered by task matching, tool results, documents fetched from RAG pipelines, windowed session history. It is efficient because the agent pays the token cost only when the information is needed.
Too much static context wastes tokens and dilutes signals. Too little means the agent forgets critical rules. The best systems treat this boundary as a first-class architectural decision, reviewed and versioned like any other configuration.
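One way to picture the boundary is as a budgeting problem: static items are always included, and dynamic items compete for whatever budget remains. The sketch below assumes a crude word-count tokenizer and hand-assigned relevance scores. Both are placeholders, and `assemble_context` is a hypothetical helper rather than part of any tool.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: roughly one token per word."""
    return len(text.split())


def assemble_context(
    static: list[str],
    dynamic: list[tuple[float, str]],  # (relevance score, text)
    budget: int,
) -> list[str]:
    """Always include static items, then add dynamic items by relevance
    until the token budget is spent."""
    context = list(static)
    used = sum(estimate_tokens(item) for item in context)
    if used > budget:
        raise ValueError("static context alone exceeds the token budget")
    for _, item in sorted(dynamic, key=lambda pair: pair[0], reverse=True):
        cost = estimate_tokens(item)
        if used + cost <= budget:
            context.append(item)
            used += cost
    return context


static = [
    "You are a careful coding agent.",
    "Rule: never commit directly to main.",
]
dynamic = [
    (0.9, "Tool result: 2 tests failed in test_billing.py"),
    (0.2, "Old session note: user prefers tabs over spaces in Makefiles"),
    (0.7, "Skill: how to write database migrations safely"),
]
context = assemble_context(static, dynamic, budget=28)
for line in context:
    print(line)
```

With this budget the two static rules and the two most relevant dynamic items fit, and the stale session note is dropped. Growing the static list shrinks the room left for task-specific information, which is the trade-off described above.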
Agent Skills
The most powerful pattern for managing dynamic context is Agent Skills: structured, portable packages of procedural knowledge that the agent loads only when the task calls for it.
Rather than embedding every piece of specialized knowledge into the system prompt, skills allow the agent to remain a lightweight generalist that flexes into specialist roles on demand through progressive disclosure:
- The agent sees only lightweight metadata at startup
- Loads full instructions when a task matches
- Pulls deep reference material only when explicitly needed
Agent Skills solve four problems that have plagued AI agent development:
- Context rot from overloaded prompts
- Absence of procedural memory for LLMs
- Operational overhead of multi-agent architectures
- Need for portability across tools and vendors
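Progressive disclosure can be modeled as three levels of loading. The sketch below is an in-memory illustration only: `Skill`, `SkillRegistry`, and the keyword-based matching are assumptions made for the example, not the on-disk format or matching logic of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str       # level 1: lightweight metadata, always visible
    keywords: set[str]     # hypothetical matching hint for this sketch
    instructions: str      # level 2: loaded when a task matches
    references: dict[str, str] = field(default_factory=dict)  # level 3: on request


class SkillRegistry:
    def __init__(self, skills: list[Skill]) -> None:
        self._skills = {skill.name: skill for skill in skills}

    def catalog(self) -> str:
        """Level 1: the only skill text present in context at startup."""
        return "\n".join(f"{s.name}: {s.description}" for s in self._skills.values())

    def match(self, task: str) -> list[str]:
        """Naive keyword match; a real system might let the model choose."""
        words = set(task.lower().split())
        return [s.name for s in self._skills.values() if words & s.keywords]

    def load(self, name: str) -> str:
        """Level 2: full instructions for a matched skill."""
        return self._skills[name].instructions

    def reference(self, name: str, key: str) -> str:
        """Level 3: deep reference material, fetched only when asked for."""
        return self._skills[name].references[key]


registry = SkillRegistry([
    Skill(
        name="db-migration",
        description="Write safe database migrations",
        keywords={"migration", "schema"},
        instructions="1. Write a reversible migration. 2. Test it on a copy.",
        references={"rollback": "Steps for rolling back a failed migration."},
    ),
    Skill(
        name="release-notes",
        description="Draft release notes",
        keywords={"release", "changelog"},
        instructions="Summarize merged changes by area.",
    ),
])

matched = registry.match("Add a schema migration for invoices")
print(registry.catalog())
print(matched, registry.load(matched[0]))
```

The catalog stays small no matter how long the instructions or reference files grow, which is how the agent remains a lightweight generalist until a task calls for a specialty.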
The New Software Development Life Cycle
The Traditional SDLC Under Pressure
The software development life cycle has already been through one major transformation. Over the past two decades, most enterprises moved from sequential waterfall processes to iterative models: Agile sprints, continuous integration, DevOps pipelines, and rapid release cycles.
AI compresses this cycle dramatically, but unevenly: implementation that once took weeks can now be done in hours, while requirements, architecture, and verification remain stubbornly human-paced.
flowchart LR
subgraph Traditional["🕰️ Traditional SDLC"]
direction TB
T1["Requirements\n(weeks)"] --> T2["Design\n(weeks)"] --> T3["Implementation\n(months)"] --> T4["Testing\n(weeks)"] --> T5["Deployment\n(days)"] --> T6["Maintenance\n(ongoing)"]
end
subgraph AI["🚀 AI-Driven SDLC"]
direction TB
A1["Requirements +\nPrototype (hours)"] --> A2["Architecture\n(human-paced)"] --> A3["Implementation\nby AI (hours)"] --> A4["Automated\nTesting (minutes)"] --> A5["AI-monitored\nDeployment"] --> A6["AI-assisted\nMaintenance"]
end
style Traditional fill:#2d1a1a,stroke:#8a4a4a,color:#fff
style AI fill:#1a2a1a,stroke:#4a8a4a,color:#fff
The phase-by-phase picture described here reflects the state of AI-driven SDLC as of mid-2026. Early signs suggest that the compression will spread beyond implementation: teams are already experimenting with workflows where developers go directly from specs to review, with AI agents handling implementation, testing, and deployment in the background.
How AI Transforms Each Phase
Requirements and Planning
Requirements stop being a document handed off between teams. They become a conversation between humans and AI that produces specification and initial implementation simultaneously.
Modern AI tools can:
- Generate user stories from product briefs
- Identify edge cases that humans miss
- Produce API schemas from natural-language descriptions
- Generate interactive prototypes from specification documents
Design and Architecture
Architecture remains the most stubbornly human-centric phase, and for good reason. Architectural decisions are fundamentally about trade-offs: consistency vs. availability, complexity vs. flexibility, build vs. buy. These trade-offs depend on business context, organizational constraints, and long-term strategic considerations that AI cannot fully grasp.
AI excels at implementing architectural decisions once they are made. Given a clear architecture document, AI agents can scaffold entire applications, generate consistent patterns across modules, and ensure that new code conforms to established conventions.
Implementation
Modern coding agents can generate entire features from natural-language descriptions, implement complex algorithms, and produce multi-file changes that work together correctly. Industry surveys report 25 to 39% productivity improvements, with some tasks seeing larger gains.
In contrast, a study by METR found that experienced developers using AI assistants actually took 19% longer on certain tasks, largely because of time spent verifying, debugging, and correcting AI output. AI does not eliminate implementation work; it shifts that work from writing to reviewing, guiding, and verifying.
Testing and Quality Assurance
Testing AI-generated code requires evaluating not just what the agent produced, but how it got there:
- Output evaluation — checks the final artifact: does the code compile, do the tests pass?
- Trajectory evaluation — checks the full sequence of tool calls and intermediate reasoning. A fluent output that skipped its verification steps is a more dangerous failure than one with a visible error.
The continuous quality flywheel:
- Evaluate against a benchmark suite
- Diagnose failures by clustering root causes
- Optimize the prompts or tools that caused them
- Verify fixes against a regression suite
- Monitor production traffic for new failure modes
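The diagnose and verify steps of the flywheel lend themselves to a small sketch. It assumes each failed eval case has already been given a root-cause label, by a human or an LM judge. The `EvalResult` record, the case IDs, and the labels are all invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class EvalResult:
    case_id: str
    passed: bool
    root_cause: str = ""  # label assigned during diagnosis


def diagnose(results: list[EvalResult]) -> list[tuple[str, int]]:
    """Cluster failures by root cause, most frequent first."""
    return Counter(r.root_cause for r in results if not r.passed).most_common()


def regressions(before: list[EvalResult], after: list[EvalResult]) -> list[str]:
    """Cases that passed before a change and fail after it."""
    passed_before = {r.case_id for r in before if r.passed}
    return [r.case_id for r in after if not r.passed and r.case_id in passed_before]


baseline = [
    EvalResult("c1", True),
    EvalResult("c2", False, "missing tool"),
    EvalResult("c3", False, "vague rule"),
    EvalResult("c4", False, "missing tool"),
]
after_fix = [
    EvalResult("c1", False, "prompt too long"),
    EvalResult("c2", True),
    EvalResult("c3", False, "vague rule"),
    EvalResult("c4", True),
]

clusters = diagnose(baseline)            # which root cause to fix first
broken = regressions(baseline, after_fix)  # what the fix broke
print(clusters, broken)
```

In this made-up run, adding the missing tool fixes two cases but breaks one that used to pass. That is the situation the regression suite exists to catch before the change ships.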
Code Review and Deployment
AI serves as a first-pass reviewer that can identify potential bugs, style violations, security vulnerabilities, and performance issues before a human reviewer sees the code. Context-dependent decisions about design, maintainability, and strategic alignment still require human judgment.
Deployment pipelines are becoming AI-aware: agents can monitor deployment health, automatically roll back problematic releases, and predict deployment risks based on the nature and scope of changes.
Maintenance and Evolution
Legacy codebases that were once impenetrable to new team members can now be navigated, understood, and modified with AI assistance. AI agents can systematically migrate codebases between frameworks, update deprecated APIs, and modernize test suites — tasks that were previously so tedious and risky that they simply never happened.
The Factory Model
flowchart TD
Dev["👨‍💻 Developer\n(Factory Manager)"]
subgraph Factory["🏭 The Factory"]
direction TB
Spec["📋 Specifications\n& Context"] --> Agents["🤖 AI Agents\n(Implementation)"]
Agents --> Tests["✅ Tests\n& Quality Gates"]
Tests --> FL["🔄 Feedback\nLoops"]
FL --> Agents
GR["🛡️ Guardrails\n& Constraints"] --> Agents
end
Dev --> Factory
Factory --> Code["📦 Production\nCode"]
style Factory fill:#1a1a2e,stroke:#4a4a8a,color:#fff
style Dev fill:#533483,stroke:#7a5ab5,color:#fff
style Code fill:#1a3d1a,stroke:#4a8a4a,color:#fff
The mental model that ties these transformations together is the factory model. In this model, the developer’s primary output is not code — it’s the system that produces code. This system includes:
- Specifications and context that define what needs to be built
- Agents that translate specifications into implementation
- Tests and quality gates that verify correctness
- Feedback loops that route failures back to agents for correction
- Guardrails that constrain agents to safe, predictable behavior
A factory manager does not assemble every widget by hand. They design the assembly line and ensure quality control. Success comes from giving agents success criteria rather than step-by-step instructions, then letting them iterate.
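A minimal sketch of that loop, under stated assumptions: `run_factory`, `stub_agent`, and `gate` are hypothetical names, the "agent" is a stub that returns canned source code, and the quality gate is a single check. The toy uses `exec` to run the candidate; a real setup would run generated code in a sandbox.

```python
from typing import Callable, Optional


def run_factory(
    generate: Callable[[str, list[str]], str],  # agent: (spec, feedback) -> candidate
    quality_gate: Callable[[str], list[str]],   # tests: candidate -> failure messages
    spec: str,
    max_attempts: int = 3,                       # guardrail: bounded iteration
) -> Optional[str]:
    """Iterate until the gate passes or the attempt budget runs out."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        candidate = generate(spec, feedback)
        failures = quality_gate(candidate)
        if not failures:
            return candidate
        feedback.extend(failures)  # feedback loop: route failures back to the agent
    return None  # budget exhausted: escalate to a human


def stub_agent(spec: str, feedback: list[str]) -> str:
    """Stand-in for a coding agent: gets it right after seeing one failure."""
    if not feedback:
        return "def slugify(s): return s.lower()"
    return "def slugify(s): return s.lower().replace(' ', '-')"


def gate(source: str) -> list[str]:
    """Success criterion expressed as an executable check."""
    namespace: dict = {}
    exec(source, namespace)  # toy only; real agents run code in a sandbox
    got = namespace["slugify"]("Hello World")
    return [] if got == "hello-world" else [f"slugify('Hello World') returned {got!r}"]


accepted = run_factory(stub_agent, gate, spec="slugify: lowercase, spaces to hyphens")
print(accepted)
```

The developer's work here is the spec, the gate, and the attempt limit. The loop never tells the agent how to fix the failure, only what failed.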
Harness Engineering: What Surrounds the Model
There is a temptation to treat the model as the system. A new model comes out, the agent gets smarter. That intuition is wrong, and it leads to the wrong investments.
A raw model is not an agent. It becomes one once a harness gives it state, tool execution, feedback loops, and enforceable constraints. The behavior developers experience when working with Claude Code, Cursor, Codex, Aider, or Cline is shaped largely by what the harness does, not only by which model is underneath.
\[\text{Agent} = \text{Model} + \text{Harness}\]
flowchart TB
subgraph Harness["🔧 The Harness"]
direction TB
IR["📜 Instructions &\nRule Files\n(AGENTS.md, CLAUDE.md)"]
Tools["🛠️ Tools\n(APIs, MCP Servers)"]
Sandbox["📦 Sandboxes &\nExecution Environments"]
Orch["🔀 Orchestration Logic\n(Sub-agents, routing)"]
GR["🛡️ Guardrails / Hooks\n(Lifecycle checkpoints)"]
Obs["📊 Observability\n(Logs, traces, evals)"]
end
Model["🧠 Model\n(Reasoning Engine)"] --> Harness
Harness --> Agent["🤖 Agent\n(Working System)"]
style Harness fill:#1a1a2e,stroke:#4a4a8a,color:#fff
style Model fill:#533483,stroke:#7a5ab5,color:#fff
style Agent fill:#1a3d1a,stroke:#4a8a4a,color:#fff
What’s in the Harness
- Instructions and Rule Files — the text that defines who the agent is, what it cares about, and what it is forbidden from doing. Includes AGENTS.md, CLAUDE.md, GEMINI.md, skill files, and sub-agent prompts.
- Tools — the functions, MCP servers, and APIs the agent can call, plus the prose that tells the model when and how to call them.
- Sandboxes and execution environments — where the agent’s code actually runs, what it has access to, what it cannot reach.
- Orchestration logic — sub-agent spawning, model routing, hand-offs between specialists, and the rules that govern when each fires.
- Guardrails / Hooks — deterministic code that runs at specific lifecycle points: before a tool call, after a file edit, before a commit. Hooks are the place for things the agent should never forget but often does.
- Observability — logs, traces, evaluations, cost and latency metering. Without observability, there is no way to tell whether the agent is doing well or quietly drifting.
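As an illustration of the guardrail component, a pre-tool-call hook can be an ordinary function that inspects a proposed call and either allows or blocks it. The sketch below is an assumption-laden toy: the hook signature, the tool names (`edit_file`, `shell`), and the two rules are invented for the example and do not mirror any specific product's hook API.

```python
from typing import Callable, Optional

# A hook inspects a proposed tool call and returns a reason to block it,
# or None to allow it.
Hook = Callable[[str, dict], Optional[str]]


def block_protected_paths(tool: str, args: dict) -> Optional[str]:
    if tool == "edit_file" and args.get("path", "").startswith(".github/"):
        return "edits under .github/ require human approval"
    return None


def block_force_push(tool: str, args: dict) -> Optional[str]:
    if tool == "shell" and "push --force" in args.get("command", ""):
        return "force pushes are not allowed"
    return None


def run_pre_tool_hooks(hooks: list[Hook], tool: str, args: dict) -> list[str]:
    """Run every hook before the tool call; any returned reason blocks the call."""
    return [reason for hook in hooks if (reason := hook(tool, args)) is not None]


hooks: list[Hook] = [block_protected_paths, block_force_push]

blocked = run_pre_tool_hooks(hooks, "edit_file", {"path": ".github/workflows/ci.yml"})
allowed = run_pre_tool_hooks(hooks, "edit_file", {"path": "src/app.py"})
print(blocked, allowed)
```

Because the hooks are deterministic code rather than prompt text, they apply on every call regardless of what the model remembers or how full its context window is.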
Harness in SDLC
flowchart LR
subgraph P1["Phase 1\nRequirements & Architecture"]
H1["⚙️ Configure\nInstruction files,\ntool access, rules"]
end
subgraph P2["Phase 2\nImplementation"]
H2["▶️ Run\nSandboxes,\nexecution environments,\ntools"]
end
subgraph P3["Phase 3\nTesting & QA"]
H3["🔄 Feedback Loop\nOrchestration logic,\nguardrails → self-correction"]
end
subgraph P4["Phase 4\nReview, Deploy & Maintain"]
H4["👁️ Observe\nHooks, observability,\naudit trails"]
end
P1 --> P2 --> P3 --> P4
style P1 fill:#1a2a3d,stroke:#4a6a8a,color:#fff
style P2 fill:#1a3d1a,stroke:#4a8a4a,color:#fff
style P3 fill:#3d2a1a,stroke:#8a6a4a,color:#fff
style P4 fill:#2d1a3d,stroke:#6a4a8a,color:#fff
When an agent does something wrong, the first instinct is to blame the model. More often, the failure traces back to a missing tool, a vague rule, an absent guardrail, or a context window stuffed with noise. Public benchmark results support this: one team moved a coding agent from outside the Top 30 to the Top 5 on Terminal Bench 2.0 by changing only the harness, with no model change at all.
The Developer’s Evolving Role: Conductors and Orchestrators
As AI takes over more of the implementation work, developers move fluidly between two modes:
flowchart LR
subgraph Conductor["🎼 Conductor Mode"]
direction TB
C1["Real-time collaboration\nwith AI pair-programmer"]
C2["In the IDE, watching\ncode appear"]
C3["Fine-grained control\nover every change"]
C4["Tools: Copilot, Cursor,\nWindsurf, Gemini Code Assist"]
end
subgraph Orchestrator["🎭 Orchestrator Mode"]
direction TB
O1["High-level abstraction:\ndefine goals, assign to agents"]
O2["Agents work in background,\nin parallel"]
O3["Review results,\nprovide course corrections"]
O4["Tools: Jules, Copilot Agent\nMode, Claude Code"]
end
Dev["👨‍💻 Developer"] --> Conductor
Dev --> Orchestrator
style Conductor fill:#1a2a3d,stroke:#4a6a8a,color:#fff
style Orchestrator fill:#1a3d2a,stroke:#4a8a6a,color:#fff
style Dev fill:#533483,stroke:#7a5ab5,color:#fff
The Conductor: Hands-On, Real-Time Direction
In conductor mode, a developer works in real-time with an AI pair-programmer. They’re in the IDE, watching code appear, guiding the AI with prompts and corrections, maintaining fine-grained control over what gets written.
This mode is typical when:
- Working on complex logic
- Debugging tricky issues
- Working in unfamiliar codebases where each change must be understood
The risk is that it can become a bottleneck — if the developer is personally directing every keystroke, the throughput improvement from AI is limited.
The Orchestrator: Async, Multi-Agent Delegation
In orchestrator mode, the developer operates at a higher level of abstraction. They define goals, assign them to agents, and review results — not watching code appear line by line. Agents may be working in the background, in parallel, on different parts of a codebase.
This mode is typical for:
- Well-defined bug fixes
- Feature implementations against established patterns
- Codebase migrations
- Test generation
The orchestrator mode requires different skills:
- Specification — defining tasks precisely enough for autonomous execution
- Decomposition — breaking large tasks into appropriately sized units
- Evaluation — quickly assessing whether agent output meets quality standards
- System design — designing the constraints, tests, and feedback loops that keep agents productive
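The shape of orchestrator mode (decompose, delegate in parallel, review as a batch) can be sketched with `asyncio`. The `run_agent` coroutine below is a stand-in that sleeps instead of doing real work, and its failure condition and status strings are made up for the demo.

```python
import asyncio


async def run_agent(task: str) -> dict:
    """Stand-in for a background agent; a real one would produce a pull request."""
    await asyncio.sleep(0.01)   # simulate work happening in the background
    ok = "flaky" not in task    # hypothetical failure condition for the demo
    return {"task": task, "status": "ready_for_review" if ok else "needs_guidance"}


async def orchestrate(tasks: list[str]) -> list[dict]:
    # Delegate every task at once; results come back in the order given.
    return await asyncio.gather(*(run_agent(task) for task in tasks))


# Decomposition: one well-scoped unit of work per agent.
tasks = [
    "migrate logging calls to structured logger",
    "add unit tests for invoice parser",
    "fix flaky checkout test",
]

results = asyncio.run(orchestrate(tasks))

# Evaluation: the developer's attention goes only where it is needed.
needs_attention = [r["task"] for r in results if r["status"] != "ready_for_review"]
print(needs_attention)
```

The developer's time is spent writing the task list and reading the review queue, not watching any single agent work.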
The 80% Problem
A persistent challenge: AI agents can rapidly generate approximately 80% of the code for a feature, but the remaining 20% — edge cases, error handling, integration points, and subtle correctness requirements — demands deep contextual knowledge that current models often lack.
The nature of AI errors has evolved from simple syntax mistakes to more insidious conceptual failures: wrong assumptions about business logic, missing edge cases, and architectural decisions that create subtle long-term maintenance burdens. These errors are harder to detect because the code “looks right” and may even pass basic tests.
The developers who navigate this challenge most effectively use AI for what it’s good at (rapid implementation of well-specified tasks) while reserving their own attention for what AI struggles with (ambiguous requirements, architectural trade-offs, and correctness verification).
Coding Agents in Practice
Where Coding Agents Fit in the Developer’s Day
Coding agents show up in three places in everyday work. Most developers use all three at once:
In the editor: Inline completion, chat panels, whole-codebase awareness. This is where most people first meet AI in coding. Examples: GitHub Copilot, Cursor, Windsurf, JetBrains AI Assistant.
In the terminal: Coding agents launched from the command line, given a goal in plain language, working across the codebase with full file system access, multi-file edits, ability to run tools and tests. Examples: Claude Code, Codex CLI, Cline.
In the background: Agents that take a task and run autonomously in cloud-hosted sandboxes, often for hours, producing a pull request as output. Examples: Google Jules, GitHub Copilot agent mode, Cursor’s background agents.
flowchart TD
subgraph Editor["💻 In the Editor"]
E1["Inline completion\nas you type"]
E2["Chat panels:\nexplain/modify code"]
E3["Whole-codebase\nawareness in IDE"]
end
subgraph Terminal["🖥️ In the Terminal"]
T1["Multi-file edits\nacross codebase"]
T2["Run tests and\nreact to results"]
T3["Explore unfamiliar\ncodebases"]
end
subgraph Background["☁️ In the Background"]
B1["Cloud-hosted sandboxes\n(hours of autonomous work)"]
B2["Well-specified tasks:\nbug fixes, migrations"]
B3["Output: pull request\nfor developer review"]
end
Dev["👨‍💻 Developer"] --> Editor & Terminal & Background
style Editor fill:#1a2a3d,stroke:#4a6a8a,color:#fff
style Terminal fill:#1a3d1a,stroke:#4a8a4a,color:#fff
style Background fill:#3d1a3d,stroke:#8a4a8a,color:#fff
style Dev fill:#533483,stroke:#7a5ab5,color:#fff
Vibe Coding Production-Ready Agents
The same terminal-based workflow that produces prototype scripts now reaches production agents. Building, evaluating, and deploying a real agent, with persistent memory, governance, and observability, no longer requires switching between a framework and a cloud console; it happens in the same terminal.
Google’s Agents CLI bundles skills for building agents on Google Cloud, covering the full ADK lifecycle: scaffolding, writing, evaluating, deploying, and wiring up observability. After a one-time install, the coding agent gains seven new skills.
# One-time setup
uvx google-agents-cli setup
# Then in your coding agent:
# > Build a support agent that answers questions from our docs.
# > evaluate it on the FAQ dataset
# > Deploy it to Agent Engine
Behind that single instruction, the coding agent scaffolds a project from a template, writes the ADK code, generates an evalset, runs it against the agent, deploys to Agent Runtime, and reports back.
Coordination across agents happens through:
- Shared session state for simple cases
- Model Context Protocol (MCP) for tool access
- Agent2Agent (A2A) protocol for cross-agent delegation
The Economics of AI Development
When evaluating AI’s impact on the SDLC, the metric that matters most to engineering leaders is Total Cost of Ownership (TCO), and specifically how different workflows shift the financial burden between:
- CapEx — the upfront investment to build something
- OpEx — the ongoing cost to run, fix, and maintain it
quadrantChart
title CapEx vs OpEx by Development Approach
x-axis Low CapEx --> High CapEx
y-axis Low OpEx --> High OpEx
quadrant-1 High Investment, High Ops
quadrant-2 Low Investment, High Ops
quadrant-3 Low Investment, Low Ops
quadrant-4 High Investment, Low Ops
Vibe Coding: [0.15, 0.85]
Structured AI-Assisted: [0.45, 0.5]
Agentic Engineering: [0.8, 0.2]
The Investment of Agentic Engineering (High CapEx, Low OpEx)
Agentic engineering flips the economics of vibe coding, which is cheap to start and costly to run and maintain. Here the CapEx includes designing API schemas, building deterministic test suites, and structuring the agent’s context. The upfront cost is higher, but the marginal cost of shipping and maintaining each feature drops dramatically.
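The trade-off can be worked through as simple arithmetic: total cost is the upfront investment plus a per-feature running cost, and the two approaches cross at some number of features. The figures below are arbitrary cost units chosen only to show the shape of the curve. They are not measurements from this paper or any other source.

```python
from typing import Optional


def total_cost(capex: int, opex_per_feature: int, features: int) -> int:
    """Total cost of ownership after shipping a number of features."""
    return capex + opex_per_feature * features


def first_cheaper_at(
    a: tuple[int, int], b: tuple[int, int], limit: int = 1000
) -> Optional[int]:
    """Smallest feature count at which approach b costs less than approach a."""
    for n in range(1, limit + 1):
        if total_cost(*b, n) < total_cost(*a, n):
            return n
    return None


# (capex, opex per feature) in made-up cost units
vibe_coding = (1, 8)    # almost nothing upfront, expensive to fix and maintain
agentic = (40, 2)       # specs, tests, and context upfront; cheap per feature

crossover = first_cheaper_at(vibe_coding, agentic)
print(crossover)
```

With these illustrative numbers the upfront investment pays for itself within a handful of features. For a throwaway prototype that never reaches that point, the cheaper start wins, which matches the guidance on scope earlier in the paper.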
Context Engineering as a Financial Lever
In the token economy, context engineering is not just a technical skill — it is a financial strategy. Effective context engineering ensures the model receives a dense, high-signal payload rather than a sprawling, noisy one, dramatically increasing the agent’s first-pass success rate.
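A back-of-the-envelope model shows why first-pass success rate matters so much. It assumes attempts are independent with the same success probability, so the expected number of attempts is the reciprocal of that probability. The token counts and success rates are hypothetical inputs, not data.

```python
def expected_tokens_per_success(tokens_per_attempt: int, first_pass_rate: float) -> float:
    """Expected tokens spent until one attempt succeeds.

    Assumes independent attempts with a constant success probability,
    so the expected number of attempts is 1 / first_pass_rate.
    """
    if not 0 < first_pass_rate <= 1:
        raise ValueError("first_pass_rate must be in (0, 1]")
    return tokens_per_attempt / first_pass_rate


# Hypothetical comparison: a sprawling context vs. a dense, curated one.
noisy = expected_tokens_per_success(tokens_per_attempt=60_000, first_pass_rate=0.4)
dense = expected_tokens_per_success(tokens_per_attempt=20_000, first_pass_rate=0.8)

print(f"noisy context: ~{noisy:,.0f} tokens per accepted change")
print(f"dense context: ~{dense:,.0f} tokens per accepted change")
```

Under these assumptions the dense payload wins twice: each attempt is cheaper, and fewer attempts are needed. The model ignores human review time, which would typically widen the gap further.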
Intelligent Model Routing
A well-designed factory model avoids expensive waste by routing tasks intelligently:
- Large frontier models for highly complex tasks (Requirements, Architecture, initial Implementation)
- Smaller, faster, cheaper models for lower-complexity tasks (Test Generation, Code Review, CI/CD monitoring)
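A routing table along these lines might look like the sketch below. The tier names, the task-type keys, the default for unknown tasks, and the escalate-on-failure step are all assumptions added for illustration. The text above only specifies which kinds of task go to which size of model.

```python
from typing import Callable

# Task type -> model tier, following the split described above.
MODEL_FOR_TASK = {
    "requirements": "frontier",
    "architecture": "frontier",
    "implementation": "frontier",
    "test_generation": "small",
    "code_review": "small",
    "ci_monitoring": "small",
}

# Assumption: a failed attempt on the small tier is retried once on the larger one.
ESCALATION = {"small": "frontier"}


def pick_model(task_type: str) -> str:
    # Assumption: unknown task types go to the larger model to be safe.
    return MODEL_FOR_TASK.get(task_type, "frontier")


def run_with_escalation(task_type: str, attempt: Callable[[str], bool]) -> list[str]:
    """Try the routed tier first and escalate on failure. Returns the tiers tried."""
    tier = pick_model(task_type)
    tried = [tier]
    while not attempt(tier) and tier in ESCALATION:
        tier = ESCALATION[tier]
        tried.append(tier)
    return tried


# Stub attempt function: pretend only the frontier tier can handle this review.
tiers_used = run_with_escalation("code_review", lambda tier: tier == "frontier")
print(pick_model("test_generation"), tiers_used)
```

Keeping the table in code or configuration makes routing decisions reviewable, in the same way the paper suggests for rule files and eval suites.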
Where to Start
For Individual Developers
Set up an AGENTS.md for the project. Start with ten lines: stack, conventions, hard rules, workflow. Add a rule every time the agent does something it should not do again.
Install a set of skills for your coding agents (like Agents CLI) to build, evaluate, deploy and optimize agents.
Pick one repetitive workflow and make it the first agent. A research workflow, a code review process, a recurring report. Use a coding agent for the prototype, graduate it to a production agent when it earns its keep. Building one agent end to end teaches more than reading about a hundred.
Write the tests and evals before generating the code. Together they are the contract with the AI. A well-written test and eval suite communicates intent more precisely than any natural-language prompt, and turns AI-assisted development from vibe coding into agentic engineering.
Review every line the agent produces that is going to ship. Be skeptical of anything that looks clever. Check imports for real packages. Verify that error handling covers realistic failure modes.
Maintain your developer skills. AI handles the routine so the developer can focus on the challenging. That arrangement only works if foundational skills — debugging, system design, intuition for performance and correctness — stay sharp.
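The advice above to check imports for real packages can be partly automated. The sketch below uses only the standard library to list top-level modules that generated code imports but that cannot be found in the current environment. `unresolved_imports` is a hypothetical helper, and the package name in the sample is deliberately made up.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level modules imported by `source` that are not importable here."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which refer to the project itself
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)


generated = (
    "import json\n"
    "from pathlib import Path\n"
    "import totally_made_up_pkg_xyz\n"
)

missing = unresolved_imports(generated)
print(missing)
```

This is a first filter, not a verdict. A module that fails to resolve may simply be uninstalled, and a name that does resolve, or that exists on a public index, still needs a human to confirm it is the package the code intended to use.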
For Engineering Leaders
Make context engineering a first-class engineering practice. Treat AGENTS.md, system prompts, eval suites, and skill libraries as code: reviewed in pull requests, versioned with the project, owned by named engineers.
Set the bar at the eval, not the demo. A working demo proves an agent can succeed once. A passing eval suite proves it succeeds reliably. Define what you are scoring: task success, tool use quality, trajectory compliance, hallucination, and response quality.
Re-shape code review for AI-generated code. Give extra attention to hallucinated dependencies, inadequate error handling, and subtle correctness gaps that look right at a glance.
Distinguish prototyping work from production work in team norms. Vibe coding is the right speed for exploration. Agentic engineering is the right discipline for production. Make the boundary explicit.
Invest in harness components as a shared team asset. Reusable system prompts, skill libraries, MCP server connections, and evaluation harnesses compound across projects. Treat them as infrastructure.
For Organizations
Treat AI-assisted development as an engineering investment, not a productivity feature. Rolling out a coding agent without eval coverage, observability, and clear architectural standards produces speed without quality.
Invest in the production substrate before scale. What graduates a vibe-coded prototype to production is operations discipline: trajectory and final-response evals in CI, traces of every agent run, scoped permissions, and security review tuned to generated code’s failure modes.
Adopt open standards. Model Context Protocol (MCP) for tool access and Agent2Agent (A2A) for cross-agent delegation are converging into the connective tissue of multi-agent systems.
Plan for hybrid teams of humans and agents. The strongest production results come from architectures where humans set direction, agents do the implementation, and clear handoff protocols govern the boundary.
Reframe hiring and skill development around judgment, not just implementation. The most valuable engineers in the next several years will be the ones who can direct agents well, not the ones who can write the most code.
Conclusion: Intent as the New Interface
The transition from syntax to intent is not a future prediction — it’s a present reality. Three principles stand out as durable:
Vibe coding is valid for exploration, prototyping, and personal projects. But for software that organizations depend on, the discipline of agentic engineering — specifications, tests, guardrails, and human oversight of architecture — is not optional.
Organizations with strong testing practices, clear architectural standards, and healthy code review processes get dramatically more value from AI-assisted development than those without. AI is a force multiplier — it multiplies both your strengths and your weaknesses.
The builders who understand architecture, can define precise specifications, evaluate output critically, and design effective systems of constraints and feedback loops are more valuable than ever. The skills that matter are shifting from implementation to judgment, from writing code to designing the systems that produce code.
Generation is solved. Verification, judgment, and direction are the new craft.
Endnotes
- GetPanto, “AI Coding Assistant Statistics 2025-2026,” https://www.getpanto.ai/blog/ai-coding-assistant-statistics
- Karpathy, A., “Vibe Coding,” X/Twitter post, February 2025. https://x.com/karpathy/status/1886192184808149383
- Osmani, A., “Agentic Engineering,” https://addyosmani.com/blog/agentic-engineering/
- Karpathy, A., “From Vibe Coding to Agentic Engineering,” 2026; The New Stack, https://thenewstack.io/vibe-coding-is-passe/
- Glide Blog, “What is Agentic Engineering?” https://www.glideapps.com/blog/what-is-agentic-engineering
- CircleCI, “AI-Native SDLC,” https://circleci.com/blog/ai-sdlc/
- GroovyWeb, “SDLC in the AI Era,” https://www.groovyweb.co/blog/sdlc-ai-era-software-development-2026
- Osmani, A., “The Factory Model,” https://addyosmani.com/blog/factory-model/
- Deloitte, “AI in Software Engineering: Productivity Gains 2025-2026”
- METR, “Uplift Update: Measuring the Impact of AI Coding Tools,” February 2026. https://metr.org/blog/2026-02-24-uplift-update/
- Google, “Introduction to Agents,” Agents Whitepaper Series, November 2025
- Osmani, A., “From Conductors to Orchestrators,” https://addyosmani.com/blog/future-agentic-coding/
- Google, “Jules: AI-Powered Coding Agent,” https://developers.googleblog.com/en/the-next-chapter-of-the-gemini-era-for-developers/
- Osmani, A., “The 80% Problem in Agentic Coding,” https://addyo.substack.com/p/the-80-problem-in-agentic-coding
- Google, “Agent Development Kit (ADK),” https://google.github.io/adk-docs/
- Google, “Agent-to-Agent (A2A) Protocol,” https://google.github.io/a2a-protocol/