Beyond the Vibe: The Rise of Agentic Engineering

AI has made it dangerously easy to confuse working code with finished software.

That is the real tension behind vibe coding. You describe a feature, let an AI tool generate the code, paste back the errors, and keep going until the application appears to work. For prototypes, this can feel like magic. The distance between idea and execution has collapsed.

But production software is not judged by whether it works once in a demo. It is judged by whether it keeps working under real users, real data, real constraints, and real consequences.

That is where vibe coding starts to break down.

In “The New SDLC with Vibe Coding”, Addy Osmani, Shubham Saboo, and Sokratis Kartakis explore how AI agents are changing the software development lifecycle. Their most important point is not simply that AI will write more code. It is that teams need a new discipline for building software with AI.

That discipline is agentic engineering.

Agentic engineering is the shift from casually generating code to deliberately designing the system around the AI. It is about giving agents the specifications, context, tools, permissions, tests, evaluations, and human oversight they need to produce dependable software.

The future of software development is not “AI writes the code and humans relax.”

The future is humans designing better systems for humans and agents to build software together.

Key takeaways

Agentic engineering is the disciplined use of AI agents in software development.
Vibe coding is useful for prototypes, but it is not enough for production software.
The biggest difference between vibe coding and agentic engineering is verification.
AI-generated code can look correct while hiding subtle risks.
The model is only one part of the system. The harness around the model is just as important.
Context engineering is replacing prompt engineering because agents need project knowledge, not just clever instructions.
Developers will spend more time on specification, architecture, evaluation, and review.
Engineering leaders should treat prompts, evals, agent instructions, and workflows as shared infrastructure.
The future of software development is not less engineering. It is more disciplined engineering around AI.

What is the problem with vibe coding?

Vibe coding is not the problem when the goal is exploration.

It is useful when you want to test an idea, build a throwaway prototype, create a personal tool, or move quickly during a hackathon. It lowers the cost of starting. It helps people build things they might never have attempted before.

The problem begins when teams mistake that speed for engineering maturity.

Vibe coding usually works like this: the developer gives the AI a broad instruction, accepts a large amount of generated code, runs the application, pastes back the errors, and repeats until the visible problem disappears.

That process can produce something impressive very quickly. It can also produce software no one fully understands.

The danger is not that the code always fails. The danger is that it often fails quietly. The app loads. The feature seems to work. The demo looks good. But underneath, there may be missing edge cases, weak error handling, poor dependency choices, security gaps, or business logic that does not match the actual requirement.

Vibe coding gets you to “it appears to work.”

Production engineering has to get you to “we know why it works, we know where it might fail, and we have systems to catch it.”

That is a much higher bar.

What is agentic engineering?

Agentic engineering is a structured way of building software with AI agents.

It does not mean banning AI-generated code. It means refusing to treat code generation as the whole job.

In agentic engineering, the AI works inside a controlled engineering environment. That environment includes clear requirements, architecture rules, coding standards, security constraints, approved tools, automated tests, evaluations, and human review.

The distinction is simple:

Vibe coding focuses on generating code. Agentic engineering focuses on generating dependable software.

That difference matters because software quality does not come from code alone. It comes from the system that shapes, tests, reviews, and maintains that code.

A strong agentic engineering workflow asks:

Can the agent understand the task clearly?
Can it access the right project context?
Can it use the right tools safely?
Can we verify the result?
Can a human review the risky parts efficiently?
Can this process be repeated by the team, not just improvised by one developer?

Those questions are what separate a useful AI experiment from a reliable engineering practice.

Why is verification now more important than generation?

AI has made generation cheap. Verification is where the real engineering work moves.

This is the uncomfortable truth for software teams: AI-generated code often looks more confident than it deserves to be. It can be syntactically correct, stylistically convincing, and still wrong in ways that matter.

The errors are not always obvious. They are often hidden in assumptions.

The AI may assume the wrong data shape. It may handle the happy path but miss failure states. It may choose a dependency the team would not approve. It may implement a feature that sounds right but does not match the business rule. It may skip security checks because the prompt did not make them explicit.

That is why verification becomes the core skill of AI-assisted development.

Traditional tests still matter. Unit tests, integration tests, end-to-end tests, type checks, linting, and security scans are all part of the foundation.

But agentic systems also need evaluations, or evals.

Evals answer broader questions that ordinary tests may not capture:

Did the agent follow the required process?
Did it use the correct tools?
Did it stay within its permissions?
Did it respect the architecture?
Did the result meet the expected quality standard?
Was the final answer reached through a safe and reliable path?

That last question is especially important. In agentic engineering, teams may need to evaluate the agent’s trajectory, not only its final output. An agent can produce the right result in the wrong way. It might bypass a required check, access the wrong data, use an unsafe tool, or make an architectural change that creates future risk.

In production software, that still counts as failure. The lesson is blunt: without verification, AI gives you speed without trust.
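To make the idea of evaluating a trajectory concrete, here is a minimal sketch in Python. It assumes the agent's run has been recorded as a list of steps. The Step format, the tool names, and the evaluate_trajectory function are all hypothetical, invented for illustration rather than taken from the paper or from any agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded action in an agent run (hypothetical log format)."""
    tool: str
    target: str


def evaluate_trajectory(steps, allowed_tools, required_tools, forbidden_prefixes):
    """Return a list of violations; an empty list means the path was acceptable."""
    violations = []
    used = {step.tool for step in steps}
    for step in steps:
        # Did the agent stay within its approved tools?
        if step.tool not in allowed_tools:
            violations.append(f"unapproved tool: {step.tool}")
        # Did it touch data it should never access?
        if any(step.target.startswith(prefix) for prefix in forbidden_prefixes):
            violations.append(f"forbidden target: {step.target}")
    # Did it skip a step the process requires?
    for tool in sorted(required_tools - used):
        violations.append(f"required step skipped: {tool}")
    return violations


# An illustrative run: the edit may be correct, but the path is not.
run = [
    Step("read_file", "src/billing/invoice.py"),
    Step("edit_file", "src/billing/invoice.py"),
    Step("read_file", "secrets/prod.env"),
]
report = evaluate_trajectory(
    run,
    allowed_tools={"read_file", "edit_file", "run_tests"},
    required_tools={"run_tests"},
    forbidden_prefixes=("secrets/",),
)
for line in report:
    print(line)
```

In this sketch the run is flagged twice, once for reading a secrets file and once for never running the tests, regardless of whether the final code change was correct.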

Why is the model only one part of the system?

A better model helps, but it does not solve everything.

One of the most useful ideas from the paper is this formula:

Agent = Model + Harness

The model is the part everyone talks about. It reasons, writes, explains, generates, and suggests.

The harness is everything around it.

That includes project instructions, tool access, execution environments, memory, permissions, guardrails, monitoring, tests, evals, and review workflows.

This matters because many AI failures are not really model failures. They are system failures.

The model may not have been given the right context. The task may have been vague. The tool may have been poorly designed. The agent may have had too much permission. The codebase rules may have been missing. The tests may not have covered the risky behaviour.

In those cases, switching to a more powerful model may improve the answer, but it will not fix the process.

Organisations need to stop treating model selection as their entire AI strategy.

The real advantage will come from the harness: the engineering system that makes agents useful, safe, and repeatable.

A mediocre harness can make a strong model unreliable. A strong harness can make an AI agent dramatically more useful.
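One way to picture Agent = Model + Harness is as a thin wrapper around the model call. The sketch below is an assumption about how such a wrapper could look, not the paper's design. The Harness class, the fake_model stand-in, and the no_todo check are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Harness:
    """Hypothetical wrapper: everything that surrounds the model call."""
    instructions: str
    allowed_tools: frozenset
    # Each check takes the output and returns an error string, or None if it passes.
    checks: list = field(default_factory=list)

    def run(self, model: Callable[[str], dict], task: str) -> dict:
        # The model always sees the project instructions alongside the task.
        proposal = model(f"{self.instructions}\n\nTask: {task}")
        # Guardrail: refuse any tool outside the allow-list.
        if proposal["tool"] not in self.allowed_tools:
            return {"accepted": False, "reasons": [f"tool not permitted: {proposal['tool']}"]}
        # Verification: collect the message from every failing check.
        reasons = [msg for check in self.checks if (msg := check(proposal["output"]))]
        return {"accepted": not reasons, "reasons": reasons}


def fake_model(prompt: str) -> dict:
    """Stand-in for a real model call, so the sketch runs offline."""
    return {"tool": "edit_file", "output": "def total(items): return sum(items)"}


def no_todo(output: str):
    return "output contains TODO" if "TODO" in output else None


harness = Harness(
    instructions="Use Python 3.11. Never touch deployment config.",
    allowed_tools=frozenset({"edit_file", "run_tests"}),
    checks=[no_todo],
)
result = harness.run(fake_model, "Add a total() helper")
print(result)
```

The same model output is accepted or rejected depending on how the harness is configured, which is why swapping the model alone does not fix a weak process.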

What is context engineering, and why does it matter?

Prompt engineering asks, “How should we phrase the instruction?”

Context engineering asks, “What does the agent need to know to do this safely?”

That is a much better question.

A capable human engineer does not start work with a single sentence of instruction. They need to understand the product, the architecture, the coding standards, the deployment environment, the security model, the team’s preferences, and the definition of done.

AI agents need the same kind of project knowledge.

For software teams, useful context may include:

Architecture decisions.
Coding conventions.
Product requirements.
Security restrictions.
Approved implementation patterns.
Testing expectations.
Available internal APIs.
Dependency rules.
Deployment constraints.
Examples of good code.
Examples of things to avoid.

The goal is not to stuff every possible document into the prompt. More context is not automatically better. Too much context can increase cost, slow the system down, and bury the instructions that matter most. An agent does not need the entire company wiki to fix a small bug. It needs the right information at the right moment.

Strong agentic systems provide core rules consistently and retrieve specialised context only when the task requires it. That turns context from a one-off prompt trick into an engineering asset.

It should be documented, structured, versioned, reviewed, and maintained like any other important part of the development system.
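A minimal sketch of that split between always-on rules and on-demand context might look like the following. The rule text, the topic keys, and the simple keyword match are all assumptions made for illustration. A real system would more likely use a retrieval step than word matching.

```python
# Core rules are sent with every task.
CORE_RULES = [
    "Follow the coding conventions in the project instruction file.",
    "Never commit secrets or credentials.",
]

# Hypothetical specialised notes, keyed by topic.
SPECIALISED = {
    "payments": "Payments code must go through the internal billing API.",
    "auth": "Authentication changes require a security review.",
    "frontend": "UI components follow the shared design system.",
}


def build_context(task: str, max_chars: int = 2000) -> str:
    """Always include core rules; add a specialised note only if the task names its topic."""
    words = set(task.lower().split())
    parts = list(CORE_RULES)
    for topic, note in SPECIALISED.items():
        if topic in words:
            parts.append(note)
    context = "\n".join(parts)
    # A crude budget, so context cannot grow without anyone noticing.
    if len(context) > max_chars:
        raise ValueError("context exceeds budget; trim specialised notes")
    return context


context = build_context("Fix rounding bug in payments export")
print(context)
```

Here the payments note is included and the auth and frontend notes are left out, so the agent gets the rule it needs without the rest of the wiki.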

How does agentic engineering change the developer’s role?

AI does not remove software development work. It changes where human effort matters most.

When implementation becomes faster, the bottleneck moves upstream and downstream.

Upstream, developers need to define the work more clearly. Vague tickets produce vague AI output. Poor requirements become bad software faster than before.

Downstream, developers need to verify more carefully. AI can generate large amounts of code quickly, but someone still needs to decide whether that code belongs in the system.

This means developers increasingly work in two modes.

The first is conductor mode.

In conductor mode, the developer works closely with the AI in real time. They guide it, challenge it, redirect it, and inspect the output as it evolves. This is useful for complex logic, unfamiliar systems, architectural trade-offs, and difficult debugging.

The second is orchestrator mode.

In orchestrator mode, the developer defines a task, hands it to an agent, and reviews the result later. This works best for well-specified tasks such as generating tests, fixing a known defect, updating documentation, migrating a framework, or applying a repeated pattern across a codebase.

Orchestrator mode sounds easy, but it requires discipline.

The developer needs to define the task clearly, set constraints, explain success criteria, limit permissions, and review the result efficiently. Without those skills, delegation to AI becomes a faster way to create hidden problems.
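The discipline described above can be captured in a small task record that is checked before anything is handed off. This is only a sketch. The AgentTask class, its fields, and the five-word threshold for a vague goal are arbitrary assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentTask:
    """Hypothetical delegation record for a hand-off to an agent."""
    goal: str
    constraints: tuple = ()
    success_criteria: tuple = ()
    allowed_paths: tuple = ()

    def problems(self) -> list:
        """List what is missing before the task is safe to delegate."""
        issues = []
        if len(self.goal.split()) < 5:  # arbitrary threshold for this sketch
            issues.append("goal is too vague")
        if not self.constraints:
            issues.append("no constraints set")
        if not self.success_criteria:
            issues.append("no success criteria")
        if not self.allowed_paths:
            issues.append("permissions are not limited")
        return issues


vague = AgentTask(goal="Fix the tests")
ready = AgentTask(
    goal="Add unit tests for the invoice rounding helper",
    constraints=("Do not change production code",),
    success_criteria=("New tests pass", "Existing tests still pass"),
    allowed_paths=("tests/billing/",),
)
print(vague.problems())
print(ready.problems())
```

The vague task reports four problems and the ready task reports none. The value is less in the code than in forcing the questions to be answered before the agent starts.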

As code generation becomes easier, judgment becomes more valuable.

The strongest engineers will not simply be the ones who type the most code. They will be the ones who can design reliable workflows where humans and agents produce better software together.

What is the 80% problem in AI-generated software?

AI is very good at producing the first 80% of a feature.

That is exactly why it can be risky.

The first 80% creates momentum. The interface appears. The endpoint responds. The test data works. The demo looks convincing. Everyone feels close to done.

But the final 20% is where production software lives.

That last part includes edge cases, integrations, permissions, performance, security, observability, failure handling, data migrations, business rules, and architectural consequences.

It is often the hardest part to see and the easiest part to underestimate.

This is the 80% problem: AI makes incomplete work look more complete than it really is.

The right response is not to avoid AI. The right response is to use AI where it is strongest and place human attention where it matters most.

Use AI aggressively for well-defined implementation work.

Use human judgment for ambiguity, architecture, security, trade-offs, and final correctness.

The goal is not to accept more AI-generated code. The goal is to concentrate engineering expertise where it has the highest impact.

What should developers do now?

Developers should start by making their projects easier for agents to understand.

A practical first step is to create an AGENTS.md file or equivalent project instruction file.

This should explain the stack, conventions, workflow, testing expectations, security rules, architectural boundaries, and anything the AI should never do.

It should answer the questions a capable engineer would ask before safely contributing to the project.
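A team could keep that file honest with a small check like the one below. It assumes the instruction file uses Markdown-style headings, and the required section names are this sketch's own reading of the list above. No required schema for the file is implied.

```python
# Topics the instruction file is expected to cover (an assumed list).
REQUIRED_SECTIONS = (
    "stack",
    "conventions",
    "workflow",
    "testing",
    "security",
    "architecture",
    "never do",
)


def missing_sections(text: str) -> list:
    """Return required topics that have no matching heading line."""
    headings = [
        line.lstrip("#").strip().lower()
        for line in text.splitlines()
        if line.startswith("#")
    ]
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(section in heading for heading in headings)
    ]


# An illustrative, incomplete instruction file.
sample = """# Project instructions
## Stack
Python 3.11, PostgreSQL.
## Conventions
Type hints on all public functions.
## Testing
Run the test suite before proposing a change.
## Never do
Never edit files under migrations/ by hand.
"""
gaps = missing_sections(sample)
print(gaps)
```

For this sample the check reports that workflow, security, and architecture are not yet covered.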

Developers should also write tests before asking AI to generate critical functionality. This changes the workflow from “generate and hope” to “generate against a known standard.”
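As a small illustration of generating against a known standard, the checks below are written first and the implementation is judged against them. The apply_discount function and its rules are a made-up example, not taken from any real project.

```python
def check_discount(apply_discount):
    """Checks written before the implementation exists; they define 'done'."""
    assert apply_discount(100.0, 0) == 100.0    # no discount
    assert apply_discount(100.0, 25) == 75.0    # normal case
    assert apply_discount(100.0, 100) == 0.0    # full discount
    # Out-of-range percentages must be rejected, not silently accepted.
    for bad in (-1, 101):
        try:
            apply_discount(100.0, bad)
        except ValueError:
            continue
        raise AssertionError(f"percent={bad} should be rejected")


# A candidate implementation, as an agent might produce it.
def apply_discount(price: float, percent: float) -> float:
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100


check_discount(apply_discount)
print("candidate meets the agreed standard")
```

A version that handles only the happy path, for example one that never validates the percentage, fails these checks immediately instead of passing a visual inspection.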

Every AI-generated line that reaches production should still be reviewed. The review should be especially careful around dependencies, authentication, authorisation, data access, error handling, integrations, and business-critical logic.
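One way to direct that extra care is to flag changed files that fall in sensitive areas before review starts. The sketch below uses Python's standard fnmatch module. The area names and path patterns are assumptions and would need to match a real repository's layout.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from review area to path patterns.
SENSITIVE_PATTERNS = {
    "dependencies": ("requirements*.txt", "pyproject.toml", "package.json"),
    "authentication": ("*/auth/*",),
    "data access": ("*/db/*", "*/migrations/*"),
}


def review_flags(changed_paths):
    """Group changed paths by the sensitive area they fall into."""
    flags = {}
    for path in changed_paths:
        for area, patterns in SENSITIVE_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                flags.setdefault(area, []).append(path)
    return flags


flags = review_flags(["src/auth/session.py", "requirements.txt", "docs/intro.md"])
for area, paths in flags.items():
    print(f"{area}: {', '.join(paths)}")
```

Flagging does not replace review of the remaining files. It only tells the reviewer where to slow down first.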

The point is not to slow everything down. It is to stop speed from turning into unowned risk.

AI can help write the code. It should not be allowed to silently define the quality bar.

What should engineering leaders do now?

Engineering leaders need to treat agentic engineering as infrastructure, not as individual productivity theatre.

Prompts, project instructions, reusable agent skills, eval suites, coding rules, and review workflows should be version-controlled and maintained. They should have owners. They should improve over time.

If every developer is inventing their own AI workflow in isolation, the organisation is not practising agentic engineering. It is practising distributed experimentation.

That may be fine at the start. It is not enough for production.

Leaders also need to draw a hard line between prototyping and production.

Vibe coding is a good way to explore. It is a bad way to accidentally ship.

A prototype created with AI should not drift into production simply because it appears to work. Before it reaches customers, it needs the same discipline as any other production system: requirements, tests, security review, maintainability checks, and operational readiness.

The quality bar should be repeatable. It should not depend on how impressive the demo looked.

Why does agentic engineering matter?

Agentic engineering matters because AI has changed the economics of software development.

The cost of generating code has fallen dramatically. The cost of understanding, verifying, securing, and maintaining software has not disappeared.

In some cases, it has increased. Teams can now create more code than they can responsibly review. They can build prototypes faster than they can validate the assumptions behind them. They can generate features before they have clarified the requirements.

That is powerful, but it is also dangerous. Agentic engineering is the response to that danger.

It gives teams a way to use AI without abandoning engineering discipline. It recognises that the value is not only in the model, but in the system around the model.

Vibe coding showed how quickly AI can produce something that works. Agentic engineering is how organisations make sure it still works when the stakes are real.

FAQs

What does agentic engineering mean?

Agentic engineering means building software with AI agents inside a structured engineering system. That system includes requirements, context, tools, permissions, tests, evaluations, guardrails, and human review.

How is agentic engineering different from vibe coding?

Vibe coding is informal and focused on quickly generating working code. Agentic engineering is structured and focused on producing reliable, secure, maintainable software.

Is vibe coding bad?

No. Vibe coding is useful for experiments, prototypes, personal projects, and early exploration. It becomes risky when teams use it as the process for production software.

Why is verification important for AI-generated code?

Verification is important because AI-generated code can look convincing while still containing hidden errors, weak assumptions, security issues, or incorrect business logic.

What are evals in agentic engineering?

Evals are checks that assess the quality of an AI agent’s work. They can evaluate the final output, the process the agent followed, the tools it used, and whether it stayed within the required constraints.

What does “Agent = Model + Harness” mean?

It means an AI agent is not just the model. The harness includes the surrounding instructions, tools, permissions, context, tests, monitoring, and guardrails that make the model useful and safe.

What is context engineering?

Context engineering is the practice of giving AI agents the right project knowledge before they perform a task. This can include architecture rules, coding standards, product requirements, security restrictions, and testing expectations.

Why is prompt engineering not enough?

Prompt engineering focuses on wording. Agentic engineering requires more than wording. It requires the right context, tools, constraints, tests, and review process.

What is the developer’s role in agentic engineering?

The developer’s role shifts towards defining work clearly, designing systems, setting constraints, reviewing output, writing tests, and judging whether AI-generated code is suitable for production.

What should teams do before shipping AI-generated code?

Teams should define requirements, write tests, run evaluations, review the code, check dependencies, verify security-sensitive behaviour, and confirm that the implementation fits the architecture.

Why does agentic engineering matter for production software?

It matters because production software must be reliable, secure, maintainable, and aligned with real business requirements. AI can generate code quickly, but agentic engineering provides the specifications, tests, evaluations, and review that let teams trust that code.
