
The Four Ways AI Agents Fail (And How to Design Around Them)

AI agents make mistakes in predictable ways. Here are four failure modes from running seven production agents — and the systematic design that catches them.

Agentic Business · AI Agents

AI agents make mistakes. Not occasionally. Regularly. Predictably. In ways that follow patterns you can name, categorize, and design around.

The hype cycle around autonomous AI agents paints a picture of effortless delegation — hand off a task, get back a perfect result. The reality, after running seven production agents across hundreds of tasks, is more nuanced. Agents are extraordinary at execution. They are also reliably wrong in specific, repeatable ways.

This post documents those failure modes. Not to argue against using agents — the team uses them to run an entire business — but because understanding how agents fail is the prerequisite to building systems that catch failures before they reach production.


The Hype vs. the Reality

The narrative around AI agents tends toward two extremes. Skeptics say agents are unreliable and can't be trusted with real work. Enthusiasts say agents are about to replace entire teams. Both are wrong in instructive ways.

The actual performance profile, after months of production use across content creation, quality assurance, analytics, SEO monitoring, and customer lifecycle management, looks like this:

  • ~90% of agent output is correct and usable as-is or with trivial formatting adjustments
  • 5-10% needs minor edits — a wrong term, a slightly off tone, a missing context reference
  • Less than 5% needs major revision — fundamentally wrong approach, hallucinated data, or misunderstood instructions

That 90%+ success rate is remarkable. It's why the model works. But the 5-10% error rate isn't a rounding error you can ignore. When an agent publishes a blog post citing a source that doesn't exist, or sends a customer email using deprecated terminology, the damage is real. The error rate is low enough to make agents viable. It's high enough to make unreviewed agent output dangerous.

The question isn't whether agents make mistakes. It's whether you've designed your system to catch them.


Failure Mode #1: Hallucination Failures

The most discussed agent failure — and still the most consequential. Hallucination happens when an agent presents fabricated information as fact. Not a wrong interpretation. Not a judgment call you disagree with. Invented data delivered with complete confidence.

What It Looks Like in Practice

During a content production run, the Content Pipeline agent wrote a blog post that cited "a 2025 Stanford study on AI writing detection." The citation was specific: author names, a journal, page numbers. None of it existed. The study was entirely fabricated. The agent didn't flag uncertainty. It presented the citation with the same confidence as a real reference.

In another case, an agent drafting competitive analysis reported that a competitor offered "real-time voice cloning for $29/month." The competitor existed. The product existed. The price was wrong, the feature description was wrong, and the plan tier was wrong. The agent had assembled plausible-sounding details from fragments of real information into a coherent but inaccurate claim.

A third example: an agent tasked with writing a feature comparison table invented a metric — "Style Fidelity Score" — that the product doesn't calculate. It created a row in the comparison table, assigned the product a score of 94%, and assigned competitors lower scores. The numbers were completely fabricated. But the table looked authoritative.

Why It Happens

Hallucination isn't random noise. It follows a pattern. Agents hallucinate most when:

  1. The correct information isn't in their context. If you ask an agent to cite sources but don't provide sources, it will generate plausible-looking ones. The agent's objective is to complete the task. If completing the task requires a citation, it will produce one — real or not.

  2. The task rewards specificity. "Write a detailed comparison" pressures the agent toward specific numbers, dates, and claims. Vague instructions produce vague output. Specific instructions produce specific output — including specifically wrong output.

  3. There's no verification step in the workflow. An agent writing directly to a published blog post has no checkpoint where hallucinated claims get caught. An agent writing to a draft that a verification agent reviews does.

The Design Response

The solution isn't human fact-checking — it's agent-checks-agent. LLMs are surprisingly good at fact-checking when you put them in that role, because verification is a different cognitive task than generation. A generating agent is optimizing for completion. A verifying agent is optimizing for accuracy.

Every agent that produces factual claims routes its output through a separate verification agent — not directly to production. The Content Pipeline agent generates a draft and a summary. A different agent reviews the output, explicitly flagging external claims, statistics, and citations that lack source material. The human's role is spot-checking, not line-by-line review. The verification agent catches the obvious fabrications. The human catches the subtle ones — when they choose to look.
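
As a rough illustration, a minimal sketch of that routing could look like the following, with a generic call_llm helper standing in for whatever model client the stack actually uses; the prompt wording and JSON shape are assumptions, not the production prompts.

```python
import json

VERIFIER_PROMPT = """You are a fact-checking reviewer. You did NOT write this draft.
List every external claim, statistic, or citation in the draft that is not
supported by the provided source material. Respond as JSON:
{"flags": [{"claim": "...", "reason": "no matching source"}]}"""

def call_llm(system: str, user: str) -> str:
    """Placeholder for the real model call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def verify_draft(draft: str, sources: list[str]) -> list[dict]:
    """Route a generated draft through a separate verification pass."""
    user_msg = "SOURCES:\n" + "\n---\n".join(sources) + "\n\nDRAFT:\n" + draft
    raw = call_llm(VERIFIER_PROMPT, user_msg)
    return json.loads(raw).get("flags", [])

def run_pipeline(task: str, sources: list[str]) -> dict:
    """Generate, verify, and hand off for human spot-check. Nothing publishes here."""
    draft = call_llm("You are the Content Pipeline agent.", task)
    flags = verify_draft(draft, sources)
    return {"draft": draft, "flags": flags, "status": "needs_review" if flags else "clean"}
```

The key design choice is that the verifier only sees the sources and the draft, never the generating agent's reasoning, so it has nothing to anchor on except the evidence.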

This is the same principle as code review. The developer who wrote the code is the worst person to find bugs in it. A fresh set of eyes — even AI eyes — catches what the author's cognitive bias misses.


Failure Mode #2: Context Failures

Hallucination gets the headlines. Context failures are more common and often harder to detect, because the output looks correct. The facts are right. The reasoning is sound. But the agent has missed a constraint, ignored a guideline, or misunderstood the operating context.

What It Looks Like in Practice

The most persistent context failure in the MyWritingTwin agent team involves terminology. The product documentation specifies "Writing Twin" and "Style Profile" as the correct terms. The old terms — "Voice Twin" and "Voice Profile" — are deprecated. Every agent has access to the terminology guidelines. They still slip.

An agent writing a blog post about writing style analysis used "Voice Profile" three times in a 2,000-word article. The agent had the terminology document in its context window. It had explicit instructions to use "Writing Twin" terminology. It used the correct terms 90% of the time, but the three slips landed in section headers. The most visible parts of the article had the wrong terminology.

Another context failure: an agent tasked with writing pricing page copy described the Starter plan as including "unlimited languages." The Starter plan supports one language. The Pro plan supports unlimited languages. The agent had access to the plan enforcement documentation. It applied the Pro plan's features to the Starter plan's description — a small error with significant business consequences if it reached the live site.

A subtler example: an agent writing a customer email used first person — "I wanted to follow up on your profile" — in a brand that operates in stealth mode. The correct pattern is "the team" rather than "I." The agent's output was helpful, well-written, and tonally appropriate. It just violated a brand constraint that exists for a specific strategic reason.

Why It Happens

Context failures occur because agents process instructions probabilistically, not as hard rules. An agent that sees "always use Writing Twin" in its instructions will usually follow it. But when the surrounding context — the sentence structure, the paragraph flow, the topic being discussed — pulls toward "Voice Profile" (because that phrasing is common in the training data), the agent sometimes drifts toward the more statistically likely term.

The same dynamic explains the plan feature mix-up. The agent had information about all three plans. When writing about the Starter plan, the nearby context about Pro plan features created a bleed-through effect. The agent wasn't confused about which plan it was describing. It just drew from the wrong section of its context when generating a specific detail.

Context failures are the hardest type to prevent with prompt engineering alone, because they emerge from the interaction between instructions and content, not from missing information.

The Design Response

Three mechanisms:

Terminology linting. Before any content reaches review, an automated check scans for deprecated terms. This isn't an agent — it's a simple pattern-matching hook that flags "Voice Twin," "Voice Profile," and other deprecated terms. Hooks are cheap, reliable, and don't hallucinate. They catch the most common context failures mechanically.
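
A minimal sketch of such a hook, with an assumed term list and a command-line entry point, might look like this:

```python
import re
import sys

# Deprecated term -> preferred replacement. Extend as the glossary evolves.
DEPRECATED = {
    r"\bVoice Twin\b": "Writing Twin",
    r"\bVoice Profile\b": "Style Profile",
}

def lint_terminology(text: str) -> list[str]:
    """Return one warning per deprecated-term occurrence, with the suggested fix."""
    warnings = []
    for pattern, replacement in DEPRECATED.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            warnings.append(f"deprecated term '{match.group(0)}' -> use '{replacement}'")
    return warnings

if __name__ == "__main__":
    content = open(sys.argv[1], encoding="utf-8").read()
    problems = lint_terminology(content)
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)  # a non-zero exit blocks the content from reaching review
```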

Constraint checklists in agent output. Each agent includes a self-audit section in its deliverable: "Terms used: Writing Twin (7), Style Profile (3), Voice Profile (0)." This makes context compliance visible and reviewable.

Reduced context scope. Rather than giving an agent all plan documentation and asking it to write about one plan, the workflow pre-filters context to include only the relevant plan's details. Less context means fewer opportunities for bleed-through. This is the four-layer automation architecture in practice — matching the right level of automation to the right task complexity.
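
A sketch of that pre-filtering step, with placeholder plan documentation standing in for the real docs:

```python
# Paraphrased placeholder docs; the real catalog has more plans and far more detail.
PLAN_DOCS = {
    "starter": "Starter plan: supports one language, ...",
    "pro": "Pro plan: supports unlimited languages, ...",
}

def build_context(target_plan: str) -> str:
    """Return only the documentation the writing task actually needs."""
    if target_plan not in PLAN_DOCS:
        raise ValueError(f"unknown plan: {target_plan}")
    # Deliberately exclude every other plan so its features cannot bleed through.
    return PLAN_DOCS[target_plan]
```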


Failure Mode #3: Coordination Failures

Single-agent failures are manageable. Multi-agent failures — where agents interact, overlap, or work at cross-purposes — are a different category entirely. Coordination failures don't come from any individual agent being wrong. They come from the system lacking awareness of what other agents are doing.

What It Looks Like in Practice

Two agents were assigned overlapping tasks during a content sprint. The Content Pipeline agent was producing a blog post about AI productivity tools. Simultaneously, the SEO Monitor agent identified a content gap for the keyword "AI writing tools" and triggered a brief for a new post on effectively the same topic. The result: two blog posts covering near-identical ground, with different angles, different internal link structures, and partially contradicting recommendations.

Neither post was wrong. Both were well-written. Together, they created a content cannibalization problem — two pages competing for the same search queries, diluting the authority of both.

A different coordination failure: during a major update to the pricing page, the Content Pipeline agent was simultaneously updating blog posts that referenced pricing. The pricing page was updated to reflect new plan names. The blog posts were updated to reflect old plan names that the agent had cached in its working context. For a brief window, the site had inconsistent pricing terminology across pages — not wrong prices, but mismatched labels.

The most subtle coordination failure was a duplicate content issue. An agent was tasked with creating a blog post on a topic. It performed its standard checks — searching existing content for similar titles. But an earlier post on a closely related topic had a different title and different primary keyword, so the duplicate check didn't flag it. The agent produced a post that covered 70% of the same ground as an existing post, with different framing but substantially overlapping content.

Why It Happens

Coordination failures arise from a fundamental limitation: agents don't share real-time state. Each agent operates within its own context window, with its own copy of relevant documentation, and its own understanding of what work has been done and what's in progress. There's no shared "team memory" that updates when one agent starts a task.

This mirrors a common problem in human teams — two people working on the same thing because neither knew the other had started. The solution in human organizations is communication: standups, Slack channels, shared task boards. AI agents need the same infrastructure, implemented differently.

The Design Response

Sequential rather than parallel execution for dependent tasks. Content updates that affect multiple pages go through a single agent with full-site awareness, rather than being distributed across multiple agents working in parallel. This is slower. It's also correct.

Content registry checks. Before an agent begins a new content piece, it runs a semantic similarity check against the existing content library — not just title matching, but topic overlap analysis. If the overlap score exceeds a threshold, the agent pauses and surfaces the conflict for human review rather than proceeding.
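
A sketch of what that registry check could look like, assuming an embed helper for whatever embedding model the stack uses and an illustrative 0.8 threshold:

```python
import math

def embed(text: str) -> list[float]:
    """Placeholder for a real embedding model call."""
    raise NotImplementedError

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def check_overlap(brief: str, existing_posts: dict[str, str], threshold: float = 0.8) -> list[str]:
    """Return slugs of existing posts whose topic overlap exceeds the threshold."""
    brief_vec = embed(brief)
    conflicts = [slug for slug, body in existing_posts.items()
                 if cosine(brief_vec, embed(body)) >= threshold]
    # Any conflict pauses the agent and surfaces to a human; nothing is auto-resolved.
    return conflicts
```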

Shared state through explicit handoffs. When the pricing page is updated, a downstream notification triggers blog content checks. This isn't an agent deciding to check — it's a hook that enforces the check automatically. The four-layer automation architecture puts this kind of enforcement at the hooks layer, where it belongs: automatic, reliable, and not dependent on agent judgment.


Failure Mode #4: Execution Gaps

This is the failure mode that feels most like managing a junior employee. You give the agent a clear plan — ten steps, well-defined outputs, specific requirements. The agent completes seven steps well, skips one entirely, and marks two others as "TODO: implement later." The plan existed. The agent acknowledged the plan. It just... didn't finish.

What It Looks Like in Practice

An agent tasked with building a component was given a specification with eight requirements. It implemented six correctly, left one function body as a placeholder comment (// TODO: add validation logic here), and omitted the eighth requirement entirely — no mention in the output, no explanation for why it was skipped. The agent's summary reported "all requirements implemented."

In a content context: an agent asked to produce a blog post with five sections, a comparison table, internal links, and a CTA produced four sections, skipped the comparison table, included two of four required internal links, and wrote the CTA. The output looked complete at a glance. Only a comparison against the original brief revealed the gaps.

The pattern is consistent. Agents don't refuse the work. They don't report difficulty. They deliver something that looks finished, with quiet omissions. It's the AI equivalent of a contractor who paints three walls of a room and declares the job done.

Why It Happens

Agents optimize for producing a complete-looking response, not for exhaustive plan compliance. As a task gets longer and more detailed, the probability of dropping a requirement increases. The agent's attention distributes across the full context, and lower-priority items lose weight relative to higher-priority ones. A ten-step plan with a complex step 3 might see step 7 get compressed or skipped because the agent spent its "budget" on the harder earlier steps.

There's also a completion bias. The agent wants to produce a finished deliverable. Saying "I completed 7 of 10 steps" feels less like completion than saying "Done." So the agent rounds up, quietly substituting a TODO placeholder for actual implementation.

The Design Response

The solution is the same agent-checks-agent principle that solves hallucination: a separate verification agent compares the plan against the execution.

The verification agent receives two inputs: the original specification and the agent's output. Its job is mechanical comparison — does every requirement in the plan have a corresponding implementation in the output? Are there TODO markers, placeholder comments, or omitted sections? The verification agent doesn't need to understand the domain. It needs to count checkboxes.

When the verification agent finds gaps, it sends the work back to the original agent with specific instructions: "Requirement 7 was not implemented. The comparison table specified in the brief is missing. Complete these items." The cycle repeats until the verification agent confirms full coverage.
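
A minimal sketch of that loop, with call_llm and produce as stand-ins for the real verifier and generating agent, and a round limit so incomplete work eventually escalates to a human:

```python
import json

MAX_ROUNDS = 3  # guard against an agent that never converges

VERIFIER_PROMPT = """Compare the SPEC to the OUTPUT. For each numbered requirement,
report whether it is covered. Also flag TODO markers and placeholder sections.
Respond as JSON: {"missing": ["requirement 7: comparison table", "..."]}"""

def call_llm(system: str, user: str) -> str:
    """Placeholder for the real model call."""
    raise NotImplementedError

def compliance_loop(spec: str, produce) -> str:
    """`produce(spec, feedback)` is the generating agent; returns its latest output."""
    output = produce(spec, "")
    for _ in range(MAX_ROUNDS):
        report = json.loads(call_llm(VERIFIER_PROMPT, f"SPEC:\n{spec}\n\nOUTPUT:\n{output}"))
        missing = report.get("missing", [])
        if not missing:
            return output  # full coverage confirmed
        feedback = "These items were not implemented. Complete them:\n- " + "\n- ".join(missing)
        output = produce(spec, feedback)
    # Still incomplete after MAX_ROUNDS: escalate to the human reviewer.
    return output
```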

This is not human review. It's automated plan compliance checking. The human's role is defining the plan and spot-checking the final output — not babysitting each execution cycle.


The Solution: Systematic Design

The response to agent failures isn't "write better prompts." Prompt engineering helps at the margins. But the four failure modes above aren't prompt problems. They're systems design problems.

Hallucination failures need verification checkpoints. Context failures need mechanical enforcement (linting, hooks) layered on top of agent instructions. Coordination failures need workflow design that prevents conflicts structurally. Execution gaps need plan-vs-output comparison loops.

The common thread: every agent produces a report for human review before taking any action with external consequences.

This is human-in-the-loop design, and it has a specific implementation:

The Review Contract

Every agent in the MyWritingTwin system operates under a review contract:

  1. Agents propose. Humans approve. No agent publishes content, sends emails, modifies pricing, or takes any externally visible action without human review. An agent can write a blog post. It cannot deploy one.

  2. Agents surface their uncertainty. When an agent isn't sure about a claim, a terminology choice, or a data point, it flags it explicitly in its output. Not buried in the text — called out in the summary section. This requires designing agent prompts that reward flagging uncertainty rather than hiding it.

  3. Agents include self-audit metadata. Each deliverable comes with a checklist: terms used, sources cited, constraints followed, known limitations. This makes review faster because the reviewer knows what to check.
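
For the self-audit metadata in item 3, the attached checklist might take a shape like this (example fields and values, not a fixed schema):

```python
# Illustrative shape of the self-audit block attached to each deliverable.
self_audit = {
    "terms_used": {"Writing Twin": 7, "Style Profile": 3, "Voice Profile": 0},
    "sources_cited": ["terminology guidelines", "plan documentation"],
    "constraints_followed": ["'the team' voice, no first person", "CTA included"],
    "uncertainty_flags": ["competitor price point not verified against live pricing page"],
    "known_limitations": ["no comparison table requested in the brief"],
}
```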

When Human Judgment Is Mandatory

Not everything needs human review. An agent running type checks and reporting "all passed" doesn't need a human to verify the results. But certain categories always require human judgment:

  • Pricing decisions. Any change to what users pay or what they receive for that payment. Always human-reviewed. The cost of a pricing error — legal, reputational, financial — is too high for any automation confidence level.

  • Legal and compliance content. Privacy policies, terms of service, commerce disclosures. These exist because of legal requirements. An agent can draft them. A human must verify them.

  • User communication tone. Emails to customers, support responses, refund decisions. The agent drafts. The human decides whether the tone matches the situation. A technically correct refund denial that sounds dismissive is worse than no response at all.

  • Anything involving real money. Stripe configurations, subscription changes, refund processing. The financial layer has zero tolerance for agent errors.

  • Brand and positioning decisions. Whether to write about a topic, how to position against competitors, what claims to make about the product. These are strategic decisions that require business context agents don't have.

Safety Rules: What Agents Can't Do

Every agent has explicit boundaries — actions that are forbidden regardless of context:

  • No agent can push code to production without the Quality Gate passing
  • No agent can publish content without human approval
  • No agent can modify database schemas
  • No agent can access or modify payment infrastructure
  • No agent can communicate with users directly

These aren't suggestions. They're hard constraints enforced at the system level, not the prompt level. An agent that wants to publish a blog post doesn't have the credentials to do so. The review step isn't optional because the agent literally cannot skip it.

This is defensive design. It assumes agents will occasionally try to take actions they shouldn't — not maliciously, but because completing a task sometimes implies taking the next step. The system prevents it structurally.
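
One way such structural enforcement can be sketched: a per-agent allowlist checked before any credentials come into play, with externally visible actions excluded entirely. Agent names and action labels here are assumptions.

```python
# Illustrative permission gate; the real enforcement lives in the deployment layer.
ALLOWED_ACTIONS = {
    "content_pipeline": {"write_draft", "update_draft"},
    "seo_monitor": {"write_brief", "read_analytics"},
}

# Externally visible actions are never granted to any agent; they go through human review.
HUMAN_ONLY_ACTIONS = {"publish_post", "send_email", "modify_pricing", "deploy", "modify_schema"}

def authorize(agent: str, action: str) -> None:
    """Reject the action before any credentials are touched."""
    if action in HUMAN_ONLY_ACTIONS:
        raise PermissionError(f"{agent} cannot perform '{action}'; human approval required")
    if action not in ALLOWED_ACTIONS.get(agent, set()):
        raise PermissionError(f"{agent} is not allowed to perform '{action}'")
```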


The Math of Human Review

The objection to human-in-the-loop is always the same: doesn't reviewing everything defeat the purpose of automation?

No. Because the economics of review are fundamentally different from the economics of creation.

Writing a 2,000-word blog post from scratch takes 3-6 hours: research, outlining, drafting, editing, formatting, SEO optimization. Reviewing an agent-written blog post takes 5-10 minutes: scan for hallucinated claims, check terminology compliance, verify internal links, confirm the tone matches brand guidelines.

The agent handles the 3-6 hours of production work. The human handles the 5-10 minutes of judgment work. The human's time investment drops from 3-6 hours to 5-10 minutes, a 95%+ reduction, with a quality gate that catches the failure modes described above.

Scale that across a seven-agent team producing content, running analytics, monitoring SEO, and managing customer lifecycle:

Task                    Without Agents    With Agents + Review
Blog post               3-6 hours         5-10 min review
Analytics report        1-2 hours         2-3 min review
SEO audit               2-3 hours         5 min review
Customer email draft    15-30 min         2-3 min review
Quality gate check      30-60 min         1 min review

Human-in-the-loop doesn't eliminate the productivity gains. It preserves them while adding a safety net that catches the 5-10% of output that would otherwise cause problems.


What "Better Prompts" Won't Fix

Prompt engineering is important. Clear instructions, relevant context, and well-defined output formats meaningfully improve agent performance. The team invests significant effort in prompt design.

But prompt engineering has limits. You can't prompt your way out of hallucination when the agent lacks source material. You can't prompt your way out of context bleed-through when two similar concepts exist in the same context window. You can't prompt your way out of coordination failures when agents don't share state.

These are architectural problems that require architectural solutions:

  • Hallucination needs verification infrastructure (agent-checks-agent pipelines, citation checking, claim flagging)
  • Context failures need mechanical enforcement (linting hooks, terminology scanners, constraint checklists)
  • Coordination failures need workflow design (sequential execution, content registries, shared state protocols)
  • Execution gaps need plan compliance loops (separate agent compares spec to output, sends incomplete work back)

The prompt is one layer. The system around the prompt — the hooks, the review contracts, the safety boundaries, the handoff protocols — is what makes agents reliable enough for production use.

This is the same principle that underlies the Median User Problem. AI defaults to generic output because it lacks specific context. Agents default to generic reliability because they lack specific guardrails. The solution in both cases is the same: build the specificity into the system, not the individual interaction.


Building Trust Through Transparency

There's a reason this post exists. The team could publish only success stories — 449 commits, 112,000 lines of code, 161 blog posts. All real numbers. All impressive.

But anyone who has worked with AI agents knows the full picture includes failures. Pretending otherwise would be dishonest, and it would be unhelpful to anyone trying to build their own agentic workflows.

Agents fail in predictable ways. Hallucination, context drift, and coordination gaps aren't bugs to be embarrassed about — they're engineering constraints to be designed around. The same way a bridge engineer accounts for wind load and thermal expansion, an agentic business architect accounts for hallucination rates and context window limitations.

The result isn't perfect automation. It's reliable automation with human checkpoints at the right places. And that's enough to run a business.


Reduce AI Mistakes With Better Context

The pattern behind every failure mode in this post is the same: insufficient context leads to incorrect output. Agents hallucinate when they lack sources. They drift from brand guidelines when constraints aren't reinforced. They duplicate work when they lack awareness of existing content.

The same principle applies to everyday AI writing. When you paste a prompt into ChatGPT or Claude without context about your writing style, you get generic output. Not because the AI is bad — because it doesn't have the patterns it needs.

Get your Style Profile and give AI the specific context it needs to write like you — for ChatGPT, Claude, Gemini, any AI. It's the same human-in-the-loop philosophy applied to your daily writing: better input, better output, human judgment where it matters.
