
Blog · 23 February 2026 · 13 min read · Brain · Tech · Change

Dropping the Ball Is the Point: What Juggling Taught Me About Failure in AI Systems

Every juggler drops. Every AI agent fails. The question is whether you designed for it.

[Image: Blue light trails forming an infinity loop, with one point shattering apart]

There’s a moment in learning to juggle that nobody warns you about.

You’ve been practicing for a week. Three balls, finally staying in the air for five, six, seven throws in a row. And then you get confident. You stop thinking so hard. You relax.

And you drop everything.

All three balls. On the floor. Simultaneously. With maximum embarrassment if anyone was watching.

Every new juggler experiences this exact moment and interprets it as failure. As evidence that they’re not good enough, or that they lost focus at the wrong moment, or that they should have kept concentrating instead of getting comfortable.

Here’s what an experienced juggler knows: that’s not failure. That’s the practice.

Dropping is how you find your limits. Dropping is feedback. Dropping tells you exactly where the edge of your current skill level is, which is information you cannot get any other way. You can’t learn it from watching. You can’t reason your way to it. You have to run into it.

I spent a lot of years in IT before I understood that software systems work the same way. And I spent even longer before I understood it about AI agents.

3 types - Drop categories: reach, momentum, environmental
4 principles - Controlled dropping: design for recovery, not avoidance
15 min - Otto sync time: now syncs within 15 minutes of an event change
Upstream - Most agent failures are actually system failures, not agent failures

The Drop I Didn’t Recognise As a Drop

Here is the thing about dropping: it comes in forms you don’t always recognise.

I never stopped anything. I never abandoned a project, cancelled a commitment, or let something fall completely through the floor. I am not a person who drops in that way. What I did, consistently, was do things too late.

The speaker confirmation that should have gone out two weeks before the event, sent three days before. The event update that should have been posted the week before, posted the morning of. The follow-up email that should have happened the day after, sent a week later - if at all.

These are drops. Not the kind where the ball disappears. The kind where you scramble, pick it back up, and keep going - slightly off-rhythm, slightly behind, hoping no one noticed. The pattern is still running. The timing is wrong.

The problem with this kind of drop is that you get very good at recovering from it, which means you stop recognising it as a problem. Until the drops compound. Until doing things too late becomes the rhythm itself.

In 2023, I was running the AWS User Group Vienna, consulting for clients, speaking at conferences, and organising events. Nothing was falling apart. But everything was landing a little late, with a little too much scrambling at the end. I was the only one tracking all of it. That’s not a sustainable architecture.

This is not a story about hitting a wall or burning out. It’s a story about pattern recognition. I could see the pattern clearly: everything arrives late because you’re the single point of responsibility. The system was giving me feedback. And two people with a well-designed system will always outperform one person with good intentions and a long memory.

So Philipp and I built Otto.

Otto - Organizer That Takes Over - started as a Slack bot that could answer common community questions without me. Just a simple retrieval system. A ball agent, in my current vocabulary. Single-purpose, lightweight, totally dumb in the best possible way.

And then Otto dropped the ball too.

He gave someone wrong information about an upcoming event. He confused two meetups. The person showed up to the wrong location.

And that’s when I learned something important: Otto dropping the ball taught me more about what Otto needed to know than six months of careful design would have.


Why You Can’t Design for Failure Without Experiencing It

In software, we talk about failure modes constantly. Retry logic. Circuit breakers. Graceful degradation. Dead letter queues. We have entire disciplines around this.

But there’s a gap between designing for failure and knowing which failures to design for. And that gap is where most AI agent projects fall apart.

I’ve seen teams spend three months designing an agent architecture with elaborate error handling for every failure mode they could imagine. Then they deploy it, and it fails in ways none of them designed for. The agent starts hallucinating data it was never meant to handle. The tool calls time out in a pattern they didn’t anticipate. The context window fills up with something nobody predicted.

You can’t fully anticipate failure in advance. Not in juggling, not in software, not in AI. You have to let things run until they break, observe where they break, and then design the response.

This is not recklessness. This is structured iteration. There’s a difference between dropping balls in practice and dropping balls during a performance. The former is how you learn. The latter is a system design problem - it means you didn’t practice enough in the right conditions.


The Three Kinds of Drops (and What They Mean)

When I teach juggling, I tell students to pay attention to how they drop. Because different drops tell you different things.

The reach drop: You throw too wide, reach for it, and still miss. This means your throw was off, not your catch. Don’t try harder - fix the throw.

In agents: the output fails because the input was malformed or ambiguous. The agent isn’t broken; the prompt or the data pipeline is. Fix upstream before you redesign the agent.
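To make that concrete, here’s a minimal sketch of a validation gate in front of an agent - pure illustration, with made-up field names and thresholds, not Otto’s code. The point is that the reach drop gets caught before the agent ever throws.

```python
from dataclasses import dataclass

@dataclass
class AgentRequest:
    question: str
    event_id: str | None = None

def validate_request(req: AgentRequest) -> list[str]:
    """Reject malformed or ambiguous input before the agent ever sees it."""
    problems = []
    if not req.question.strip():
        problems.append("empty question")
    if len(req.question) > 2000:
        problems.append("suspiciously long input - likely pasted noise")
    # Two events mentioned but none identified: the classic ambiguous throw.
    if req.question.lower().count("meetup") > 1 and req.event_id is None:
        problems.append("multiple events mentioned, none identified - ask the user first")
    return problems

# Fix the throw before blaming the catch:
issues = validate_request(AgentRequest("Is the June meetup or the July meetup still on?"))
if issues:
    print("Caught upstream:", issues)  # the agent never runs on a bad throw
```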

The momentum drop: You were on a good run, got in the zone, and suddenly lost the rhythm. Often happens when you’ve been going long enough that you start anticipating rather than reacting. You throw where you think the ball will be, not where it is.

In agents: the agent was working fine, then started producing degraded outputs after extended runs. Classic context contamination. The agent is anticipating patterns from earlier in the context window instead of reading the current state. Clear the context. Reset the session. Don’t try to fix it in-flight.
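Here’s roughly what “clear the context, reset the session” looks like in code - a sketch, where the turn and context limits are placeholder numbers you’d tune by watching where your own momentum drops start:

```python
class AgentSession:
    """Illustrative session wrapper: reset context before it contaminates output."""

    MAX_TURNS = 20             # placeholder - tune from observed drops
    MAX_CONTEXT_CHARS = 60_000  # placeholder - tune from observed drops

    def __init__(self):
        self.history: list[dict] = []

    def needs_reset(self) -> bool:
        too_long = len(self.history) >= self.MAX_TURNS
        too_big = sum(len(m["content"]) for m in self.history) >= self.MAX_CONTEXT_CHARS
        return too_long or too_big

    def ask(self, prompt: str, call_model) -> str:
        if self.needs_reset():
            # Don't fix it in-flight: start clean and re-read the current state.
            self.history = []
        self.history.append({"role": "user", "content": prompt})
        answer = call_model(self.history)  # your actual model call goes here
        self.history.append({"role": "assistant", "content": answer})
        return answer
```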

The environmental drop: Something changed in the environment - a gust of wind, bad lighting, an unexpected noise - and it disrupted a pattern that was otherwise solid.

In agents: the downstream API changed. The data schema shifted. The model got updated. The agent was fine; the environment changed. This is the hardest drop to catch before it happens, because you’re not watching the right thing. You’re watching the agent. You should be watching the environment too.
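One cheap way to watch the environment is to fingerprint the shape of a dependency’s response rather than its values. A hedged sketch - the endpoint and the scheduling are up to you:

```python
import hashlib
import json
import urllib.request

def schema_fingerprint(url: str) -> str:
    """Fingerprint the *shape* of a dependency's response, not its values.

    If the fingerprint changes between runs, the environment moved -
    alert before the agent starts dropping, not after.
    """
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.loads(resp.read())

    def shape(x):
        if isinstance(x, dict):
            return {k: shape(v) for k, v in sorted(x.items())}
        if isinstance(x, list):
            return [shape(x[0])] if x else []
        return type(x).__name__

    return hashlib.sha256(json.dumps(shape(payload)).encode()).hexdigest()

# Usage: run on a schedule, compare against the last known-good fingerprint,
# and page a human when it changes. The agent was fine; the environment wasn't.
```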

[Infographic: three drop types - root cause mapping]
Three drop types and their agent system equivalents. Each maps to a different root cause - and a different fix.

Designing for the Drop (Not Against It)

Here’s the shift that changes everything: instead of designing your agent system to avoid failure, design it to surface failure fast and cheaply.

In juggling, this is called controlled dropping. It’s a real skill. Advanced jugglers practice deliberately dropping a prop in a way that lets them recover quickly - positioned, not sprawled across the floor. The drop becomes a reset, not a crisis.

For AI agents, controlled dropping means:

1. Make failures observable. Every agent run should produce a structured log: what the input was, what the agent did, what it produced, which tools it called, and whether anything timed out or returned an unexpected response. You can’t learn from drops you can’t see.

My setup: every AgentCore crew writes to a Results database in Notion. Not just the output - the execution trace. When something goes wrong, I can see exactly what the agent was looking at when it made the wrong decision.
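For illustration, here’s roughly what such a record and write can look like - the dataclass fields and Notion property names below are placeholders, not my actual Results schema:

```python
import json
import os
from dataclasses import asdict, dataclass, field

from notion_client import Client  # pip install notion-client

@dataclass
class AgentRunRecord:
    """One row per run: everything you need to reconstruct a drop later."""
    agent: str
    input: str
    output: str
    tool_calls: list[dict] = field(default_factory=list)  # name, args, duration, error
    timed_out: bool = False
    unexpected_responses: list[str] = field(default_factory=list)

def log_run(record: AgentRunRecord, database_id: str) -> None:
    notion = Client(auth=os.environ["NOTION_TOKEN"])
    notion.pages.create(
        parent={"database_id": database_id},
        properties={
            "Agent": {"title": [{"text": {"content": record.agent}}]},
            "Timed out": {"checkbox": record.timed_out},
            # Full execution trace as JSON, so "what was the agent looking at"
            # is always answerable after the fact. Truncated to Notion's
            # 2000-character rich-text limit.
            "Trace": {"rich_text": [{"text": {"content": json.dumps(asdict(record))[:2000]}}]},
        },
    )
```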

2. Make failures cheap. Your most complex, expensive agents - the clubs and devil sticks - should not be running unchecked in production on important data. They should have a human-in-the-loop checkpoint before their output becomes real. Not because you don’t trust them, but because you’re still learning where their drops are.

My setup: any agent crew that writes to client-facing systems (proposals, invoices, customer emails) has a draft-first pattern. The agent produces the output; a human approves it before it sends. Once the error rate drops below a threshold I’m comfortable with, I remove the checkpoint. But not before.
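In code, the draft-first checkpoint is barely more than a conditional. A sketch with illustrative thresholds - the real numbers come from your own observed error rates, not from optimism:

```python
from dataclasses import dataclass

@dataclass
class Draft:
    content: str
    approved: bool = False

# Placeholder thresholds - the checkpoint is removed by evidence, not by hope.
ERROR_RATE_THRESHOLD = 0.02
MIN_OBSERVED_RUNS = 100

def dispatch(draft: Draft, observed_runs: int, observed_errors: int, send) -> str:
    error_rate = observed_errors / observed_runs if observed_runs else 1.0
    checkpoint_active = observed_runs < MIN_OBSERVED_RUNS or error_rate > ERROR_RATE_THRESHOLD
    if checkpoint_active and not draft.approved:
        return "held for human approval"  # draft-first: the output exists but isn't real yet
    send(draft.content)
    return "sent"
```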

3. Build the recovery path before you need it. In juggling, you practice the recovery before you practice the trick. You practice bending down to pick up the club so that when you drop mid-routine, you do it smoothly without breaking the show. The recovery is part of the act.

For agents: what happens when the research agent returns garbage? Does the content drafter know to flag it instead of continuing? Does the dispatcher know to re-route? Does something notify me? Or does the garbage quietly propagate through four downstream agents and end up in a published blog post?
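Here’s the shape of that recovery path as a sketch - the garbage heuristics are deliberately dumb placeholders; the point is that the check exists before the drafter runs:

```python
def looks_like_garbage(findings: str) -> bool:
    """Deliberately dumb heuristics - tune them to your own observed drops."""
    return (
        not findings.strip()
        or len(findings) < 50                      # suspiciously thin research
        or "as an AI language model" in findings   # a refusal leaked through
    )

def run_pipeline(research, draft_content, notify):
    findings = research()
    if looks_like_garbage(findings):
        # The drafter never sees bad input; a human hears about it instead,
        # and the dispatcher can re-route or retry from here.
        notify("research output failed the quality gate: " + repr(findings)[:200])
        return None
    return draft_content(findings)
```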

4. Distinguish between agent failure and system failure. This is the most important one. When something goes wrong, most people’s first instinct is to blame the agent. Retrain it. Prompt-engineer it. Swap in a bigger model.

But in my experience, most agent failures are actually system failures: the wrong data, the wrong context, the wrong trigger, the wrong tool configuration. The agent is doing exactly what it was designed to do - with broken inputs.

Before you touch the agent, look at everything upstream of it.
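As a sketch, upstream-first triage is just an ordered checklist. The attribute names below are illustrative, standing in for whatever your run logs actually capture:

```python
def triage(run) -> list[str]:
    """Walk upstream before touching the agent.

    `run` stands in for whatever structured record your logging produces;
    these attribute names are illustrative, not a real schema.
    """
    checks = {
        "wrong data": run.knowledge_base_stale,    # was the source current?
        "wrong context": run.context_overflowed,   # did earlier turns leak in?
        "wrong trigger": run.fired_out_of_order,   # did the event arrive early or late?
        "wrong tool config": run.tool_errors > 0,  # did a tool call fail or time out?
    }
    causes = [name for name, suspect in checks.items() if suspect]
    # Only when every upstream check comes back clean is it plausibly
    # an agent failure - and only then do you touch the agent.
    return causes or ["agent failure - now you may touch the agent"]
```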



Otto’s Most Embarrassing Drop (And What It Built)

Otto version 1 was a Slack bot running on AWS Lambda with a fine-tuned model on Bedrock. He could answer questions about upcoming events, tell people who to contact, and pull information from our GitHub issues.

One day, someone asked him: “Is the June meetup still happening?”

Otto said yes. Confidently. With the date and location.

The June meetup had been cancelled two days earlier.

Otto had no way of knowing. His knowledge came from a knowledge base that hadn’t been updated. His confidence was a model artifact - he was trained to sound definitive, and he was definitive about stale information.

This is a reach drop. The throw was wrong - the knowledge base update process was broken. Fixing it required adding a sync mechanism between the event cancellation workflow and Otto’s knowledge base. Not retraining Otto. Not prompting him differently. Fixing the upstream pipe.

But here’s what that drop built: we now have an automatic sync. Every time an event status changes in Notion, a trigger fires, the knowledge base gets updated, and Otto is current within fifteen minutes. That automation didn’t exist before. It only exists because Otto dropped the ball.
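For illustration, the trigger side of such a sync can be as small as a Lambda handler that re-ingests a Bedrock knowledge base data source when an event’s status changes. A sketch, assuming an S3-backed knowledge base - the IDs, env names, and status values are placeholders, not Otto’s actual setup:

```python
import os

import boto3

bedrock_agent = boto3.client("bedrock-agent")

def handler(event, context):
    """Fires when an event's status changes in Notion (via webhook/automation).

    Re-ingests the knowledge base data source so the bot stops answering
    from stale event data.
    """
    status = event.get("status")
    if status not in ("cancelled", "rescheduled", "updated"):
        return {"skipped": status}
    job = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=os.environ["KB_ID"],            # placeholder config
        dataSourceId=os.environ["KB_DATA_SOURCE_ID"],   # placeholder config
        description=f"Event {event.get('event_id')} changed to {status}",
    )
    return {"ingestionJobId": job["ingestionJob"]["ingestionJobId"]}
```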

This is the compounding value of dropping in practice: each drop, handled well, builds something that makes the system more robust. Not just patched - genuinely better.


The One Thing I Tell Every Team Starting With Agents

When a team comes to me and says they want to build AI agents, the first thing I say is:

“Great. Plan to drop the ball. A lot. In the beginning, make it your job to drop the ball as fast as possible and see what happens.”

Deploy something real - not a demo, not a sandbox with synthetic data, something that touches your actual workflow - and let it run. Watch where it fails. Document the failures. Build the recovery.

Don’t wait until you’ve designed a perfect system to start. A perfect system is one that’s already been through its drops and learned from them.

And don’t treat drops as evidence that the approach doesn’t work. Three balls in the air is an unstable system. It only stays up because you’re actively maintaining it. The same is true of any agent crew doing real work.

The goal isn’t a juggling routine that never drops. The goal is a juggler who can drop, recover, and keep going without anyone in the audience noticing.



Post 3 in the Juggling-Pizza Framework series. Post 2: Balls, Clubs, and Rings - Post 4: The Complete Framework

Linda Mohamed is an AWS Hero in Vienna. She built Otto so she’d stop dropping the balls she couldn’t afford to drop. She still drops things in practice. That’s the point.