Drop Recovery: What Incident Response Can Learn from Jugglers

There is a moment, in any sustained juggling pattern, when a ball drops. New jugglers respond the way most humans respond to a falling object: they reach for it. Eyes track the ball, body leans, attention collapses to the dropping point.

This all but guarantees that the next two balls drop as well.

Experienced jugglers do something else. The dropped ball is, for the moment, lost. The pattern in the air is not. The first action is not to chase the drop - it is to keep the remaining balls moving, find a stable ending point for them, and only then go retrieve the one on the floor.

This sequence - stabilise the pattern, then recover the dropped element - is the single instinct that distinguishes recoverable juggling failures from cascading ones. It is also, almost exactly, the operating principle that distinguishes good incident response from bad.

Dropped ball: the visible failure
Balls still in air: the system to stabilise first
Time to think: the response is already happening
What matters most: the order - stabilise, then recover

Why the instinct is wrong

The instinct to chase the dropped ball is not stupid. It is the natural response to the loudest signal in the system. The ball that just hit the floor is the most visible failure. Everything else in the pattern is, by comparison, fine.

The problem is that “fine” is a moving target. The other two balls are in flight. They will land in approximately one second. If your attention has shifted to the floor, your hands are not in position to catch them. The visible failure has captured the cognitive resource needed to prevent the next two failures.

This is the same shape as the most common pattern of incident escalation. An alert fires. An engineer focuses on the symptom. While they are debugging the symptom, the underlying instability propagates to two more services. By the time the original symptom is understood, the incident has spread to systems that were healthy when the alert fired.

The juggler’s protocol prevents this. Eyes do not leave the pattern. Hands keep working. The dropped ball stays on the floor until the remaining balls have somewhere safe to go.

What “stabilise the pattern” looks like in operations

The juggling instruction is concrete: when a ball drops, the remaining balls need a controlled landing. You do not try to maintain the original pattern with two balls - that is unstable. You bring the pattern to a controlled stop, holding the two balls in the same hand if needed, and then go pick up the third.

In operations, the parallel is controlled degradation rather than immediate restoration. When part of the system fails, the question is not “how do we restore the original capability immediately?” It is “what is the smallest stable subset of the system we can hold while we fix the broken part?”

This often means deliberately reducing functionality. Disabling a non-critical feature to remove load. Routing traffic to a smaller set of healthy nodes. Accepting a degraded user experience as the temporary stable point. These are uncomfortable choices because they look like additional failures - users will notice the disabled feature, the slower response, the reduced capacity.

They are not additional failures. They are the controlled landing.
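A minimal sketch of what choosing that stable subset might look like, in Python. Everything here is an assumption made for illustration: the Feature and Node types, the load and capacity units, and the service names are hypothetical, and a real system would draw these from its own health checks and feature flags. The idea is only to show the shape of the decision: route to healthy nodes, keep every critical feature, and keep non-critical features only while capacity allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    critical: bool
    load: int  # hypothetical load units this feature adds


@dataclass(frozen=True)
class Node:
    name: str
    healthy: bool
    capacity: int  # hypothetical load units this node can serve


def controlled_landing(features, nodes):
    """Return (features to keep, nodes to route to) for a degraded stable state."""
    # Route only to nodes that are currently healthy.
    routed = [n for n in nodes if n.healthy]
    capacity = sum(n.capacity for n in routed)

    # Critical features are always kept; if they do not fit, there is no stable point.
    kept = [f for f in features if f.critical]
    used = sum(f.load for f in kept)
    if used > capacity:
        raise RuntimeError("critical features alone exceed healthy capacity")

    # Cheapest non-critical features first, so as many as possible survive.
    for f in sorted((f for f in features if not f.critical), key=lambda f: f.load):
        if used + f.load <= capacity:
            kept.append(f)
            used += f.load

    return kept, routed


features = [
    Feature("checkout", critical=True, load=40),
    Feature("search", critical=True, load=30),
    Feature("recommendations", critical=False, load=35),
    Feature("activity-feed", critical=False, load=20),
]
nodes = [
    Node("node-a", healthy=True, capacity=50),
    Node("node-b", healthy=False, capacity=50),
    Node("node-c", healthy=True, capacity=50),
]

kept, routed = controlled_landing(features, nodes)
print([f.name for f in kept])    # ['checkout', 'search', 'activity-feed']
print([n.name for n in routed])  # ['node-a', 'node-c']
```

In this example the recommendations feature is switched off on purpose. That is the disabled feature users will notice, and it is what lets the remaining two nodes hold the rest.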

The dropped ball is not always the first thing to pick up

Once the pattern is stable, there is a second decision: which problem to solve first.

In juggling, this is usually trivial - you pick up the dropped ball and resume. In operations, the dropped ball is rarely the most important problem. The visible failure is often a downstream symptom of an upstream cause. The service that paged is not always the service that needs the fix.

This requires a discipline that incident response teams develop over time: separating “what failed loudest” from “what failed first.” The loudest failure is the alert. The first failure is the root cause. Sometimes they are the same; in mature systems with good monitoring, they increasingly are not.
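The distinction can be made concrete with a small Python sketch. It assumes two things that a real incident rarely hands over so cleanly: a known dependency map between services, and a trustworthy timestamp for each service's first error. The Failure type, the numbers, and the heuristic itself (the earliest failure with no failing dependency) are illustrative assumptions, not a root-cause algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    service: str
    first_error_at: float  # seconds since the start of the incident window
    alerts: int            # number of alerts or pages raised


def loudest(failures):
    """The failure that raised the most alerts."""
    return max(failures, key=lambda f: f.alerts)


def first(failures, depends_on):
    """The earliest failure that has no failing dependency of its own."""
    failing = {f.service for f in failures}
    roots = [
        f for f in failures
        if not any(dep in failing for dep in depends_on.get(f.service, ()))
    ]
    # If every failing service depends on another failing one (a cycle),
    # fall back to the earliest failure overall.
    return min(roots or failures, key=lambda f: f.first_error_at)


# Hypothetical dependency map: each service lists what it depends on.
depends_on = {"api": ["auth", "database"], "auth": ["database"], "database": []}
failures = [
    Failure("api", first_error_at=42.0, alerts=17),
    Failure("auth", first_error_at=30.0, alerts=3),
    Failure("database", first_error_at=12.0, alerts=1),
]

print(loudest(failures).service)            # api
print(first(failures, depends_on).service)  # database
```

Here the API pages seventeen times, while the database, which failed thirty seconds earlier, raises a single alert.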

The juggling metaphor is useful here precisely because it forces the order. Stabilise. Then look at all the visible elements. Then decide which one to handle next. The order is not negotiable - if you skip the stabilisation step, the rest of the protocol does not save you.
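One way to make the order hard to skip is to encode it. The following is a sketch only, with a hypothetical Incident class and phase names invented for this example; it does nothing except refuse to run the steps out of sequence.

```python
from enum import Enum


class Phase(Enum):
    ALERTED = 1
    STABILISED = 2
    SURVEYED = 3
    DECIDED = 4


class Incident:
    """Tracks an incident and refuses to run the steps out of order."""

    def __init__(self):
        self.phase = Phase.ALERTED

    def _advance(self, required, nxt):
        # Each step is only valid from the phase immediately before it.
        if self.phase is not required:
            raise RuntimeError(
                f"cannot move to {nxt.name} from {self.phase.name}; "
                f"expected {required.name}"
            )
        self.phase = nxt

    def stabilise(self):
        self._advance(Phase.ALERTED, Phase.STABILISED)

    def survey(self):
        self._advance(Phase.STABILISED, Phase.SURVEYED)

    def decide(self):
        self._advance(Phase.SURVEYED, Phase.DECIDED)


incident = Incident()
try:
    incident.survey()  # chasing the drop before stabilising
except RuntimeError as err:
    print(err)

incident.stabilise()
incident.survey()
incident.decide()
print(incident.phase.name)  # DECIDED
```

A checklist in a runbook does the same job; the code is just a compact way to show that surveying and deciding are not reachable until stabilisation has happened.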

The visible failure is rarely the first failure. Stabilise the system, then diagnose. The loud signal and the root cause are increasingly different things.

The workshop format

This metaphor is concrete enough to teach physically. A juggling-as-incident-response workshop runs about ninety minutes with twenty participants. The format I have used:

Round one: each participant learns a basic three-ball cascade with coloured balls. Each colour represents a service - red is auth, green is the API, blue is the database. The point of round one is just to establish that the participant has a sustainable pattern.

Round two: a deliberate drop is introduced. The facilitator calls out which ball - which service - has just failed. The participant must keep the remaining two balls moving, bring them to a stable hold, and only then retrieve the dropped one. The drop is repeated several times across the round.

Round three: pairs work together passing balls between them, simulating service-to-service dependencies. When a drop happens, both participants have to coordinate the stabilisation. This introduces the communication problem that is the actual hard part of multi-team incident response.

The workshop teaches one thing physically that no slide can teach: the instinct to chase the drop is wrong, and changing the instinct requires repeated controlled practice. Twenty minutes of deliberate drops is enough for most participants to feel the new instinct take hold. They report the same shift back at work, the next time something pages.

The point is not the metaphor

The juggling exercise is not just an analogy. It is a way to put the lesson into the body, where it can be retrieved under pressure. People who have done the workshop describe a specific experience the next time an alert fires: the urge to chase the symptom is still there, and the trained instinct to stabilise first is now there alongside it. Which one wins depends on which was practised more recently.

This is consistent with the literature on motor learning under pressure. Trained physical responses survive stress better than verbal rules. A team that has been told “stabilise the system before debugging” remembers the rule when calm. A team that has practised it physically retrieves the response when paged at three in the morning.

That is the actual claim. Not that juggling teaches incident response in some abstract sense. That juggling provides a fast, embodied, repeatable way to install a counter-instinct that is otherwise hard to train.