theJugglingCompany.com

Blog · 19 February 2026 · 5 min read · BrainTech

The Pizza Agent Model: How I Size My AI Agent Teams

Amazon's 2-pizza rule remixed for AI agents - because you actually can measure compute capacity in pizza slices.

[Hero image: two hands cupped together offering a glowing white ball]

Amazon has a rule: a team should be small enough that two large pizzas can feed everyone. The idea is that if you need more pizza, your team has too many people, which means too much coordination overhead.

It’s a metaphor that sounds clean until you try to apply it to actual humans. Do they eat pizza? Did they already have lunch? Is one person vegetarian and half the pizza doesn’t count?

You can’t really measure a human team in pizza slices.

But you know what you actually can measure in pizza slices? AI agents.

The model at a glance:

- 16 slices total: two large pizzas, eight slices each - your compute and cost budget for one concurrent agent session.
- 1-2-4 slice weights: small = 1 slice (lightweight), medium = 2 (standard reasoning), large = 4 (heavy context, long chains).
- The constraint is why it works: when you only have 16 slices, you can't just add agents until it feels comprehensive - you have to ask the right questions.

Where this came from

I’ve been automating my consulting business with AI agents - 64 crews running on an AgentCore runtime, handling everything from email triage to research synthesis to speaker coordination for meetups.

At some point I needed a simple way to think about how many agents I can run concurrently without blowing my budget or hitting rate limits. The math is real: different model tiers cost very different amounts, and running too many heavy agents in parallel creates problems.

I kept staring at the Amazon rule and thinking: this applies here. Not because my agents need to eat - obviously they don’t - but because I needed the same kind of intuitive constraint.

So I adapted it.

The model

Two large pizzas. Eight slices each. Sixteen slices total. That’s your compute and cost budget for one concurrent agent session.

Each agent “eats” slices based on how much model capacity it needs:

| Agent size | Slices | What this means |
|---|---|---|
| Small | 1 | Lightweight, single-purpose. Short context, fast, cheap. Runs on a small or distilled model. Examples: news scanner, wellness ping, email classifier. |
| Medium | 2 | Standard reasoning, multi-step. Mid-tier model. Can handle context, do chaining, write coherent output. Examples: content drafter, proposal generator, invoice chaser. |
| Large | 4 | Heavy context, long chains, multi-doc. High-context setup. Examples: research synthesizer, architecture reviewer, complex demo builder. |

Sixteen slices max per concurrent window. Go over and things get slow, expensive, or both.

Pizza slice allocation across three real concurrent configurations:

- Morning cron: 6/16 slices used (email 1 + security 1 + wellness 1 + OKR 2 + travel 1) - headroom for urgent tasks.
- Deep work: 10/16 slices used (research 4 + arch reviewer 4 + content drafter 2) - comfortable, 6 slices remain.
- Revenue crew: 8/16 slices used (proposal 4 + invoice 2 + onboarding 2) - can run alongside the morning cron.
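
If you like the bookkeeping spelled out, here's a minimal Python sketch of the slice check. The SLICE_WEIGHTS map and the check_session helper are my own illustration, not an AgentCore API, and the agent names simply mirror the configurations above:

```python
# Slice weights for the 1-2-4 scheme: small, medium, large agents
SLICE_WEIGHTS = {"small": 1, "medium": 2, "large": 4}
PIZZA_BUDGET = 16  # two large pizzas, eight slices each

def slices_used(agents: dict[str, str]) -> int:
    """Sum the slice cost of a set of concurrently running agents."""
    return sum(SLICE_WEIGHTS[size] for size in agents.values())

def check_session(name: str, agents: dict[str, str]) -> None:
    """Print how much of the 16-slice budget a session would use."""
    used = slices_used(agents)
    status = "ok" if used <= PIZZA_BUDGET else "over budget"
    print(f"{name}: {used}/{PIZZA_BUDGET} slices used ({status})")

# The three configurations from the allocation list above
check_session("morning cron", {
    "email triage": "small", "security scan": "small", "wellness ping": "small",
    "OKR tracker": "medium", "travel planner": "small",
})  # -> 6/16
check_session("deep work", {
    "research synthesizer": "large", "architecture reviewer": "large",
    "content drafter": "medium",
})  # -> 10/16
check_session("revenue crew", {
    "proposal generator": "large", "invoice chaser": "medium",
    "client onboarding": "medium",
})  # -> 8/16
```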

Why this actually works

The 1-2-4 ratio isn’t arbitrary. It reflects real pricing tiers. A Haiku-class model costs roughly a quarter of what a Sonnet-class model does per token, and Sonnet roughly a fifth of Opus. The slice counts track actual cost and compute weight.

The constraint forces honest decisions. When you only have 16 slices, you can’t just add agents until it feels comprehensive. You have to ask whether a task actually needs a 4-slice agent or whether a 2-slice one would work. Those are the right questions. The pizza makes you ask them.

Scaling beyond 16 slices

If you genuinely need more than 16 slices running concurrently, you order more pizzas. But the constraint per pizza still holds. You don’t eat 32 slices at once - you run two separate sessions, maybe in parallel queues, maybe with different budget owners.

For a solo consultant, one pizza is usually enough. For a startup team, maybe two or three concurrent pizzas across departments. For an enterprise, you’re running a full pizzeria.

The model scales. The constraint per session stays.
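
As a rough sketch of what "order more pizzas" means mechanically, here's a first-fit pass that splits a backlog of agents into separate 16-slice sessions. The pack_into_pizzas helper and the backlog are illustrative assumptions, not part of any runtime:

```python
# Same assumed 1-2-4 slice weights as before
SLICE_WEIGHTS = {"small": 1, "medium": 2, "large": 4}
PIZZA_BUDGET = 16

def pack_into_pizzas(agents: list[tuple[str, str]]) -> list[list[str]]:
    """First-fit packing: put each agent on the first pizza (session) with room."""
    pizzas: list[dict] = []
    for name, size in agents:
        cost = SLICE_WEIGHTS[size]
        for pizza in pizzas:
            if pizza["used"] + cost <= PIZZA_BUDGET:
                pizza["used"] += cost
                pizza["agents"].append(name)
                break
        else:  # no existing pizza has room: order another one
            pizzas.append({"used": cost, "agents": [name]})
    return [p["agents"] for p in pizzas]

# Hypothetical backlog: 22 slices of work, more than one pizza can feed
backlog = [
    ("research synthesizer", "large"), ("architecture reviewer", "large"),
    ("proposal generator", "large"), ("content drafter", "medium"),
    ("invoice chaser", "medium"), ("client onboarding", "medium"),
    ("email triage", "small"), ("security scan", "small"),
    ("wellness ping", "small"), ("news scanner", "small"),
]
for i, session in enumerate(pack_into_pizzas(backlog), start=1):
    print(f"Pizza {i}: {session}")
```

Run it and the backlog lands on two pizzas: the first fills to 16 slices, the second takes the overflow, which is exactly the "separate sessions, maybe in parallel queues" idea.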


Post 1 in the Juggling-Pizza Framework series. Next: Balls, Clubs, and Rings - How I Use Juggling Props to Explain AI Agents.