Amazon has a rule: a team should be small enough that two large pizzas can feed everyone. The idea is that if you need more pizza, your team has too many people, which means too much coordination overhead.
It’s a metaphor that sounds clean until you try to apply it to actual humans. Do they eat pizza? Did they already have lunch? Is one person vegetarian, so half the pizza doesn’t count for them?
You can’t really measure a human team in pizza slices.
But you know what you actually can measure in pizza slices? AI agents.
## Where this came from
I’ve been automating my consulting business with AI agents - 64 crews running on an AgentCore runtime, handling everything from email triage to research synthesis to speaker coordination for meetups.
At some point I needed a simple way to think about how many agents I can run concurrently without blowing my budget or hitting rate limits. The math is real: different model tiers cost very different amounts, and running too many heavy agents in parallel creates problems.
I kept staring at the Amazon rule and thinking: this applies here. Not because my agents need to eat - obviously they don’t - but because I needed the same kind of intuitive constraint.
So I adapted it.
## The model
Two large pizzas. Eight slices each. Sixteen slices total. That’s your compute and cost budget for one concurrent agent session.
Each agent “eats” slices based on how much model capacity it needs:
| Agent size | Slices | Profile | What this means |
|---|---|---|---|
| Small | 1 | Lightweight, single-purpose | Short context, fast, cheap. Runs on a small or distilled model. Examples: news scanner, wellness ping, email classifier. |
| Medium | 2 | Standard reasoning, multi-step | Mid-tier model. Can handle context, do chaining, write coherent output. Examples: content drafter, proposal generator, invoice chaser. |
| Large | 4 | Heavy context, long chains, multi-doc | High-context setup. Examples: research synthesizer, architecture reviewer, complex demo builder. |
Sixteen slices max per concurrent window. Go over and things get slow, expensive, or both.
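The accounting is simple enough to sketch in a few lines. Here's a minimal Python version of a 16-slice session that admits agents until the budget is spent; the class, method, and agent names are illustrative, not part of any real framework:

```python
# Slice costs per agent size, matching the table above.
SLICE_COST = {"small": 1, "medium": 2, "large": 4}

class PizzaSession:
    """One concurrent window: two large pizzas = 16 slices."""

    def __init__(self, budget: int = 16):
        self.budget = budget
        self.agents: list[tuple[str, str]] = []  # (name, size)

    @property
    def slices_used(self) -> int:
        return sum(SLICE_COST[size] for _, size in self.agents)

    def admit(self, name: str, size: str) -> bool:
        """Admit an agent only if it fits in the remaining budget."""
        cost = SLICE_COST[size]
        if self.slices_used + cost > self.budget:
            return False  # over budget: queue it for the next session
        self.agents.append((name, size))
        return True

session = PizzaSession()
session.admit("research-synthesizer", "large")    # 4 slices
session.admit("content-drafter", "medium")        # 2 slices
for topic in ["news", "email", "wellness"]:
    session.admit(f"{topic}-scanner", "small")    # 1 slice each
print(session.slices_used)  # 9 of 16 slices spent
```

The point of `admit` returning a boolean is that going over budget is a scheduling decision, not an error: the agent just waits for the next session.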
## Why this actually works
The 1-2-4 ratio isn’t arbitrary. It reflects real pricing tiers: a Haiku-class model costs roughly a quarter as much per token as a Sonnet-class one, and Sonnet costs roughly a fifth as much as Opus. The slice counts track actual cost and compute weight.
The constraint forces honest decisions. When you only have 16 slices, you can’t just add agents until it feels comprehensive. You have to ask whether a task actually needs a 4-slice agent or whether a 2-slice one would work. Those are the right questions. The pizza makes you ask them.
## Scaling beyond 16 slices
If you genuinely need more than 16 slices running concurrently, you order more pizzas. But the constraint per pizza still holds. You don’t eat 32 slices at once - you run two separate sessions, maybe in parallel queues, maybe with different budget owners.
For a solo consultant, one pizza is usually enough. For a startup team, maybe two or three concurrent pizzas across departments. For an enterprise, you’re running a full pizzeria.
The model scales. The constraint per session stays.
Post 1 in the Juggling-Pizza Framework series. Next: Balls, Clubs, and Rings - How I Use Juggling Props to Explain AI Agents.