Count Your Props Before You Start - theJugglingCompany.com

Before a juggler starts, they do something that looks invisible: they count.

Not because they do not know how many props they have. Because the count is where the planning lives. Three balls means a three-ball pattern - the cascade, the shower, the columns. Four balls means four-ball patterns. Five means five. You do not design the pattern and then count the props. You count the props and then design the pattern.

This sequence matters more than it sounds. A pattern designed for the wrong number of props is not ambitious. It is broken from the first throw.

Types of limits: rate limits, quota limits, and hard platform limits

First real work: counting your props is not a step before the work - it is the first work

Fixes available for hard limits: none. They cannot be changed by cleverness, money, or architecture

What the hands hold is the constraint

The inventory is whatever is in hand right now. Not the props you are hoping to find. Not the props on order from somewhere. The actual count, today.

The pattern that emerges has to fit that count. Shannon’s juggling theorem (1980) made this exact: (F + D)H = (V + D)N, where F is the time a ball spends in flight, D is the dwell time a ball spends in a hand, V is the time a hand is vacant, N is the number of balls, and H is the number of hands. These are not independent variables. Change the ball count and the timing terms have to shift to keep the equation balanced. There is no pattern that works at the wrong count.
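To make the dependence concrete, here is a minimal sketch that uses the commonly cited form of Shannon's theorem, (F + D)H = (V + D)N, and solves it for the flight time F. The function name and the dwell and vacant times (0.3 s and 0.2 s) are assumptions picked for illustration, not measurements from any real juggler.

```python
def required_flight_time(balls: int, hands: int, dwell: float, vacant: float) -> float:
    """Solve (F + D) * H = (V + D) * N for F, the time each ball must spend in the air."""
    if balls <= 0 or hands <= 0:
        raise ValueError("balls and hands must be positive")
    return (vacant + dwell) * balls / hands - dwell


# Illustrative timings only: 0.3 s dwell, 0.2 s vacant, two hands.
for n in (3, 4, 5):
    f = required_flight_time(n, hands=2, dwell=0.3, vacant=0.2)
    print(f"{n} balls: each ball needs {f:.2f} s of flight")
```

With the hand timings held fixed, every added ball raises the flight time each throw must deliver. The count sets the pattern, not the other way round.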

Building software on top of cloud infrastructure, SaaS platforms, or external APIs calls for the same discipline. And it is violated constantly. People design systems for the resources they expect, or the tier they plan to upgrade to, or the API limits they read about in a blog post, rather than the actual constraints of the plan they are currently on.

The three categories of limits

Every platform you build on has three kinds of limits worth counting before you start.

Rate limits are the props you can juggle per unit of time. An LLM API might allow 60 requests per minute on a free tier. A geocoding API might cap at 1,000 calls per day. If your pattern requires 200 calls per minute and you have 60, you do not have a scaling problem you can solve by being clever. You have a counting problem. The number is wrong.
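The rate-limit count can be done on paper, or in a few lines. The sketch below compares the rate a design needs with the rate the current plan allows, using the 200 and 60 calls per minute from the example above. The function name and the shape of the report are hypothetical.

```python
def check_rate_limit(required_per_min: float, allowed_per_min: float) -> dict:
    """Compare the call rate a design needs with the rate the current plan allows."""
    if allowed_per_min <= 0:
        raise ValueError("allowed_per_min must be positive")
    shortfall = max(0, required_per_min - allowed_per_min)
    return {
        "fits": shortfall == 0,
        "shortfall_per_min": shortfall,
        # How many times longer the work takes if every call queues behind the limit.
        "slowdown_factor": max(1.0, required_per_min / allowed_per_min),
    }


report = check_rate_limit(required_per_min=200, allowed_per_min=60)
print(report)
```

For these numbers the design is 140 calls per minute short and would run more than three times slower than planned if it simply queued. That is a number worth knowing before the first line of integration code.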

Quota limits are the total you can consume within a billing period. Storage quotas, compute hours, token budgets, seats. These deplete continuously and do not reset until the period ends. Running into a quota in production is not a failure mode worth engineering around. It is a math error that happened before any code was written.
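The quota arithmetic is equally short. The sketch below assumes consumption continues at the average rate seen so far, which is a simplification, and projects the day of the billing period on which the quota runs out. The function name and the token figures are made up for illustration.

```python
from typing import Optional


def projected_exhaustion_day(quota: float, used: float, days_elapsed: int) -> Optional[float]:
    """Project the day of the period on which the quota is exhausted.

    Assumes a constant burn rate equal to the average so far.
    Returns None if nothing has been consumed yet.
    """
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    daily_burn = used / days_elapsed
    if daily_burn == 0:
        return None
    return quota / daily_burn


# Hypothetical plan: 1,000,000 tokens per 30-day period, 400,000 used after 6 days.
day = projected_exhaustion_day(quota=1_000_000, used=400_000, days_elapsed=6)
print(f"Quota projected to run out around day {day:.0f} of 30")
```

In this made-up case the quota runs out around the midpoint of the period, which leaves half the month with nothing in hand.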

Hard limits are things the platform will simply not do regardless of tier or price. Maximum context window for a model. Maximum concurrent database connections. Maximum file upload size. No amount of money or architectural cleverness changes these. They are the physics of the platform you chose.
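Hard limits lend themselves to a plain checklist. The sketch below compares what a design needs against a table of platform maximums. Every number in it is hypothetical. Real values come from the documentation of the platform you are actually on.

```python
from typing import Dict, List

# Hypothetical platform maximums, for illustration only.
PLATFORM_MAXIMUMS: Dict[str, int] = {
    "context_tokens": 8_000,
    "db_connections": 100,
    "upload_mb": 25,
}


def hard_limit_violations(demands: Dict[str, int], maximums: Dict[str, int]) -> List[str]:
    """List every demand that exceeds, or has no entry in, the platform maximums."""
    problems = []
    for name, needed in demands.items():
        if name not in maximums:
            problems.append(f"{name}: no documented maximum found, count it first")
        elif needed > maximums[name]:
            problems.append(
                f"{name}: design needs {needed}, platform maximum is {maximums[name]}"
            )
    return problems


design = {"context_tokens": 12_000, "db_connections": 40, "upload_mb": 25}
for problem in hard_limit_violations(design, PLATFORM_MAXIMUMS):
    print(problem)
```

A non-empty list here is not something to tune. It means the design has to change, or the platform does.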

Three categories of platform limits - each requires a different kind of counting before you design the pattern

This is not only a SaaS concern. Self-hosted systems have their own inventory that requires the same counting.

A machine has RAM, CPU cores, disk I/O throughput, and network bandwidth. These are fixed at a given point in time. A pattern that requires 64GB of working memory on a 16GB machine is not a promising draft to trim down later - it is a miscounted inventory. An inference workload that assumes 4 parallel processes on a 2-core server will run at half the expected speed at best and not at all at realistic load.
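The same count works for a machine. The sketch below checks the two examples above, 64GB of need against 16GB of RAM and 4 CPU-bound processes against 2 cores. The function name is hypothetical, and the speed ceiling assumes the processes are purely CPU-bound and share the cores evenly.

```python
import os
from typing import List


def check_machine(need_ram_gb: float, need_processes: int,
                  have_ram_gb: float, have_cores: int) -> List[str]:
    """Compare what a workload assumes with what the machine actually has."""
    problems = []
    if need_ram_gb > have_ram_gb:
        problems.append(
            f"memory: needs {need_ram_gb:g}GB, machine has {have_ram_gb:g}GB"
        )
    if need_processes > have_cores:
        # Best case for CPU-bound work: the cores are shared evenly.
        ceiling = have_cores / need_processes
        problems.append(
            f"cpu: {need_processes} processes on {have_cores} cores, "
            f"at most {ceiling:.0%} of planned speed"
        )
    return problems


for problem in check_machine(need_ram_gb=64, need_processes=4,
                             have_ram_gb=16, have_cores=2):
    print(problem)

# The core count of the machine running this script (may be None if undetermined).
print("cores on this machine:", os.cpu_count())
```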

Open source projects publish the same kind of inventory in their documented hardware requirements. Running a vector database, a message broker, and a model inference server on a single low-memory virtual machine is juggling four props when you have hands for two. The props do not care that you optimistically believed otherwise.
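Adding up documented requirements is a one-line sum. In the sketch below the per-service memory figures are invented for illustration, and the 1GB reserved for the operating system is likewise an assumption. Real numbers come from each project's own hardware requirements.

```python
from typing import Dict


def memory_budget(services_gb: Dict[str, float], machine_gb: float,
                  reserved_gb: float = 1.0) -> dict:
    """Sum the services' documented memory needs and compare with usable RAM."""
    total = sum(services_gb.values())
    usable = machine_gb - reserved_gb  # leave room for the operating system
    return {"total_gb": total, "usable_gb": usable, "fits": total <= usable}


# Hypothetical footprints; check each project's documentation for real ones.
services = {"vector_db": 4.0, "message_broker": 2.0, "inference_server": 6.0}
budget = memory_budget(services, machine_gb=8.0)
print(budget)
```

Twelve gigabytes of documented need against seven usable is settled before anything is deployed.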

Juggling	Cloud and infrastructure planning
Count the props before designing the pattern	Read rate limits and quotas before writing code that calls an API
Three balls means a three-ball pattern - not a four-ball plan	Design for the tier you are on, not the tier you expect to upgrade to
Pattern designed for wrong count breaks on first throw	System designed for wrong limits fails at first production load
Four props, hands for two: immediate drop	Vector DB + broker + inference server on 8GB RAM: immediate OOM
Hard limit: physics of the human body	Hard limit: platform maximum context window or connection count

The prop count before a juggling routine maps directly to resource counting before system design

The discipline that gets skipped

The pattern for checking limits is straightforward: before writing code that calls an API, read the rate limits in the documentation. Before designing a database schema, verify the storage tier you are on. Before deploying to a machine, run the actual workload on that machine and measure what it costs.
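For the last of those checks, measuring what a workload costs, a rough first pass needs only the standard library. The sketch below times a callable and records its peak memory with tracemalloc, which tracks allocations made through Python's allocator and so will undercount memory used by native libraries. The `measure` helper and the sample workload are hypothetical stand-ins for your own.

```python
import time
import tracemalloc


def measure(workload, *args):
    """Run workload(*args) once; return (result, seconds elapsed, peak bytes traced)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = workload(*args)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def sample_workload(n: int) -> int:
    """Stand-in workload: build a list of squares and sum it."""
    return sum([i * i for i in range(n)])


result, seconds, peak_bytes = measure(sample_workload, 100_000)
print(f"took {seconds:.3f} s, peak {peak_bytes / 1_000_000:.1f} MB traced")
```

Run it on the machine you will deploy to, with input of realistic size. Numbers from a laptop count someone else's props.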

None of these steps is exciting. All of them feel like delay. Schedule pressure makes them easy to skip with the justification that you can handle limits later when you hit them.


But a juggler who starts throwing before knowing what is in their hands is not a juggler who will run out of time mid-pattern. They are a juggler who was going to drop something on the first throw. The count is not a bureaucratic step before the real work. It is the first real work.

The three balls are not a limitation

A good three-ball routine, run cleanly, is worth more than a four-ball pattern that drops on the first throw. The props you have are not a placeholder on the way to something bigger. They are what you are working with right now, and the pattern you build for them should be good enough to run perfectly.

Know your props. Then build the pattern.

Key Takeaways

1. Count before you design: the props in your hand determine the pattern, not the props you are planning to acquire. Design systems for the resources you actually have.
2. There are three kinds of limits: rate limits (per unit time), quota limits (per billing period), and hard limits (platform physics). Each requires a different response - and hard limits have no engineering fix.
3. Hardware and self-hosted systems are not exempt. RAM, CPU cores, and I/O throughput are fixed at deployment time. Miscounting them produces the same first-throw drop as miscounting API limits.
4. A clean three-ball pattern outperforms a dropped four-ball attempt. Know your constraints and build something that runs perfectly within them, rather than something ambitious that fails from the start.

Related: The Wrong Number - on what happens when the count is off and the pattern has already started.