HTTP_TOOL_RATE_LIMITED and the AI silently moves on (it does NOT retry).
Layer 1 — Per-tenant production budget
| Setting | Value |
|---|---|
| Capacity | 60 invocations |
| Refill | 60 / minute (1 per second) |
| Burst | 30 |
| Scope | All HTTP Tool calls on your tenant, summed across every configured tool and every active agent |
Layer 2 — Per-tool budget
| Setting | Value |
|---|---|
| Capacity | 30 invocations |
| Refill | 30 / minute |
| Burst | 15 |
| Scope | Each tool has its own bucket, keyed by tenant_http_tools.id |
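Because each tool has its own bucket, if your lookup_routing tool gets hammered, check_balance keeps working (as long as the tenant-wide Layer 1 budget still has room).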
Layer 3 — Test-fire budget
| Setting | Value |
|---|---|
| Capacity | 10 invocations |
| Refill | 10 / minute |
| Burst | 5 |
| Scope | Per tenant. Separate bucket from production. |
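Read together, the three tables describe a layered check: a production call needs a token from both the tenant bucket and that tool's bucket, while a test fire draws only from the tenant's separate test-fire bucket. Below is a minimal TypeScript sketch of that check, not the actual implementation. The Bucket name echoes the implementation note further down, but its fields and the getBucket and allowInvocation helpers are hypothetical. It assumes a production call is refused without spending anything unless both buckets hold a whole token, and it leaves out the Burst values because this page doesn't say exactly how they apply.

```typescript
// Hypothetical sketch of the three budgets; numbers come from the tables above.
// Burst is not modeled: its exact semantics are not documented on this page.
type Limit = { capacity: number; refillPerMin: number };

const TENANT_LIMIT: Limit = { capacity: 60, refillPerMin: 60 };    // Layer 1
const TOOL_LIMIT: Limit = { capacity: 30, refillPerMin: 30 };      // Layer 2
const TEST_FIRE_LIMIT: Limit = { capacity: 10, refillPerMin: 10 }; // Layer 3

class Bucket {
  tokens: number;
  lastMs: number;
  readonly limit: Limit;

  constructor(limit: Limit, nowMs: number) {
    this.limit = limit;
    this.tokens = limit.capacity; // a new bucket starts full
    this.lastMs = nowMs;
  }

  // Add tokens for the time elapsed since the last check, capped at capacity.
  refill(nowMs: number): void {
    const elapsedMs = Math.max(0, nowMs - this.lastMs);
    const refilled = this.tokens + (elapsedMs * this.limit.refillPerMin) / 60_000;
    this.tokens = Math.min(this.limit.capacity, refilled);
    this.lastMs = nowMs;
  }
}

const tenantBuckets = new Map<string, Bucket>();   // keyed by tenant ID
const toolBuckets = new Map<string, Bucket>();     // keyed by tenant_http_tools.id
const testFireBuckets = new Map<string, Bucket>(); // keyed by tenant ID

// Fetch (or lazily create) a bucket and bring it up to date.
function getBucket(map: Map<string, Bucket>, key: string, limit: Limit, nowMs: number): Bucket {
  let bucket = map.get(key);
  if (bucket === undefined) {
    bucket = new Bucket(limit, nowMs);
    map.set(key, bucket);
  }
  bucket.refill(nowMs);
  return bucket;
}

// true = the invocation may proceed; false = HTTP_TOOL_RATE_LIMITED.
function allowInvocation(tenantId: string, toolId: string, isTestFire: boolean, nowMs: number): boolean {
  if (isTestFire) {
    // Test fires only touch their own bucket, never the production ones.
    const testFire = getBucket(testFireBuckets, tenantId, TEST_FIRE_LIMIT, nowMs);
    if (testFire.tokens < 1) return false;
    testFire.tokens -= 1;
    return true;
  }
  const tenant = getBucket(tenantBuckets, tenantId, TENANT_LIMIT, nowMs);
  const tool = getBucket(toolBuckets, toolId, TOOL_LIMIT, nowMs);
  // Assumption: both layers are checked before either is spent,
  // so a refused call consumes nothing.
  if (tenant.tokens < 1 || tool.tokens < 1) return false;
  tenant.tokens -= 1;
  tool.tokens -= 1;
  return true;
}
```

With these numbers, 30 rapid calls to lookup_routing empty that tool's bucket while leaving 30 tenant tokens for check_balance, and ten test fires leave the production buckets untouched.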
How the bucket arithmetic works
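Assuming the standard token-bucket model that the Capacity and Refill rows suggest, tokens flow back continuously at the refill rate, never exceed capacity, and each invocation needs one whole token. The pure helpers below (tokensAfter and msUntilNextToken are hypothetical names) make that arithmetic concrete for the numbers in the tables.

```typescript
// Hypothetical helpers illustrating standard token-bucket refill arithmetic:
// tokens(t) = min(capacity, tokens(t0) + elapsed * refill rate)
function tokensAfter(tokens: number, elapsedMs: number, capacity: number, refillPerMin: number): number {
  return Math.min(capacity, tokens + (elapsedMs * refillPerMin) / 60_000);
}

// Milliseconds until at least one whole token is available (0 if one already is).
function msUntilNextToken(tokens: number, refillPerMin: number): number {
  if (tokens >= 1) return 0;
  return ((1 - tokens) * 60_000) / refillPerMin;
}

// Tenant budget (60/min) fully drained, then idle for 10 seconds.
console.log(tokensAfter(0, 10_000, 60, 60)); // 10
// A per-tool budget (30/min) refills at half that rate.
console.log(tokensAfter(0, 10_000, 30, 30)); // 5
// Idle time never pushes a bucket past its capacity.
console.log(tokensAfter(0, 120_000, 60, 60)); // 60
// An empty tenant bucket frees a token after 1 s; an empty tool bucket after 2 s.
console.log(msUntilNextToken(0, 60), msUntilNextToken(0, 30)); // 1000 2000
```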
What the AI sees when limited
The runtime returns llm_message to the AI as the tool result and continues the conversation. It does NOT auto-retry. From the caller’s perspective, the AI simply decides it has enough information.
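A minimal sketch of that single-pass behavior follows. Only HTTP_TOOL_RATE_LIMITED and llm_message come from this page; the ToolOutcome shape and the runToolStep function are hypothetical, chosen to show that there is no retry loop around the invocation.

```typescript
// Hypothetical outcome shape: only the error code and the llm_message field
// name come from the docs; everything else is illustrative.
type ToolOutcome =
  | { ok: true; body: string }
  | { ok: false; code: "HTTP_TOOL_RATE_LIMITED"; llm_message: string };

// One tool step: invoke once and hand the result to the model. There is
// deliberately no retry loop; a rate-limited call surfaces llm_message instead.
function runToolStep(invoke: () => ToolOutcome): string {
  const outcome = invoke();
  return outcome.ok ? outcome.body : outcome.llm_message;
}
```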
What the dashboard surfaces
Every invocation row in the Invocations panel includes the remaining tenant-bucket count at the time of the call (in the _meta.rate_limit_remaining field of the response body). Rows that hit a limit are flagged red with the HTTP_TOOL_RATE_LIMITED code.
Asking for a higher quota
Both production layers (the per-tenant and per-tool budgets) are hard-coded constants in v1. If your use case genuinely needs more (e.g. you run 100+ concurrent calls and each one chains 3-4 HTTP Tool lookups), open a support ticket with:
- Your tenant ID.
- Approximate call volume per minute.
- Average HTTP Tool invocations per call.
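Before filing, a back-of-the-envelope estimate shows whether you are actually over budget. The hypothetical helper below multiplies the two numbers the ticket asks for and compares the result with the 60/min tenant refill. It assumes invocations are spread evenly across the minute and ignores the per-tool layer, so treat it as a rough check rather than a guarantee.

```typescript
// Layer 1 refill rate from the table above.
const TENANT_REFILL_PER_MIN = 60;

interface DemandEstimate {
  invocationsPerMin: number; // steady-state HTTP Tool invocations per minute
  ratio: number;             // demand as a multiple of the tenant budget
  overBudget: boolean;
}

// Hypothetical sizing check: assumes invocations are spread evenly over the minute.
function estimateDemand(callsPerMin: number, invocationsPerCall: number): DemandEstimate {
  const invocationsPerMin = callsPerMin * invocationsPerCall;
  return {
    invocationsPerMin,
    ratio: invocationsPerMin / TENANT_REFILL_PER_MIN,
    overBudget: invocationsPerMin > TENANT_REFILL_PER_MIN,
  };
}

// 100 calls a minute, each chaining ~3.5 lookups: 350 invocations/min,
// roughly 5.8x the tenant budget.
const busy = estimateDemand(100, 3.5);
// 10 calls a minute with 3 lookups each fits comfortably.
const quiet = estimateDemand(10, 3);
```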
Test-fire vs production budget
A test fire does not consume your production budget. You can press Send test repeatedly while iterating on your endpoint, and the live AI agent still has its full 60/min budget for real calls. The test-fire bucket is small (10/min) on purpose: we never want test runs to mask a real rate-limit problem in production.
Implementation note (for the curious)
Buckets live in the edge function’s module scope (Map<string, Bucket>). Supabase keeps the runtime warm under load, so bucket state survives across requests. Cold starts reset the bucket to full — a corner case where a tenant just over their limit might briefly get one or two extra calls through. We’re comfortable with this drift for v1; if you observe surprising behavior, file it under “we’d like to know.”
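The cold-start drift follows from where the state lives. In the sketch below, createIsolateState is a hypothetical stand-in for one warm runtime instance: in the real function the Map would sit at module top level, and wrapping it in a factory lets the example simulate a cold start by creating a fresh instance. Refill is omitted so the reset is easy to see.

```typescript
// Minimal bucket: refill is omitted because this example only looks at one instant.
interface Bucket {
  tokens: number;
}

const TENANT_CAPACITY = 60; // Layer 1 capacity

// Hypothetical stand-in for one warm runtime instance. In a deployed edge
// function this Map would be a module-level constant living as long as the
// instance stays warm.
function createIsolateState() {
  const buckets = new Map<string, Bucket>();
  return {
    tryTake(tenantId: string): boolean {
      let bucket = buckets.get(tenantId);
      if (bucket === undefined) {
        // First request seen by this instance: the bucket starts full.
        bucket = { tokens: TENANT_CAPACITY };
        buckets.set(tenantId, bucket);
      }
      if (bucket.tokens < 1) return false;
      bucket.tokens -= 1;
      return true;
    },
  };
}

const warm = createIsolateState();
for (let i = 0; i < TENANT_CAPACITY; i++) warm.tryTake("tenant-a"); // drain the bucket
const refusedWhileWarm = !warm.tryTake("tenant-a"); // true: over the limit

const afterColdStart = createIsolateState(); // fresh module scope
const allowedAfterColdStart = afterColdStart.tryTake("tenant-a"); // true: earlier usage is forgotten
```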

