SmartAlex Documentation - AI Voice Assistant Platform

HTTP Tool calls are rate-limited at three layers, each backed by a token bucket. Each bucket refills at a fixed rate, and every invocation spends one token. When a bucket is empty, the next request fails with HTTP_TOOL_RATE_LIMITED and the AI silently moves on (it does NOT retry).

Layer 1 — Per-tenant production budget


Capacity	60 invocations
Refill	60 / minute (1 per second)
Burst	30
Scope	All HTTP Tool calls on your tenant, summed across every configured tool and every active agent

This is the global cap. If your tenant makes 200 HTTP Tool calls a minute, spread across tools and agents, you will hit this limit: calls beyond the budget are rejected, and the AI carries on without those lookups.
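Here is a back-of-envelope check of how far 200 calls a minute overruns Layer 1. It uses only the capacity and refill values from the table and ignores the Burst figure. `maxAccepted` and `minRejected` are illustrative helpers, not part of the platform. A bucket can never accept more than the tokens it starts with plus whatever refills during the window, so any demand above that is guaranteed to be rejected.

```typescript
// Upper bound: a bucket can accept at most the tokens it starts with plus everything
// that refills during the window (the capacity cap can only lower this further).
function maxAccepted(startTokens: number, refillPerSecond: number, windowSeconds: number): number {
  return Math.floor(startTokens + refillPerSecond * windowSeconds);
}

// Lower bound on rejections for a given demand in the same window.
function minRejected(demand: number, startTokens: number, refillPerSecond: number, windowSeconds: number): number {
  return Math.max(0, demand - maxAccepted(startTokens, refillPerSecond, windowSeconds));
}

// Layer 1: capacity 60, refill 1 token/s.
console.log(minRejected(200, 60, 1, 60)); // 80: first minute, starting from a full bucket
console.log(minRejected(200, 0, 1, 60));  // 140: each later minute, once the bucket is drained
```

Starting from a full bucket, at least 80 of the 200 calls fail in the first minute. Once the bucket is drained, at least 140 fail in every minute after that.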

Layer 2 — Per-tool budget


Capacity	30 invocations
Refill	30 / minute
Burst	15
Scope	Each tool has its own bucket, keyed by `tenant_http_tools.id`

This prevents one runaway tool from starving the other tools on the same tenant. If your lookup_routing tool gets hammered, it is capped at 30 invocations a minute, so check_balance keeps working on the remaining tenant budget.

Layer 3 — Test-fire budget


Capacity	10 invocations
Refill	10 / minute
Burst	5
Scope	Per tenant. Separate bucket from production.

The dashboard’s Send test button uses this bucket, so you can send test fires repeatedly while debugging without spending your production budget.
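The three layers can be pictured as a small limiter that keys buckets the way the tables describe: one per tenant, one per tool (keyed by `tenant_http_tools.id`), and a separate test-fire bucket per tenant. This is a minimal sketch under assumptions, not the runtime's code. `HttpToolLimiter`, the `tenant:`/`tool:`/`test:` key prefixes, and the rule that a production call must find a token in both its tenant and tool buckets before spending either are all hypothetical, and the Burst values are not modelled.

```typescript
// Continuous-refill token bucket; tokens accrue at refillPerSecond, capped at capacity.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // new buckets start full
    this.lastMs = nowMs;
  }

  // Refill for the elapsed time, then report how many tokens are available.
  available(nowMs: number): number {
    const elapsedSeconds = (nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastMs = nowMs;
    return this.tokens;
  }

  // Spend one token; callers check available() at the same instant first.
  take(): void {
    this.tokens -= 1;
  }
}

type Layer = { capacity: number; refillPerSecond: number };

// Values from the three tables on this page (Burst is not modelled).
const TENANT: Layer = { capacity: 60, refillPerSecond: 60 / 60 };
const PER_TOOL: Layer = { capacity: 30, refillPerSecond: 30 / 60 };
const TEST_FIRE: Layer = { capacity: 10, refillPerSecond: 10 / 60 };

class HttpToolLimiter {
  private readonly buckets = new Map<string, TokenBucket>();

  // Lazily create a full bucket the first time a key is seen.
  private bucket(key: string, layer: Layer, nowMs: number): TokenBucket {
    let b = this.buckets.get(key);
    if (b === undefined) {
      b = new TokenBucket(layer.capacity, layer.refillPerSecond, nowMs);
      this.buckets.set(key, b);
    }
    return b;
  }

  // Production call: needs a token in both the tenant bucket and the tool's own bucket.
  // toolId stands in for tenant_http_tools.id.
  tryProduction(tenantId: string, toolId: string, nowMs: number): boolean {
    const tenant = this.bucket(`tenant:${tenantId}`, TENANT, nowMs);
    const tool = this.bucket(`tool:${toolId}`, PER_TOOL, nowMs);
    if (tenant.available(nowMs) < 1 || tool.available(nowMs) < 1) return false;
    tenant.take();
    tool.take();
    return true;
  }

  // Test fire: draws only from the tenant's separate test-fire bucket.
  tryTestFire(tenantId: string, nowMs: number): boolean {
    const testBucket = this.bucket(`test:${tenantId}`, TEST_FIRE, nowMs);
    if (testBucket.available(nowMs) < 1) return false;
    testBucket.take();
    return true;
  }
}

const limiter = new HttpToolLimiter();
let routingAccepted = 0;
for (let i = 0; i < 40; i++) {
  if (limiter.tryProduction("tenant-a", "lookup_routing", 0)) routingAccepted++;
}
console.log(routingAccepted); // 30
console.log(limiter.tryProduction("tenant-a", "check_balance", 0)); // true
```

A runaway lookup_routing is cut off after 30 calls while check_balance still gets through, and test fires draw only from their own 10-token bucket.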

How the bucket arithmetic works

At t=0:                   tokens=60      ◀── full bucket
At t=0 invocation #1:     tokens=59
At t=0 invocation #2:     tokens=58      (back-to-back calls; refill between them is negligible)
At t=0 invocation #3:     tokens=57
At t=2.5 (idle 2.5s):     tokens=59.5    (refill = elapsed_seconds * 1 token/s, capped at 60)
At t=60 (idle):           tokens=60      ◀── back to full; the cap stops further refill

The bucket refills continuously, so you don’t have to wait a full second between calls. You can spend 30 tokens in a 100 ms burst (as long as the bucket holds at least 30 tokens at the start), after which it refills at 1 token/s.
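The arithmetic above is a continuous-refill token bucket. A minimal TypeScript sketch follows. The `TokenBucket` class, its method names, and the injected millisecond clock (used so the numbers are reproducible) are illustrative assumptions rather than the platform's implementation. It replays three back-to-back calls at t=0 followed by idle time.

```typescript
// Continuous-refill token bucket: tokens accrue at refillPerSecond, capped at capacity.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // buckets start full
    this.lastRefillMs = nowMs;
  }

  // refill = elapsed_seconds * refillPerSecond, capped at capacity
  private refill(nowMs: number): void {
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
  }

  // Spend one token if available; false means the call would be rate-limited.
  tryConsume(nowMs: number): boolean {
    this.refill(nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Current token count after refilling up to nowMs.
  peek(nowMs: number): number {
    this.refill(nowMs);
    return this.tokens;
  }
}

// Layer 1 bucket: capacity 60, refill 1 token/s.
const tenant = new TokenBucket(60, 1, 0);
tenant.tryConsume(0);
tenant.tryConsume(0);
tenant.tryConsume(0);
console.log(tenant.peek(0));     // 57
console.log(tenant.peek(2500));  // 59.5
console.log(tenant.peek(60000)); // 60
```

Because refill is computed from elapsed time on each call, there is no background timer: a bucket that nobody touches costs nothing until the next request.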

What the AI sees when limited

The runtime returns:

{
  "ok": false,
  "error_code": "HTTP_TOOL_RATE_LIMITED",
  "llm_message": "I've used lookups too many times recently. I'll work with what I have."
}

The AI receives the llm_message as the tool result and continues the conversation. It does NOT auto-retry. From the caller’s perspective the AI just decides it has enough information.
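A sketch of how a tool wrapper might produce the payload shown above. The `ToolResult` fields `ok`, `error_code`, and `llm_message` come from the JSON; `data`, `invokeTool`, `contentForModel`, and the injected `hasToken`/`callEndpoint` functions are hypothetical. The point is the control flow: an empty bucket short-circuits to the fixed message, and there is no retry loop.

```typescript
// Tool result shape; ok, error_code and llm_message are from the JSON above.
interface ToolResult {
  ok: boolean;
  error_code?: string;
  llm_message?: string;
  data?: unknown; // hypothetical: payload of a successful lookup
}

const RATE_LIMITED: ToolResult = {
  ok: false,
  error_code: "HTTP_TOOL_RATE_LIMITED",
  llm_message: "I've used lookups too many times recently. I'll work with what I have.",
};

// Hypothetical wrapper: no token means an immediate rate-limited result, never a retry.
async function invokeTool(hasToken: () => boolean, callEndpoint: () => Promise<unknown>): Promise<ToolResult> {
  if (!hasToken()) return RATE_LIMITED;
  return { ok: true, data: await callEndpoint() };
}

// What the model sees: the llm_message on failure, the serialized payload on success.
function contentForModel(result: ToolResult): string {
  if (!result.ok) return result.llm_message ?? result.error_code ?? "Tool failed.";
  return JSON.stringify(result.data);
}
```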

What the dashboard surfaces

Every invocation row in the Invocations panel includes the remaining tenant-bucket count at the time of the call (in the _meta.rate_limit_remaining field of the response body). Rows that hit a limit are flagged red with the HTTP_TOOL_RATE_LIMITED code.
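If you have invocation response bodies as parsed JSON, a few lines are enough to see how close you ran to the tenant limit. Only `_meta.rate_limit_remaining` and the `HTTP_TOOL_RATE_LIMITED` code come from this page; the rest of the `InvocationBody` shape and the `summarizeBudget` helper are assumptions, and this page does not describe how to export rows from the dashboard.

```typescript
// Hypothetical response-body shape; only error_code and _meta.rate_limit_remaining
// are taken from this page.
interface InvocationBody {
  ok: boolean;
  error_code?: string;
  _meta?: { rate_limit_remaining?: number };
}

interface BudgetSummary {
  limited: number;                // rows flagged HTTP_TOOL_RATE_LIMITED
  lowestRemaining: number | null; // smallest tenant-bucket count seen, if any
}

function summarizeBudget(bodies: InvocationBody[]): BudgetSummary {
  let limited = 0;
  let lowest: number | null = null;
  for (const body of bodies) {
    if (body.error_code === "HTTP_TOOL_RATE_LIMITED") limited++;
    const remaining = body._meta?.rate_limit_remaining;
    if (remaining !== undefined && (lowest === null || remaining < lowest)) lowest = remaining;
  }
  return { limited, lowestRemaining: lowest };
}

const sample: InvocationBody[] = [
  { ok: true, _meta: { rate_limit_remaining: 12 } },
  { ok: true, _meta: { rate_limit_remaining: 0 } },
  { ok: false, error_code: "HTTP_TOOL_RATE_LIMITED" },
];
console.log(summarizeBudget(sample)); // { limited: 1, lowestRemaining: 0 }
```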

Asking for a higher quota

All three limits are hard-coded constants in v1. If your use case genuinely needs more (e.g. you run 100+ concurrent calls and every one chains 3-4 HTTP Tool lookups), open a support ticket with:

Your tenant ID.
Approximate call volume per minute.
Average HTTP Tool invocations per call.

We can raise the per-tenant cap for your tenant specifically.
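Before opening the ticket, it can help to estimate your sustained demand against the 60/min tenant refill. The sketch multiplies call volume by average invocations per call, assuming invocations are spread evenly over the minute. `estimateDemand` is an illustrative helper, and the figures in the usage line are made up.

```typescript
const TENANT_LIMIT_PER_MINUTE = 60; // Layer 1 refill rate from this page

interface QuotaEstimate {
  demandPerMinute: number; // HTTP Tool invocations you would need per minute
  overBy: number;          // how far that exceeds the tenant refill rate
}

function estimateDemand(callsPerMinute: number, avgInvocationsPerCall: number): QuotaEstimate {
  const demandPerMinute = callsPerMinute * avgInvocationsPerCall;
  return { demandPerMinute, overBy: Math.max(0, demandPerMinute - TENANT_LIMIT_PER_MINUTE) };
}

// Example: 40 calls per minute, each chaining 3.5 lookups on average.
console.log(estimateDemand(40, 3.5)); // { demandPerMinute: 140, overBy: 80 }
```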

Test-fire vs production budget

A test fire does not consume your production budget. You can press Send test repeatedly while iterating on your endpoint, and the live AI agent still has its full 60/min budget for real calls. The test-fire bucket is small (10/min) on purpose — we never want test runs to mask a real rate-limit problem in production.

Implementation note (for the curious)

Buckets live in the edge function’s module scope (Map<string, Bucket>). Supabase keeps the runtime warm under load, so bucket state survives across requests. A cold start resets every bucket to full, so a tenant who was just over their limit might briefly get extra calls through (at most a full bucket’s worth). We’re comfortable with this drift for v1; if you observe surprising behavior, file it under “we’d like to know.”
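The module-scope pattern and its cold-start behavior can be sketched as follows. The `Bucket` fields, `getBucket`, `tryConsume`, and `simulateColdStart` are hypothetical (the real `Bucket` type is not documented here), and a cold start is modelled simply as a fresh, empty Map. Because an unseen key is created full, how many extra calls slip through depends on how drained the bucket was when the instance restarted; the example shows the worst case of a fully drained bucket at the same instant.

```typescript
// Hypothetical bucket record; the real Bucket type is not documented on this page.
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

const CAPACITY = 60;
const REFILL_PER_SECOND = 1;

// Module scope: survives across requests only while this runtime instance stays warm.
let buckets = new Map<string, Bucket>();

function getBucket(key: string, nowMs: number): Bucket {
  let b = buckets.get(key);
  if (b === undefined) {
    b = { tokens: CAPACITY, lastRefillMs: nowMs }; // unseen key (e.g. after a cold start) starts full
    buckets.set(key, b);
  }
  return b;
}

function tryConsume(key: string, nowMs: number): boolean {
  const b = getBucket(key, nowMs);
  const elapsedSeconds = (nowMs - b.lastRefillMs) / 1000;
  b.tokens = Math.min(CAPACITY, b.tokens + elapsedSeconds * REFILL_PER_SECOND);
  b.lastRefillMs = nowMs;
  if (b.tokens < 1) return false;
  b.tokens -= 1;
  return true;
}

// A cold start is modelled as a fresh runtime with an empty Map.
function simulateColdStart(): void {
  buckets = new Map<string, Bucket>();
}

// Drain the tenant bucket, cold-start, then count how many calls pass at the same instant.
let before = 0;
while (tryConsume("tenant:abc", 0)) before++;
simulateColdStart();
let after = 0;
while (tryConsume("tenant:abc", 0)) after++;
console.log(before, after); // 60 60
```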
