The mechanical contract (URL + signing + JSON in / JSON out) is the easy part. The hard part is making the AI use your tool the way you want. This page is the cheat sheet.

Write the description like a one-paragraph briefing

The AI uses the description to decide when to call your tool. Generic descriptions get generic behavior. A description like “Looks up routing information.” gives the AI almost nothing to go on.
Pattern: what it does + when to use it + what you’ll get back + what to do with that.
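To make the pattern concrete, here is a minimal sketch that assembles a description from the four parts. The helper buildDescription and the example wording are hypothetical; the only thing that matters is that all four parts end up in the description text.

```javascript
// Hypothetical helper: joins the four parts of the pattern into one paragraph.
function buildDescription({ does, when, returns, then }) {
  return [does, when, returns, then]
    .map((part) => part.trim().replace(/\.*$/, '.')) // exactly one trailing period
    .join(' ');
}

// Example wording for an imaginary routing tool.
const description = buildDescription({
  does: 'Looks up which extension handles a given department',
  when: 'Use it when the caller asks to be transferred or names a department',
  returns: 'Returns the extension, the queue name, and the estimated wait in seconds',
  then: 'Tell the caller the wait time, then transfer to the extension',
});
```

Writing the four parts as separate fields makes it obvious when one of them is missing.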

Always include a hint in your response

The AI reads the entire response body. If you return raw data without guidance, the AI guesses at what to say next. Bake in the next-action hint:
{
  "extension": 6105,
  "queue_name": "Billing",
  "estimated_wait_seconds": 45,
  "hint": "Transfer to ext 6105 for billing. Mention the ~45s wait first."
}
The hint field name is a convention, not a requirement — instruction, next_step, say all work. The point is: include a one-liner in plain English that tells the AI what to say next.
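One way to keep the hint honest is to derive it from the same values you return. The sketch below assumes a hypothetical routingResponse helper; the field names mirror the example above.

```javascript
// Hypothetical helper: builds the response and derives the hint from the data,
// so the numbers the AI says always match the numbers in the payload.
function routingResponse({ extension, queueName, waitSeconds }) {
  return {
    extension,
    queue_name: queueName,
    estimated_wait_seconds: waitSeconds,
    hint:
      `Transfer to ext ${extension} for ${queueName.toLowerCase()}. ` +
      `Mention the ~${waitSeconds}s wait first.`,
  };
}
```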

Keep responses small

Body size | Behavior
--- | ---
< 1 KB | Ideal. AI uses everything.
1–4 KB | Fine. Slight latency cost.
4–64 KB | Acceptable, but only the first 4 KB is shown in the dashboard preview.
> 64 KB | Truncated. AI gets first 64 KB + a [truncated] footer.
Trim irrelevant fields server-side. If your CRM returns 50 fields on a customer record, send the AI the 5 it actually needs.
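A minimal sketch of server-side trimming, assuming a made-up CRM record. The helpers pick and bodyBytes are illustrative names, not part of any SDK.

```javascript
// Keep only the whitelisted keys that actually exist on the record.
function pick(record, fields) {
  return Object.fromEntries(
    fields.filter((f) => f in record).map((f) => [f, record[f]])
  );
}

// Size of the JSON body in bytes, for comparison with the size thresholds.
function bodyBytes(payload) {
  return Buffer.byteLength(JSON.stringify(payload), 'utf8');
}

// Hypothetical CRM record: most of it is irrelevant to the call.
const crmRecord = { id: 'c_1', name: 'Ada', plan: 'pro', internal_notes: 'x'.repeat(5000) };
const slim = pick(crmRecord, ['name', 'plan', 'missing_field']);
```

Logging bodyBytes(slim) during development is a cheap way to notice when a response creeps past 1 KB.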

Make endpoints idempotent

The AI might call the same tool twice in a row with the same arguments, usually because it got distracted by the caller’s next sentence and asked again. Side-effecting tools should be idempotent under that pattern: a repeated call with identical arguments should not repeat the side effect. If you can’t make the tool idempotent (e.g. it charges a card), include a transaction_id in the response and have the AI confirm with the caller before it calls the tool again.
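One way to get this behavior is to remember recent calls and replay the stored result. This is a sketch only: it assumes a single process, a synchronous side effect, and arguments that serialize in the same key order each time. The name runOnce and the 60-second window are arbitrary choices.

```javascript
const recent = new Map(); // key -> { result, expiresAt }

// Runs `effect` at most once per distinct (toolName, args) within `windowMs`.
// A repeat call inside the window gets the stored result back instead.
function runOnce(toolName, args, effect, windowMs = 60_000, now = Date.now()) {
  const key = toolName + ':' + JSON.stringify(args);
  const hit = recent.get(key);
  if (hit && hit.expiresAt > now) return hit.result;
  const result = effect(args);
  recent.set(key, { result, expiresAt: now + windowMs });
  return result;
}
```

In a multi-instance deployment the Map would need to be replaced by a shared store.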

Respond in under 2 seconds

The caller is waiting on a live phone call. The 10s budget exists for emergencies; the actual target is sub-2-second p95. Common ways to blow past 2s:
  • Chained API calls inside your endpoint. Pre-cache.
  • Database round-trips on every call. Add an index, denormalize, or memcache.
  • Synchronous third-party API calls (Stripe, Salesforce). Cache aggressively.
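The caching advice above can be sketched as a small time-to-live wrapper. It is an illustration under assumptions: in-memory only, one process, and a synchronous fetchFn. If fetchFn returns a promise, the promise itself is cached, including a rejected one.

```javascript
// Wraps a slow lookup so repeat requests within `ttlMs` are served from memory.
// `clock` is injectable so the expiry logic can be tested without waiting.
function cached(fetchFn, ttlMs, clock = Date.now) {
  const store = new Map(); // key -> { value, expiresAt }
  return function lookup(key) {
    const hit = store.get(key);
    if (hit && hit.expiresAt > clock()) return hit.value;
    const value = fetchFn(key);
    store.set(key, { value, expiresAt: clock() + ttlMs });
    return value;
  };
}
```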
If your data legitimately takes > 2s to fetch, design two tools:
  1. start_lookup — kicks off the query, returns a job ID immediately.
  2. check_lookup — polls the job ID, returns “still processing” or the result.
The AI can chat with the caller between calls.
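The two-tool split can be sketched with an in-memory job table. The function names, response fields, and hints below are assumptions for illustration; finishLookup stands in for whatever background worker runs the slow query.

```javascript
const { randomUUID } = require('node:crypto');

const jobs = new Map(); // jobId -> { status, result }

// start_lookup: registers the job and returns immediately with its ID.
function startLookup() {
  const jobId = randomUUID();
  jobs.set(jobId, { status: 'processing' });
  return { job_id: jobId, hint: 'Tell the caller you are checking, then call check_lookup.' };
}

// Called by your background worker when the slow query finishes.
function finishLookup(jobId, result) {
  jobs.set(jobId, { status: 'done', result });
}

// check_lookup: reports progress or the final result.
function checkLookup(jobId) {
  const job = jobs.get(jobId);
  if (!job) return { error: 'unknown_job', hint: 'Apologize and start the lookup again.' };
  if (job.status === 'processing') {
    return { status: 'still processing', hint: 'Keep chatting and check again shortly.' };
  }
  return { status: 'done', result: job.result };
}
```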

Don’t leak PII you didn’t intend to

Once data crosses the wire, the AI may verbalize it. Common leaks:
  • Card numbers — Send the last 4 digits, never the full number.
  • National IDs — Mask before returning.
  • Internal SKU codes — Translate to customer-facing names.
  • Stack traces — On error, return {"error": "Lookup failed", "hint": "Apologize and offer a callback."} not the raw exception.
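Two of the items above, sketched as helpers. The names maskCard and safeError are hypothetical, and the logger is passed in so the raw exception stays in your own logs.

```javascript
// Keep only the last four digits of a card number.
function maskCard(cardNumber) {
  const digits = String(cardNumber).replace(/\D/g, '');
  return '**** ' + digits.slice(-4);
}

// Turn any thrown error into a body that is safe to read aloud.
function safeError(err, log = console.error) {
  log(err); // the stack trace goes to your logs, never into the response
  return { error: 'Lookup failed', hint: 'Apologize and offer a callback.' };
}
```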

Don’t trust LLM-provided arguments as gospel

The schema validates types but not content. If your tool takes a phone_number, the AI might pass "my mom's". Defend on your side:
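// Expect E.164-style input: "+" followed by 8 to 15 digits.
const phone = body.arguments.phone_number;
if (typeof phone !== 'string' || !/^\+\d{8,15}$/.test(phone)) {
  // Respond 2xx with a structured error so the AI can read the hint and recover.
  return res.json({
    error: 'invalid_phone',
    hint: 'Ask the caller for their phone number in international format.',
  });
}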
Note that we return 2xx with a structured error body: the AI reads hint and recovers. Returning a 4xx would also work, but the AI’s response is less specific because it sees only HTTP_TOOL_CUSTOMER_ERROR.

Use Bearer tokens, even for “internal” endpoints

Don’t treat your endpoint URL as a secret: it appears in our invocation logs, gets copied as curl commands, and so on. Without auth, anyone who learns the URL can call it.
  • Generate a token specific to this tool.
  • Save it in the tool’s auth field.
  • Verify it on your side and verify the HMAC signature. Both gates.
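A sketch of the two gates in Node.js. The signing details here are assumptions, not the documented scheme: the token is taken to arrive as "Bearer <token>" in the Authorization header, and the signature is taken to be a hex HMAC-SHA256 of the raw request body. Check the signing documentation for the real header name, algorithm, and encoding.

```javascript
const crypto = require('node:crypto');

// Constant-time comparison that also handles differing lengths.
function safeEqual(a, b) {
  const bufA = Buffer.from(String(a));
  const bufB = Buffer.from(String(b));
  return bufA.length === bufB.length && crypto.timingSafeEqual(bufA, bufB);
}

// ASSUMPTION: hex HMAC-SHA256 over the raw body; Bearer token in Authorization.
function isAuthentic({ authorization, signature, rawBody }, { token, signingSecret }) {
  const expectedSig = crypto
    .createHmac('sha256', signingSecret)
    .update(rawBody)
    .digest('hex');
  const tokenOk = safeEqual(authorization || '', `Bearer ${token}`);
  const sigOk = safeEqual(signature || '', expectedSig);
  return tokenOk && sigOk; // both gates must pass
}
```

Compute the HMAC over the raw bytes you received, not over a re-serialized JSON object, since re-serialization can change whitespace and key order.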

Audit the test fire before going live

Before you flip the tool to active, run Send test at least once and check:
  1. Does your endpoint receive the request? (Check your access logs.)
  2. Does signature verification succeed? (Check your auth logs.)
  3. Is the response shape what the AI will get? (Look at the Test panel.)
  4. Does the latency look acceptable?
Then place a real call and check the Invocations panel. The full request + response is there.

Version your responses if you change them

If you change the shape of your tool’s response (e.g. you used to return extension and now you return routing_target), the AI’s description-based behavior will lag. Do one or both of the following:
  • Make the new response a superset: return both extension and routing_target for a transition window.
  • Update the tool description to reflect the new shape.
Either way, test the change in a webhook.site / ngrok sandbox first.
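The superset option can be as small as the sketch below. It assumes the old and new fields carry the same value, which may not hold for your rename; the includeLegacy flag is a hypothetical switch you turn off once the transition window ends.

```javascript
// During the transition window, return the old field alongside the new one.
function routingBody(target, { includeLegacy = true } = {}) {
  const body = { routing_target: target };
  if (includeLegacy) body.extension = target; // old name, same value (assumed)
  return body;
}
```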

Monitor the invocations log

The Invocations panel shows the last 30 days of activity. Skim it weekly. Common things to look for:
  • HTTP_TOOL_CUSTOMER_ERROR rate > 5% → your endpoint is flaky.
  • HTTP_TOOL_TIMEOUT showing up → your endpoint is slow.
  • HTTP_TOOL_INVALID_ARGS repeating → your description is misleading the AI.
  • A particular tool is never called → maybe the AI can’t figure out when to use it; rewrite the description.
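If you can get the entries out of the panel, the weekly skim can be scripted. The record shape below ({ tool, error_code }, with error_code null on success) is an assumption made for this sketch; the real data may look different.

```javascript
// ASSUMPTION: each invocation looks like { tool, error_code },
// with error_code null on success.
function summarize(invocations, knownTools) {
  const count = (code) => invocations.filter((i) => i.error_code === code).length;
  const total = invocations.length;
  const customerErrorRate = total === 0 ? 0 : count('HTTP_TOOL_CUSTOMER_ERROR') / total;
  const called = new Set(invocations.map((i) => i.tool));
  return {
    customerErrorRate,
    flaky: customerErrorRate > 0.05, // the 5% threshold from the list above
    timeouts: count('HTTP_TOOL_TIMEOUT'),
    invalidArgs: count('HTTP_TOOL_INVALID_ARGS'),
    neverCalled: knownTools.filter((t) => !called.has(t)),
  };
}
```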

When NOT to use HTTP Tools

HTTP Tools is the right shape for synchronous, agent-decided lookups during a live call. Don’t use it for:
  • Background work — webhooks pushed from your side to ours. We don’t support inbound webhooks via this mechanism.
  • File uploads — body must be JSON / text. No multipart, no binary.
  • Long-running jobs — anything > 10s. Split into start/check tools.
  • Anything the AI shouldn’t be deciding to call — purely deterministic logic belongs in your own backend, called from your own systems.

Next: Code samples

Drop-in handlers for Node.js, Python, PHP, and Go.