Ship to production

Retries, observability, and the go-live checklist.

You've built it. Before flipping production traffic on, here's the punch list.

Errors and retries

Idempotency by tool class

ClassExamplesSafe to retry on failure?
Pure readsget_addresses, search_restaurants, get_order_detailsAlways
Cart mutationsupdate_cart, update_food_cart, clear_cart, flush_food_cartYes - server is idempotent on session; retrying the same args won't double-add
Order placementplace_food_order, checkout, book_tableNo, not by default - see below
Coupon applicationapply_food_couponYes
Trackingtrack_order, track_food_order, get_booking_statusAlways

Order placement (non-idempotent)

place_food_order, checkout, and book_table are not safe to blind-retry on network failure. Instead:

  1. On 5xx or network error, wait 2-5 seconds.
  2. Call get_food_orders / get_orders / get_booking_status to check if the order actually went through.
  3. If yes, treat the original failure as success.
  4. If no, retry the original call.

A future Idempotency-Key header will close this gap; the check-then-retry pattern works today.

Exponential backoff

For retriable failures (generic 5xx and upstream timeouts / errors - symbolic codes UPSTREAM_TIMEOUT / UPSTREAM_ERROR / INTERNAL_ERROR ship once the error-code registry is live):

async function retry<T>(fn: () => Promise<T>, maxAttempts = 4): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (e: any) {
      attempt++;
      if (attempt >= maxAttempts) throw e;
      if (!isRetryable(e)) throw e;
      const baseMs = 500 * 2 ** (attempt - 1); // 500, 1000, 2000, 4000
      const jitterMs = Math.random() * baseMs * 0.3;
      await new Promise((r) => setTimeout(r, baseMs + jitterMs));
    }
  }
}
 
function isRetryable(e: any) {
  const status = e?.status ?? e?.response?.status;
  if (status && status >= 500 && status < 600) return true;
  const code = e?.body?.error?.code;
  return ["UPSTREAM_TIMEOUT", "UPSTREAM_ERROR", "INTERNAL_ERROR"].includes(code);
}

Rate-limited (planned: 429)

MCP-layer rate limiting is not enforced in v1.0 - you will not see 429 Too Many Requests from a Swiggy MCP endpoint today (see rate-limits). Wire the handler so you're ready when v1.1 ships it:

if (response.status === 429) {
  const seconds = Number(response.headers.get("Retry-After") ?? 30);
  await new Promise((r) => setTimeout(r, seconds * 1000));
  return retry();
}

Once the header ships, always honour Retry-After - don't stack it on top of exponential backoff.

Auth failures (401)

Re-run the OAuth flow. Never retry with the same token. Auth errors also surface via JSON-RPC -32001 at the transport layer.

if (response.status === 401) {
  await reAuthenticate();
  return retry();
}

Retry budget

For user-facing flows: cap total retries at 30 seconds of wall-clock time. Beyond that, fail loudly and let the user decide.

Observability

The session id is your friend

Every tool call is tagged with a session id that flows across Swiggy's services. On your side, log it. When you file a support ticket, include the session id of the failing call - we can trace the full request path end-to-end in seconds.

{
  "ts": "2026-04-17T10:00:00Z",
  "level": "info",
  "event": "mcp_tool_call",
  "client_id": "your_client_id",
  "tool": "search_restaurants",
  "user_id_hash": "sha256:...",
  "session_id": "...",
  "duration_ms": 217,
  "status": "ok"
}

Hash user IDs at rest unless you have a specific DPDP-compliant reason to store them plaintext. See data-and-compliance.

Metrics to track client-side

MetricWhy
Tool call latency (p50, p95, p99)Surface regressions before users notice
Tool call success rateCatches partial outages
4xx / 5xx rateSeparates client bugs from server bugs
OAuth refresh frequencyUnusually high = token-management bug

OpenTelemetry

If your platform supports OpenTelemetry, instrument every callTool with a span. Tag with the session id. When you cross-reference with Swiggy support, we'll be on the same timeline.

What Swiggy exposes

  • status.swiggy.com/mcp - public status page (shipping in v1.1).
  • Incident comms via email to your designated engineering contact.
  • Per-partner usage dashboards for enterprise integrators.

What's not available

  • Raw server logs (security / privacy).
  • Infra dashboards.
  • User-level audit logs (only on lawful request - see data-and-compliance).

Go-live checklist

Before your first real-user traffic:

  • Credentials: production client_id issued, staging has been green for ≥ 48 hours.
  • Redirect URIs: every URL your OAuth flow might redirect to is allowlisted (exact-match).
  • Server allowlist: your client_id is approved only for the Swiggy MCP servers you actually call (food, instamart, dineout); v1 scopes (mcp:tools/mcp:resources/mcp:prompts) are requested uniformly.
  • Error handling: retry logic per this page is in place; auth failures (401 / JSON-RPC -32001) and upstream timeouts each have an explicit branch.
  • Idempotency guards: order-placement paths do check-then-retry, not blind retry.
  • Cart confirmation: no order is placed without user-visible confirmation of items + total.
  • Rate limits: you've benchmarked your expected QPS and confirmed you're under the ceiling (rate-limits).
  • Observability: session id logged on every call; metrics exported to your usual observability stack.
  • Deprecation monitoring: alerting set up on the _meta.swiggy.deprecation field so you won't miss a breaking change.
  • Incident contact: Swiggy has an email + Slack channel to reach your on-call for S0/S1.
  • Data handling: your product's data retention, deletion, and consent flows align with data-and-compliance.
  • Voice vs chat: if you have a voice surface, your prompts are shaped per voice-vs-chat.
  • Support runbook: you've written an internal runbook for "what do we do when Swiggy returns X".
  • Rollout: traffic ramps 1% → 10% → 50% → 100% over at least 24 hours so any regression is caught early.

When to escalate

If production traffic is failing and you've already:

  1. Verified tokens are fresh and scopes match.
  2. Confirmed you're hitting the right endpoints.
  3. Seen a sudden error-rate spike on Swiggy's side (not yours).

...mail builders@swiggy.in with the failing session ids and timestamps. Enterprise partners use their designated engineering contact + SEV channel.