When a request to Wafer fails, the response body carries a structured error envelope you can branch on:
{
  "error": {
    "type":       "invalid_request_error",
    "code":       "tool_schema_invalid",
    "message":    "tools[0].function.parameters must be a JSON Schema object",
    "param":      "tools[0].function.parameters",
    "request_id": "req_a1b2c3d4e5f6",
    "docs_url":   "https://docs.wafer.ai/errors#tool_schema_invalid"
  }
}
The same envelope ships from pass.wafer.ai (OpenAI- and Anthropic-compatible inference endpoints) and from api.wafer.ai (account, billing, and key management). Anthropic-compatible responses wrap the same fields under {"type":"error","error":{...}}. Every response — success or failure — also carries the x-request-id header. Include that ID when contacting support.
When you include an x-request-id header on your request, Wafer echoes your value back instead of generating a new one, so you can correlate responses with your application logs. To keep server- and client-supplied IDs distinguishable in our logs, Wafer prefixes client values with client_. Server-generated IDs match req_<12 hex> exactly; any other value you send comes back with the prefix.
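As a rough illustration of the ID rules above, the sketch below classifies an x-request-id value. It assumes "12 hex" means twelve lowercase hexadecimal characters; the helper names are hypothetical and not part of any Wafer SDK.

```python
import re
from typing import Optional

# Assumption: server IDs use lowercase hex, as in the sample "req_a1b2c3d4e5f6".
SERVER_ID = re.compile(r"req_[0-9a-f]{12}")
CLIENT_PREFIX = "client_"


def classify_request_id(request_id: str) -> str:
    """Return 'server', 'client', or 'unknown' for an x-request-id value."""
    if SERVER_ID.fullmatch(request_id):
        return "server"
    if request_id.startswith(CLIENT_PREFIX):
        return "client"
    return "unknown"


def original_client_id(request_id: str) -> Optional[str]:
    """Strip the client_ prefix to recover the value the application sent."""
    if classify_request_id(request_id) == "client":
        return request_id[len(CLIENT_PREFIX):]
    return None
```

Stripping the prefix before searching application logs lets you match on the value you originally sent.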

How to use this page

type is the coarse bucket; your SDK probably already maps it to a typed exception class (RateLimitError, BadRequestError, etc.). code is the specific reason within that bucket and the right thing to branch on programmatically. Each anchor below documents one code.
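One way to apply this is to look up code first and fall back to type. The sketch below works on a parsed response body in either the OpenAI-style or the Anthropic-style wrapper, since both carry the fields under a top-level error key. The action names and the two lookup tables are illustrative assumptions, not a Wafer-defined policy.

```python
def extract_error(body):
    """Return the inner error dict from a parsed response body, or {}."""
    if not isinstance(body, dict):
        return {}
    err = body.get("error")
    return err if isinstance(err, dict) else {}


# Hypothetical actions; extend these tables with the codes you care about.
ACTION_BY_CODE = {
    "context_length_exceeded": "switch_model",
    "tool_schema_invalid": "fix_request",
}
ACTION_BY_TYPE = {
    "rate_limit_error": "backoff",
    "invalid_request_error": "fix_request",
}


def choose_action(body):
    """Prefer the specific code; fall back to the coarse type; else raise."""
    err = extract_error(body)
    by_code = ACTION_BY_CODE.get(err.get("code"))
    return by_code or ACTION_BY_TYPE.get(err.get("type"), "raise")
```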

Bucket: invalid_request_error (400 / 422)

schema_validation_failed

Status: 422 The request body didn’t match the endpoint’s schema. error.param points at the offending field path (e.g. messages[0].content, amount_cents). What to do: read error.param and error.message, fix the field, and retry.

tool_schema_invalid

Status: 400 A function tool you passed has an invalid parameters block. We pre-validate that parameters is a JSON Schema object (type: "object" with a properties dict) before forwarding, because the underlying backends would otherwise return a 400 with unhelpful Pydantic errors. What to do: confirm tools[i].function.parameters matches the OpenAI function-calling spec.
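A client-side pre-check can catch this before the request leaves your process. The sketch below approximates the rule described above (a dict with type "object" and a properties dict); the server-side validation may be stricter, and the function names are hypothetical.

```python
def parameters_problem(parameters):
    """Return a description of the problem, or None if the shape looks fine."""
    if not isinstance(parameters, dict):
        return "parameters must be a JSON Schema object"
    if parameters.get("type") != "object":
        return 'parameters.type must be "object"'
    if not isinstance(parameters.get("properties"), dict):
        return "parameters.properties must be a dict"
    return None


def check_tool_parameters(tools):
    """Return (path, problem) pairs for every tool with a bad parameters block."""
    problems = []
    for i, tool in enumerate(tools):
        function = tool.get("function") or {}
        problem = parameters_problem(function.get("parameters"))
        if problem:
            problems.append((f"tools[{i}].function.parameters", problem))
    return problems
```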

tool_choice_unknown_tool

Status: 400 tool_choice.function.name doesn’t match any tool you declared in tools. What to do: ensure the tool name in tool_choice is one of the tools[].function.name values in the same request.

orphan_tool_message

Status: 400 A role=tool message references a tool_call_id that no preceding assistant message issued. Every tool message must follow an assistant message whose tool_calls[] contains a matching id. What to do: check that your conversation history is intact — every tool result must be preceded by the assistant turn that requested it.
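The pairing rule can be checked locally before sending. The sketch below walks an OpenAI-style messages list and flags tool messages that would trigger orphan_tool_message or missing_tool_call_id; it is a minimal illustration with a hypothetical function name, not the server's validator.

```python
def tool_message_problems(messages):
    """Return (index, code) pairs for tool messages that break the pairing rule."""
    seen_ids = set()
    problems = []
    for index, message in enumerate(messages):
        role = message.get("role")
        if role == "assistant":
            # Record every tool call id this assistant turn issued.
            for call in message.get("tool_calls") or []:
                if call.get("id"):
                    seen_ids.add(call["id"])
        elif role == "tool":
            call_id = message.get("tool_call_id")
            if not call_id:
                problems.append((index, "missing_tool_call_id"))
            elif call_id not in seen_ids:
                problems.append((index, "orphan_tool_message"))
    return problems
```

A common trigger is history truncation that drops the assistant turn but keeps the tool result that followed it.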

missing_tool_call_id

Status: 400 A role=tool message is missing its tool_call_id field. The field is required so we can pair the result back to the assistant’s request.

unsupported_parameter

Status: 400 A request parameter is not supported on the endpoint or for the selected model (e.g. previous_response_id, logit_bias on some backends). What to do: remove the parameter, or check the Models page for per-model support.

unsupported_tool_type

Status: 400 Only tools[i].type == "function" is supported on the OpenAI-compatible endpoint. (Anthropic-side tool types are translated upstream.)

duplicate_tool_name

Status: 400 Two entries in tools declare the same function.name. Tool names must be unique within a request.
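The tool-list rules in this bucket (function-only types, unique names, and a tool_choice that names a declared tool) can be checked together before sending. The following is a sketch under the assumption that tools and tool_choice use the OpenAI-compatible shape; the function name is hypothetical.

```python
from collections import Counter


def tool_list_problems(tools, tool_choice=None):
    """Return (code, detail) pairs mirroring the tool-related error codes."""
    problems = []
    for i, tool in enumerate(tools):
        if tool.get("type") != "function":
            problems.append(("unsupported_tool_type", f"tools[{i}].type"))
    names = [
        (tool.get("function") or {}).get("name")
        for tool in tools
        if tool.get("type") == "function"
    ]
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(("duplicate_tool_name", name))
    # String values such as "auto" do not name a tool, so only dicts are checked.
    if isinstance(tool_choice, dict):
        chosen = (tool_choice.get("function") or {}).get("name")
        if chosen not in names:
            problems.append(("tool_choice_unknown_tool", chosen))
    return problems
```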

context_length_exceeded

Status: 400 The request would exceed the selected model’s context window. The error body includes structured fields to help you switch models programmatically:
{
  "error": {
    "type": "invalid_request_error",
    "code": "context_length_exceeded",
    "message": "The request exceeded GLM-5.1's context window (max_model_len=202752). Reduce the prompt length, compact your conversation, or lower max_tokens. Try a model with a larger context window: Qwen3.5-397B-A17B (256K context).",
    "model": "GLM-5.1",
    "context_length_limit": 202752,
    "suggested_models": [
      {"id": "Qwen3.5-397B-A17B", "context_length": 262144}
    ],
    "request_id": "req_..."
  }
}
What to do: lower max_tokens, trim the prompt, or retry against one of suggested_models.
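To switch models programmatically, you can pick from suggested_models in the error body. The sketch below chooses the suggestion with the largest context_length, optionally filtered by a token count you estimate yourself; needed_tokens is a hypothetical parameter, not a field Wafer returns.

```python
def pick_fallback_model(error, needed_tokens=None):
    """Return the id of the largest-context suggested model, or None."""
    candidates = error.get("suggested_models") or []
    if needed_tokens is not None:
        candidates = [
            model for model in candidates
            if model.get("context_length", 0) >= needed_tokens
        ]
    if not candidates:
        return None
    best = max(candidates, key=lambda model: model.get("context_length", 0))
    return best["id"]
```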

model_not_found

Status: 404 The model value in your request doesn’t match any model your key can access. The error message includes the list of available models for your key tier.

model_not_allowed

Status: 403 The model exists but your key isn’t entitled to it (e.g. a Serverless key hitting a model only available through a paid plan, or a key from an account whose plan doesn’t include the model). What to do: pick a model from the list your key has access to, or upgrade your plan. Call GET https://pass.wafer.ai/v1/models for live availability for the bearer key.
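A minimal sketch of the availability check using only the standard library. The URL and bearer scheme come from this page; the response shape is not documented here, so fetch_models returns the parsed JSON as-is rather than assuming a structure. Both function names are hypothetical.

```python
import json
import urllib.request

MODELS_URL = "https://pass.wafer.ai/v1/models"


def build_models_request(api_key):
    """Build (but do not send) the GET request for the models list."""
    return urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_models(api_key, timeout=10):
    """Send the request and return the decoded JSON body unchanged."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=timeout) as resp:
        return json.load(resp)
```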

invalid_zdr_header

Status: 400 The Wafer-ZDR request header was present, but its value wasn’t the literal string required (the only accepted value). What to do: either drop the header or send Wafer-ZDR: required. See Zero Data Retention.

invalid_request_tag

Status: 400 The Wafer-Request-Tag header was present but the value doesn’t match the allowed shape: lowercase ASCII, digits, dashes, underscores, 1–32 characters (^[a-z0-9_-]{1,32}$). When you send multiple Wafer-Request-Tag headers on one request, the error message identifies which value failed so you don’t have to bisect. What to do: confirm each header value matches the regex. Common causes: uppercase characters, spaces, special punctuation, an empty string, or a value longer than 32 chars. The same error fires if you send more than 8 distinct tags on one request. See Request Tagging. Dedicated endpoints only — this code never fires on serverless keys (the header is silently ignored there).
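Since the allowed shape is given as a regex, tags can be validated client-side. The sketch below uses fullmatch with the character class and length bounds quoted above, plus the limit of 8 distinct tags; the function names are hypothetical.

```python
import re

# Same character class and length bounds as ^[a-z0-9_-]{1,32}$.
TAG_PATTERN = re.compile(r"[a-z0-9_-]{1,32}")
MAX_DISTINCT_TAGS = 8


def invalid_tags(tags):
    """Return the tag values that do not match the allowed shape."""
    return [tag for tag in tags if not TAG_PATTERN.fullmatch(tag)]


def check_tags(tags):
    """Return a problem description, or None if the tags look acceptable."""
    bad = invalid_tags(tags)
    if bad:
        return f"invalid tag values: {bad!r}"
    if len(set(tags)) > MAX_DISTINCT_TAGS:
        return "more than 8 distinct tags"
    return None
```

fullmatch is used instead of match with a trailing $ because $ in Python also matches just before a final newline, which would let a value like "prod\n" slip through.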

model_zdr_not_supported

Status: 400 The request set Wafer-ZDR: required against a model whose backend doesn’t support zero data retention. Currently GLM-5.1, Kimi-K2.6, and Qwen3.5-397B-A17B support ZDR; other Serverless models do not. What to do: drop the header to use the model without ZDR, or switch to a ZDR-eligible model. The model list at Agent Setup marks per-model ZDR support.

invalid_json_request

Status: 400 The request body wasn’t valid JSON, or the JSON shape didn’t include the required top-level fields (e.g. an empty body sent to /v1/chat/completions). What to do: confirm Content-Type: application/json and that the body parses as JSON. Common cause: an SDK serializing None / undefined instead of a request object.

request_body_too_large

Status: 413 The request body exceeded the per-route size cap (50 MB). What to do: for large prompts, compact your conversation, or upload large media via the Files API and reference the returned file_id in your content blocks instead of inlining bytes.

Bucket: authentication_error (401)

missing_api_key

Status: 401 No credentials on the request. Confirm the Authorization: Bearer <key> header (or x-api-key: <key>) is present.

invalid_api_key

Status: 401 The key was sent but doesn’t match a live Wafer key. Common causes: the key was revoked (see API Keys), the key was copied with surrounding whitespace, or it was generated for a different environment. Pass keys and Serverless keys both start with wfr_.

Bucket: permission_error (403)

http_403

Status: 403 The key is valid but doesn’t have access to this endpoint or resource (e.g. a Pass key trying to hit a dedicated endpoint, or a key from a different account).

files_require_user_key

Status: 403 The bearer key is a non-user-scoped legacy key and cannot upload to the Files API. Only modern, user-scoped Serverless keys can own files. What to do: mint a fresh Serverless key from app.wafer.ai and use it for the upload. Older keys keep working for inference; only file ownership requires a user-scoped key.

Bucket: rate_limit_error (429)

429 responses include Retry-After plus the three RateLimit-* headers that SDKs read for backoff:
  • Retry-After: seconds until the next retry should be attempted
  • RateLimit-Limit: the cap for this window
  • RateLimit-Remaining: 0 when at-cap
  • RateLimit-Reset: seconds until the limit resets
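If your SDK does not read these headers for you, a small helper can turn them into a wait time. This sketch prefers Retry-After, then RateLimit-Reset, then a default; it assumes both values are plain numbers of seconds as described above, and the one-second default is an arbitrary choice.

```python
def backoff_hint(headers, default=1.0):
    """Seconds to wait: Retry-After, else RateLimit-Reset, else the default."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("retry-after", "ratelimit-reset"):
        try:
            seconds = float(lowered.get(name))
        except (TypeError, ValueError):
            continue  # header missing or not a number
        if seconds >= 0:
            return seconds
    return default
```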

concurrency_limit_exceeded

Status: 429 Too many in-flight requests on the account. Wait briefly (1–2s) and retry with exponential backoff. Retry-After: 1 is set as a hint.

request_quota_exceeded

Status: 429 The account hit its included-request limit for the current window. The body includes request_limit, window_end, and (when known) plan_tier:
{
  "error": {
    "type": "rate_limit_error",
    "code": "request_quota_exceeded",
    "message": "You have gone over your plan request limit ...",
    "request_limit": 500,
    "window_end": "2026-05-21T20:00:00+00:00",
    "plan_tier": "pro",
    "request_id": "req_..."
  }
}
plan_tier is omitted from the body when the account’s plan tier isn’t known at the edge — branch on key presence, not on null. What to do: wait for window_end, upgrade your plan, or enable overage in your dashboard.
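The window_end value in the sample is an ISO 8601 timestamp with an offset, which datetime.fromisoformat can parse. The sketch below computes how long to wait and reads plan_tier by key presence, as recommended above. The function names and the "unknown" label are illustrative.

```python
from datetime import datetime, timezone


def seconds_until_window_end(error, now=None):
    """Return the non-negative number of seconds until error['window_end']."""
    window_end = datetime.fromisoformat(error["window_end"])
    now = now or datetime.now(timezone.utc)
    return max(0.0, (window_end - now).total_seconds())


def plan_tier_label(error):
    """Branch on key presence rather than comparing against None."""
    return error["plan_tier"] if "plan_tier" in error else "unknown"
```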

rate_limit_exceeded

Status: 429 A general rate-limit hit (e.g. on a control-plane endpoint). Check Retry-After and back off.

backend_at_capacity

Status: 429 The per-route concurrency cap on the selected model was hit. This is a backpressure signal, not an account-level throttle — other accounts can also hit it simultaneously when a model is busy. This code also covers the upstream queue-full case: a backend that returns its own 503 with a “queue full” signal is rewritten to a 429 with this code so your SDK’s RateLimitError retry path takes over instead of treating it as a server outage. What to do: retry with exponential backoff. The condition typically clears in 1–2 seconds. Retry-After is set when the upstream provided one.

backend_overloaded

Status: 429 The selected backend reported itself overloaded with a 429 of its own. Distinct from backend_at_capacity (which is the edge-side cap) — this one originated upstream. What to do: retry with exponential backoff.
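Several codes in this bucket recommend exponential backoff. The sketch below shows one common variant (doubling delay with added jitter, never shorter than a Retry-After hint). WaferError is a hypothetical exception used only to keep the example self-contained; in practice you would adapt the except clause to your SDK's exception type. The attempt count, base, and cap are arbitrary defaults.

```python
import random
import time

RETRYABLE = {"concurrency_limit_exceeded", "backend_at_capacity", "backend_overloaded"}


class WaferError(Exception):
    """Hypothetical stand-in for an SDK error carrying code and Retry-After."""

    def __init__(self, code, retry_after=None):
        super().__init__(code)
        self.code = code
        self.retry_after = retry_after


def call_with_backoff(send, max_attempts=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call send(); on a retryable code, wait base * 2**attempt plus jitter."""
    for attempt in range(max_attempts):
        try:
            return send()
        except WaferError as exc:
            if exc.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            delay = min(cap, base * 2 ** attempt)
            if exc.retry_after is not None:
                delay = max(delay, exc.retry_after)
            # Jitter spreads out clients that were throttled at the same moment.
            sleep(delay + random.uniform(0, delay / 2))
```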

Bucket: insufficient_credits (402)

insufficient_credits

Status: 402 Your Wafer Serverless prepaid balance is insufficient for the request. The body includes an estimate of available vs. required credits:
{
  "error": {
    "type": "insufficient_credits",
    "code": "insufficient_credits",
    "message": "Your Wafer Serverless balance is approximately $0.12 ...",
    "credits_available_cents": 12,
    "credits_required_cents_estimate": 50,
    "topup_url": "https://app.wafer.ai/usage",
    "request_id": "req_..."
  }
}
What to do: add credits at topup_url, or enable auto-top-off in your dashboard.
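The two cents fields make it straightforward to show the user how much to add. A small sketch, treating the required figure as the estimate it is; the function names are hypothetical.

```python
def credit_shortfall_cents(error):
    """Return the estimated shortfall in cents, or None if a field is missing."""
    available = error.get("credits_available_cents")
    required = error.get("credits_required_cents_estimate")
    if available is None or required is None:
        return None
    return max(0, required - available)


def shortfall_message(error):
    """Format a short user-facing hint that includes topup_url when present."""
    shortfall = credit_shortfall_cents(error)
    if shortfall is None:
        return "Balance too low for this request."
    url = error.get("topup_url", "")
    return f"Add at least ${shortfall / 100:.2f} (estimate) {url}".strip()
```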

Bucket: routing_error / server_error (502 / 503 / 504)

no_healthy_backends

Status: 503 All backends for the requested model are currently unavailable. Oncall is notified automatically when this fires. The error body includes the affected model. What to do: retry with exponential backoff. If the issue persists for more than a minute, check status.wafer.ai or contact support with your request_id.

backend_timeout

Status: 504 The selected backend accepted the request but didn’t produce a response in time. The error body includes the model. What to do: retry. Backends auto-recover; a single 504 is usually transient.

backend_connect_error

Status: 502 We couldn’t open a connection to the selected backend. What to do: retry — our router will pick a different backend on the next attempt.

backend_http_error

Status: 502 The backend returned an HTTP error we couldn’t interpret. Includes model. What to do: retry. If it persists, send us your request_id.

Bucket: internal_error (500)

internal_error

Status: 500 Something on our side went wrong and the cause didn’t fit any specific bucket. The request_id is the load-bearing piece — send it with your bug report and we can find the exact failure in our logs. What to do: retry once. If it reproduces, file a bug with the request_id and (if relevant) the request body.
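The retry guidance for the routing, server, and internal codes can be captured as a per-code retry budget. In the sketch below, the single retry for internal_error follows the advice above; the default of 3 for the other codes is an assumed value, since this page only says "retry" without giving a count.

```python
DEFAULT_RETRIES = 3  # assumption: the docs do not specify a number

RETRY_BUDGET = {
    "no_healthy_backends": DEFAULT_RETRIES,
    "backend_timeout": DEFAULT_RETRIES,
    "backend_connect_error": DEFAULT_RETRIES,
    "backend_http_error": DEFAULT_RETRIES,
    "internal_error": 1,  # "retry once", then file a bug with the request_id
}


def should_retry(code, retries_so_far):
    """True while the number of retries already made is under the budget."""
    return retries_so_far < RETRY_BUDGET.get(code, 0)
```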

SDK quickstart

Python (openai)

from openai import OpenAI, BadRequestError, RateLimitError

client = OpenAI(api_key="wfr_...", base_url="https://pass.wafer.ai/v1")

try:
    resp = client.chat.completions.create(
        model="GLM-5.1",
        messages=[{"role": "user", "content": "..."}],
    )
except BadRequestError as e:
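    # The openai SDK typically stores the already-unwrapped "error" object in
    # e.body; accept the full envelope too so either shape works.
    body = e.body if isinstance(e.body, dict) else {}
    err = body.get("error", body)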
    code = err.get("code")
    if code == "context_length_exceeded":
        # Try a suggested model
        for alt in err.get("suggested_models", []):
            print("retry with", alt["id"])
    elif code == "tool_schema_invalid":
        print("bad tool param:", err.get("param"))
    print("request_id:", err.get("request_id"))
except RateLimitError as e:
    # Honor Retry-After header from e.response.headers
    print("retry after:", e.response.headers.get("Retry-After"))

Python (anthropic)

from anthropic import Anthropic, APIStatusError

client = Anthropic(api_key="wfr_...", base_url="https://pass.wafer.ai")

try:
    resp = client.messages.create(
        model="GLM-5.1",
        max_tokens=1024,
        messages=[{"role": "user", "content": "..."}],
    )
except APIStatusError as e:
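    # e.body is the parsed JSON envelope when the response body was JSON.
    body = e.body if isinstance(e.body, dict) else {}
    err = body.get("error", {})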
    print("code:", err.get("code"), "request_id:", err.get("request_id"))
When filing a support ticket, always include the request_id (from the x-request-id header or the error.request_id field). With that, we can look up the exact request in our logs.
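A small helper can pull the ID from whichever source is available. This sketch checks the x-request-id header first and falls back to error.request_id in the parsed body; the function name is hypothetical.

```python
def support_request_id(headers, body):
    """Return the request ID from the header, else from the error body."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    header_id = lowered.get("x-request-id")
    if header_id:
        return header_id
    error = body.get("error") if isinstance(body, dict) else None
    if isinstance(error, dict):
        return error.get("request_id")
    return None
```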