When a request to Wafer fails, the response body carries a structured error envelope you can branch on:
{
  "error": {
    "type":       "invalid_request_error",
    "code":       "tool_schema_invalid",
    "message":    "tools[0].function.parameters must be a JSON Schema object",
    "param":      "tools[0].function.parameters",
    "request_id": "req_a1b2c3d4e5f6",
    "docs_url":   "https://docs.wafer.ai/errors#tool_schema_invalid"
  }
}
The same envelope ships from pass.wafer.ai (OpenAI- and Anthropic-compatible inference endpoints) and from api.wafer.ai (account, billing, and key management). Anthropic-compatible responses wrap the same fields under {"type":"error","error":{...}}. Every response — success or failure — also carries the x-request-id header. Include that ID when contacting support.
When you include an x-request-id header on your request, Wafer reuses your value instead of generating one, so you can correlate responses with your application logs. To keep server- and client-supplied IDs separate in our logs, Wafer prefixes client values with client_. Server-generated IDs match req_<12 hex> exactly; any other value you send comes back with the client_ prefix.
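A minimal sketch of how a client might tell the two kinds of ID apart when reading logs. It assumes "12 hex" means exactly twelve lowercase hexadecimal characters, which the text above does not spell out, and `classify_request_id` is a hypothetical helper rather than part of any Wafer SDK.

```python
import re

# Assumption: "req_<12 hex>" means exactly twelve lowercase hex characters.
SERVER_ID = re.compile(r"req_[0-9a-f]{12}")


def classify_request_id(request_id: str) -> str:
    """Return 'server', 'client', or 'unknown' for an x-request-id value."""
    if SERVER_ID.fullmatch(request_id):
        return "server"
    if request_id.startswith("client_"):
        return "client"
    return "unknown"
```

For example, `classify_request_id("req_a1b2c3d4e5f6")` falls in the server-generated branch, while a value such as `client_order-42` is treated as client-supplied.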

How to use this page

type is the coarse bucket; your SDK probably already maps it to a typed exception class (RateLimitError, BadRequestError, etc.). code is the specific reason within that bucket and the right field to branch on programmatically. Each anchor below documents one code.
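As a rough illustration of branching on code rather than type, the sketch below pulls the inner error object out of a parsed response body. Both the OpenAI-style envelope and the Anthropic-style wrapper described earlier keep the fields under a top-level "error" key, so one lookup covers both. `extract_error` and `describe` are hypothetical helper names.

```python
from typing import Any


def extract_error(body: Any) -> dict:
    """Return the inner error object from a parsed response body, or {}."""
    if not isinstance(body, dict):
        return {}
    err = body.get("error")
    return err if isinstance(err, dict) else {}


def describe(body: Any) -> str:
    """Build a short log line from type, code, and request_id."""
    err = extract_error(body)
    return f"{err.get('type')}/{err.get('code')} ({err.get('request_id')})"
```

Returning an empty dict for unexpected bodies keeps later `.get("code")` calls safe when a proxy or gateway returns something other than the envelope.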

Bucket: invalid_request_error (400 / 422)

schema_validation_failed

Status: 422 The request body didn’t match the endpoint’s schema. error.param points at the offending field path (e.g. messages[0].content, amount_cents). What to do: read error.param and error.message, fix the field, and retry.

tool_schema_invalid

Status: 400 A function tool you passed has an invalid parameters block. We pre-validate that parameters is a JSON Schema object (type: "object" with a properties dict) before forwarding, because the underlying backends would otherwise reject the request with a 400 and unhelpful Pydantic errors. What to do: confirm tools[i].function.parameters matches the OpenAI function-calling spec.
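A client-side check along the same lines can catch this before the request is sent. The sketch below mirrors only what the description above states (an object with type "object" and a properties dict); the server-side validation may differ, and this version treats a missing parameters block as a problem, which may be stricter than the server. `parameters_problem` is a hypothetical helper.

```python
from typing import Optional


def parameters_problem(tool: dict) -> Optional[str]:
    """Return a description of the first problem found, or None if none is found."""
    function = tool.get("function")
    if not isinstance(function, dict):
        return "function must be an object"
    params = function.get("parameters")
    if not isinstance(params, dict):
        return "parameters must be a JSON Schema object"
    if params.get("type") != "object":
        return 'parameters.type must be "object"'
    if not isinstance(params.get("properties"), dict):
        return "parameters.properties must be an object"
    return None
```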

tool_choice_unknown_tool

Status: 400 tool_choice.function.name doesn’t match any tool you declared in tools. What to do: ensure the tool name in tool_choice is one of the tools[].function.name values in the same request.

orphan_tool_message

Status: 400 A role=tool message references a tool_call_id that no preceding assistant message issued. Every tool message must follow an assistant message whose tool_calls[] contains a matching id. What to do: check that your conversation history is intact — every tool result must be preceded by the assistant turn that requested it.

missing_tool_call_id

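Status: 400 A role=tool message is missing its tool_call_id field. The field is required so we can pair the result back to the assistant’s request. What to do: set tool_call_id on every role=tool message to the id of the assistant tool call it answers.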
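Both tool-message errors can be caught locally by walking the conversation in order. The following is a sketch under the assumption that messages are plain dicts in the OpenAI chat format; `find_tool_message_problems` is a hypothetical helper, and the server's own check may be stricter.

```python
def find_tool_message_problems(messages: list) -> list:
    """Report tool messages that lack a tool_call_id or reference an unknown one."""
    issued = set()  # tool call ids issued by assistant messages seen so far
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role == "assistant":
            for call in msg.get("tool_calls") or []:
                if call.get("id"):
                    issued.add(call["id"])
        elif role == "tool":
            call_id = msg.get("tool_call_id")
            if not call_id:
                problems.append(f"messages[{i}]: missing_tool_call_id")
            elif call_id not in issued:
                problems.append(f"messages[{i}]: orphan_tool_message")
    return problems
```

Because the set of issued ids only grows as the loop advances, a tool message that appears before its assistant turn is still reported as an orphan.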

unsupported_parameter

Status: 400 A request parameter is not supported on the endpoint or for the selected model (e.g. previous_response_id, logit_bias on some backends). What to do: remove the parameter, or check the Models page for per-model support.

unsupported_tool_type

Status: 400 Only tools[i].type == "function" is supported on the OpenAI-compatible endpoint. (Anthropic-side tool types are translated upstream.)

duplicate_tool_name

Status: 400 Two entries in tools declare the same function.name. Tool names must be unique within a request.
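The tool-name rules (unique names here, and the tool_choice match under tool_choice_unknown_tool) lend themselves to a quick pre-flight check. This sketch assumes the OpenAI-style request shape; both function names are hypothetical.

```python
from collections import Counter


def duplicate_tool_names(tools: list) -> list:
    """Return the function names that appear more than once in tools."""
    names = [(t.get("function") or {}).get("name") for t in tools]
    counts = Counter(name for name in names if name)
    return sorted(name for name, n in counts.items() if n > 1)


def tool_choice_is_declared(tools: list, tool_choice) -> bool:
    """Check that a named tool_choice refers to a declared tool."""
    if not isinstance(tool_choice, dict):
        return True  # string forms such as "auto" or "none" name no tool
    wanted = (tool_choice.get("function") or {}).get("name")
    declared = {(t.get("function") or {}).get("name") for t in tools}
    return wanted in declared
```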

context_length_exceeded

Status: 400 The request would exceed the selected model’s context window. The error body includes structured fields to help you switch models programmatically:
{
  "error": {
    "type": "invalid_request_error",
    "code": "context_length_exceeded",
    "message": "The request exceeded GLM-5.1's context window (max_model_len=202752). Reduce the prompt length, compact your conversation, or lower max_tokens. Try a model with a larger context window: Qwen3.5-397B-A17B (256K context).",
    "model": "GLM-5.1",
    "context_length_limit": 202752,
    "suggested_models": [
      {"id": "Qwen3.5-397B-A17B", "context_length": 262144}
    ],
    "request_id": "req_..."
  }
}
What to do: lower max_tokens, trim the prompt, or retry against one of suggested_models.
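One way to act on suggested_models is to pick the smallest suggested model whose window still fits the request. The sketch below assumes you already have a token estimate for the request (`needed_tokens`, a value you supply); `pick_fallback_model` is a hypothetical helper.

```python
from typing import Optional


def pick_fallback_model(err: dict, needed_tokens: int) -> Optional[str]:
    """Return the id of the smallest suggested model that fits, or None."""
    candidates = [
        m for m in err.get("suggested_models", [])
        if m.get("context_length", 0) >= needed_tokens
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m["context_length"])["id"]
```

If nothing fits, the function returns None, in which case trimming the prompt or lowering max_tokens remains the only option.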

model_not_found

Status: 404 The model value in your request doesn’t match any model your key can access. The error message includes the list of available models for your key tier.

Bucket: authentication_error (401)

http_401

Status: 401 Missing or invalid API key. Confirm the Authorization: Bearer <key> header is present and the key hasn’t been rotated. Both Pass keys and serverless keys start with wfr_.

Bucket: permission_error (403)

http_403

Status: 403 The key is valid but doesn’t have access to this endpoint or resource (e.g. a Pass key trying to hit a dedicated endpoint, or a key from a different account).

Bucket: rate_limit_error (429)

429 responses include four standard headers that SDKs read for backoff:
  • Retry-After: seconds until the next retry should be attempted
  • RateLimit-Limit: the cap for this window
  • RateLimit-Remaining: 0 when at cap
  • RateLimit-Reset: seconds until the limit resets
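If your SDK does not handle these headers for you, a small helper can turn them into a wait time. This is a sketch: it prefers Retry-After, falls back to RateLimit-Reset, and uses a default when neither parses as a number of seconds. The one-second default and the name `retry_delay_seconds` are assumptions.

```python
from typing import Mapping


def retry_delay_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Return how long to wait before retrying, based on 429 response headers."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    for name in ("retry-after", "ratelimit-reset"):
        try:
            value = float(lowered.get(name))
        except (TypeError, ValueError):
            continue  # header missing or not a plain number of seconds
        if value >= 0:
            return value
    return default
```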

concurrency_limit_exceeded

Status: 429 Too many in-flight requests on the account. Wait briefly (1–2s) and retry with exponential backoff. Retry-After: 1 is set as a hint.
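Exponential backoff is usually combined with jitter so that many clients do not retry in lockstep. The sketch below generates a schedule starting from the one-second hint; the 30-second cap, the jitter range, and the name `backoff_delays` are assumptions, not Wafer recommendations.

```python
import random
from typing import List, Optional


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return one jittered delay per attempt, doubling the ceiling each time."""
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        # Pick a delay in the upper half of the window to keep some spacing.
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays
```

Passing a seeded `random.Random` makes the schedule reproducible in tests.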

request_quota_exceeded

Status: 429 The account hit its included-request limit for the current window. The body includes request_limit, window_end, and (when known) plan_tier:
{
  "error": {
    "type": "rate_limit_error",
    "code": "request_quota_exceeded",
    "message": "You have gone over your plan request limit ...",
    "request_limit": 500,
    "window_end": "2026-05-21T20:00:00+00:00",
    "plan_tier": "pro",
    "request_id": "req_..."
  }
}
plan_tier is omitted from the body when the account’s plan tier isn’t known at the edge — branch on key presence, not on null. What to do: wait for window_end, upgrade your plan, or enable overage in your dashboard.
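A sketch of reading these fields: compute how long remains until window_end, and test for plan_tier by key presence as described above. Both helper names are hypothetical, and the optional `now` argument exists only to make the calculation testable.

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_until_window_end(err: dict, now: Optional[datetime] = None) -> float:
    """Return the seconds left in the quota window, never negative."""
    if now is None:
        now = datetime.now(timezone.utc)
    window_end = datetime.fromisoformat(err["window_end"])
    return max(0.0, (window_end - now).total_seconds())


def plan_label(err: dict) -> str:
    """Branch on key presence, since plan_tier is omitted rather than null."""
    return err["plan_tier"] if "plan_tier" in err else "unknown"
```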

rate_limit_exceeded

Status: 429 A general rate-limit hit (e.g. on a control-plane endpoint). Check Retry-After and back off.

Bucket: insufficient_credits (402)

insufficient_credits

Status: 402 Your Wafer Serverless prepaid balance is insufficient for the request. The body includes an estimate of available vs. required credits:
{
  "error": {
    "type": "insufficient_credits",
    "code": "insufficient_credits",
    "message": "Your Wafer Serverless balance is approximately $0.12 ...",
    "credits_available_cents": 12,
    "credits_required_cents_estimate": 50,
    "topup_url": "https://app.wafer.ai/usage",
    "request_id": "req_..."
  }
}
What to do: add credits at topup_url, or enable auto-top-off in your dashboard.
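The two cent fields make it straightforward to show a user how much is missing. This is a small sketch with hypothetical helper names; since the required amount is an estimate, treat the result as approximate.

```python
def credit_shortfall_cents(err: dict) -> int:
    """Return the estimated shortfall in cents, or 0 if the balance covers it."""
    required = err.get("credits_required_cents_estimate", 0)
    available = err.get("credits_available_cents", 0)
    return max(0, required - available)


def shortfall_message(err: dict) -> str:
    """Format the shortfall in dollars together with the top-up link."""
    dollars = credit_shortfall_cents(err) / 100
    return f"Add about ${dollars:.2f} at {err.get('topup_url')}"
```

With the example body above, the shortfall is 50 - 12 = 38 cents.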

Bucket: routing_error / server_error (502 / 503 / 504)

no_healthy_backends

Status: 503 All backends for the requested model are currently unavailable. Oncall is notified automatically when this fires. The error body includes the affected model. What to do: retry with exponential backoff. If the issue persists for more than a minute, check status.wafer.ai or contact support with your request_id.

backend_timeout

Status: 504 The selected backend accepted the request but didn’t produce a response in time. The error body includes the model. What to do: retry. Backends auto-recover; a single 504 is usually transient.

backend_connect_error

Status: 502 We couldn’t open a connection to the selected backend. What to do: retry — our router will pick a different backend on the next attempt.

backend_http_error

Status: 502 The backend returned an HTTP error we couldn’t interpret. Includes model. What to do: retry. If it persists, send us your request_id.

Bucket: internal_error (500)

internal_error

Status: 500 Something on our side went wrong and the cause didn’t fit any specific bucket. The request_id is the load-bearing piece — send it with your bug report and we can find the exact failure in our logs. What to do: retry once. If it reproduces, file a bug with the request_id and (if relevant) the request body.
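The retry guidance across the buckets on this page can be collected into one policy function. The sketch below is one possible reading of that guidance: the set membership follows the "What to do" notes, internal_error is retried once, and the limit of three retries for the other codes is an assumption. `should_retry` is a hypothetical helper.

```python
# Codes whose "What to do" guidance on this page is to retry or back off.
RETRYABLE_CODES = {
    "concurrency_limit_exceeded",
    "rate_limit_exceeded",
    "no_healthy_backends",
    "backend_timeout",
    "backend_connect_error",
    "backend_http_error",
    "internal_error",
}


def should_retry(code: str, retries_so_far: int, max_retries: int = 3) -> bool:
    """Decide whether to retry, given the error code and retries already made."""
    if code == "internal_error":
        return retries_so_far < 1  # retry once, then file a bug
    return code in RETRYABLE_CODES and retries_so_far < max_retries
```

request_quota_exceeded is left out on purpose: retrying before window_end will not succeed, so it is better handled by waiting or changing plans.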

SDK quickstart

Python (openai)

from openai import OpenAI, BadRequestError, RateLimitError

client = OpenAI(api_key="wfr_...", base_url="https://pass.wafer.ai/v1")

try:
    resp = client.chat.completions.create(
        model="GLM-5.1",
        messages=[{"role": "user", "content": "..."}],
    )
except BadRequestError as e:
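    # e.body holds the parsed error payload. Depending on the SDK version it may
    # be the full envelope or the inner "error" object, so handle both shapes.
    body = e.body if isinstance(e.body, dict) else {}
    err = body.get("error", body)
    code = err.get("code")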
    if code == "context_length_exceeded":
        # Try a suggested model
        for alt in err.get("suggested_models", []):
            print("retry with", alt["id"])
    elif code == "tool_schema_invalid":
        print("bad tool param:", err.get("param"))
    print("request_id:", err.get("request_id"))
except RateLimitError as e:
    # Honor Retry-After header from e.response.headers
    print("retry after:", e.response.headers.get("Retry-After"))

Python (anthropic)

from anthropic import Anthropic, APIStatusError

client = Anthropic(api_key="wfr_...", base_url="https://pass.wafer.ai")

try:
    resp = client.messages.create(
        model="GLM-5.1",
        max_tokens=1024,
        messages=[{"role": "user", "content": "..."}],
    )
except APIStatusError as e:
    err = (e.body or {}).get("error", {})
    print("code:", err.get("code"), "request_id:", err.get("request_id"))
When filing a support ticket, always include the request_id (from the x-request-id header or the error.request_id field). With that, we can look up the exact request in our logs.