Use the Serverless API directly when you are building against Wafer from your own application, scripts, or low-level tooling. For Claude Code, Codex, Cline, Roo Code, and other agent harnesses, use Agent Setup instead.

Base URL

| Surface | URL |
| --- | --- |
| OpenAI-compatible API | https://pass.wafer.ai/v1 |
| Anthropic-compatible Messages API | https://pass.wafer.ai/v1/messages |

Send your API key on every request:
Authorization: Bearer <YOUR_WAFER_API_KEY>
To require Zero Data Retention for a single request, add:
Wafer-ZDR: required
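
As a minimal sketch, the two headers above could be assembled in Python as follows. The helper name `wafer_headers` and the `WAFER_API_KEY` environment variable are assumptions made for illustration; neither is part of the Wafer API.

```python
import os


def wafer_headers(api_key: str, require_zdr: bool = False) -> dict[str, str]:
    """Return the HTTP headers for one Wafer request (hypothetical helper)."""
    if not api_key:
        raise ValueError("a Wafer API key is required")
    headers = {"Authorization": f"Bearer {api_key}"}
    if require_zdr:
        # Opt this single request into Zero Data Retention routing.
        headers["Wafer-ZDR"] = "required"
    return headers


# WAFER_API_KEY is an assumed variable name; substitute your own secret store.
api_key = os.environ.get("WAFER_API_KEY", "<YOUR_WAFER_API_KEY>")
headers = wafer_headers(api_key, require_zdr=True)
```

Keeping the ZDR flag as a per-call argument mirrors the per-request nature of the header: it is added only to the requests that need it.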

List Models

curl -sS "https://pass.wafer.ai/v1/models" \
  -H "Authorization: Bearer <YOUR_WAFER_API_KEY>"
The model list is the source of truth for currently available Serverless model IDs. Each model card includes capabilities and pricing metadata.
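
The same call can be sketched with Python's standard library. The response shape is an assumption here: the code expects an OpenAI-style list object with a top-level `data` array whose entries carry an `id`. Check the actual payload before relying on it. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pass.wafer.ai/v1"


def models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET /v1/models request."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload: dict) -> list[str]:
    # Assumption: OpenAI-style shape {"data": [{"id": "..."}, ...]}.
    return [model["id"] for model in payload.get("data", [])]


def list_model_ids(api_key: str) -> list[str]:
    """Send the request and return the model IDs found in the response."""
    with urllib.request.urlopen(models_request(api_key), timeout=30) as resp:
        return model_ids(json.load(resp))
```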

Chat Completions

Use POST /v1/chat/completions for ordinary text prompts and OpenAI-compatible clients:
curl -sS "https://pass.wafer.ai/v1/chat/completions" \
  -H "Authorization: Bearer <YOUR_WAFER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-5.1",
    "messages": [
      {"role": "user", "content": "Reply with the single word: ready."}
    ],
    "max_tokens": 16,
    "temperature": 0
  }'
Add Wafer-ZDR: required when the request must only route to ZDR-capable infrastructure:
curl -sS "https://pass.wafer.ai/v1/chat/completions" \
  -H "Authorization: Bearer <YOUR_WAFER_API_KEY>" \
  -H "Wafer-ZDR: required" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.5-397B-A17B",
    "messages": [{"role": "user", "content": "Summarize what Wafer does."}],
    "max_tokens": 128
  }'
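
For application code, the two curl calls above might translate to something like the following sketch, which uses only the Python standard library. `chat_request` and `send` are hypothetical helper names, and the timeout value is arbitrary.

```python
import json
import urllib.request

BASE_URL = "https://pass.wafer.ai/v1"


def chat_request(api_key, model, messages, require_zdr=False, **params):
    """Build (but do not send) a POST /v1/chat/completions request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if require_zdr:
        headers["Wafer-ZDR"] = "required"
    # Extra keyword arguments (max_tokens, temperature, ...) go into the body.
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)


req = chat_request(
    "<YOUR_WAFER_API_KEY>",
    "GLM-5.1",
    [{"role": "user", "content": "Reply with the single word: ready."}],
    max_tokens=16,
    temperature=0,
)
```

Calling `send(req)` performs the request. Because the endpoint is described as OpenAI-compatible, the reply text is presumably under `choices[0]["message"]["content"]`, but that path is an assumption rather than something this page states.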

Streaming

Set stream to true and pass -N (curl's --no-buffer option) so that server-sent events are printed as they arrive:
curl -N -sS "https://pass.wafer.ai/v1/chat/completions" \
  -H "Authorization: Bearer <YOUR_WAFER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-5.1",
    "messages": [{"role": "user", "content": "Write a one-sentence haiku."}],
    "max_tokens": 64,
    "temperature": 0.7,
    "stream": true
  }'
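
A client has to split the event stream back into chunks. The sketch below assumes the OpenAI streaming convention: each event is a single `data: <json>` line, the stream ends with `data: [DONE]`, and generated text sits in `choices[0].delta.content`. None of those details are confirmed by this page, and the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one decoded JSON chunk per `data:` line until [DONE]."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data lines are skipped
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(data)


def delta_text(chunk: dict) -> str:
    """Extract the text fragment from a chunk, or '' if there is none."""
    choices = chunk.get("choices") or []
    if not choices:
        return ""
    return choices[0].get("delta", {}).get("content") or ""


# Hand-written sample lines standing in for a real response body.
sample = [
    b'data: {"choices": [{"delta": {"content": "re"}}]}\n',
    b"\n",
    b'data: {"choices": [{"delta": {"content": "ady"}}]}\n',
    b"\n",
    b"data: [DONE]\n",
]
text = "".join(delta_text(chunk) for chunk in iter_sse_chunks(sample))
```

The response object returned by `urllib.request.urlopen` can be iterated line by line, so it can be passed to `iter_sse_chunks` in place of `sample`.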

Chat Request Body

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| model | string | Yes | Any Serverless model ID from GET /v1/models, such as GLM-5.1 or Qwen3.5-397B-A17B. |
| messages | array | Yes | OpenAI-compatible chat messages with role and content. |
| max_tokens | integer | No | Maximum generated tokens. Must be positive when provided. |
| temperature | number | No | Sampling temperature. Use 0 for deterministic decoding. |
| top_p | number | No | Nucleus sampling cutoff. |
| top_k | integer | No | Limits sampling to the top K candidate tokens on supported models. |
| min_p | number | No | Minimum probability threshold on supported models. |
| frequency_penalty | number | No | Penalizes repeated tokens by frequency. |
| presence_penalty | number | No | Penalizes tokens that have already appeared. |
| repetition_penalty | number | No | SGLang repetition penalty on supported models. |
| stop | string or array | No | Stop sequence or sequences. |
| stream | boolean | No | When true, returns streaming chat completion chunks. |
| tools | array | No | OpenAI-compatible tool definitions on models that support tool calling. |
| tool_choice | string or object | No | Controls tool selection for compatible models. |
| response_format | object | No | Use JSON mode or structured outputs on compatible models. |
| logprobs | boolean | No | Request token log probabilities on compatible models. |
| top_logprobs | integer | No | Number of log probabilities to include when logprobs is enabled. |

Parameters that are unsupported, or that are model-specific and sent to a model that does not accept them, return a request error instead of being silently ignored.
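
Since invalid requests are rejected rather than quietly corrected, it can help to catch obvious mistakes before sending. The sketch below checks only what this page documents: the two required fields and the positive `max_tokens` rule. The `top_logprobs` check is a local convention, not a documented server rule, and `chat_body` is a hypothetical helper.

```python
import json

REQUIRED_FIELDS = ("model", "messages")


def chat_body(**fields) -> bytes:
    """Validate chat fields locally and return the encoded JSON body."""
    # Drop unset options so they are not sent as explicit nulls.
    body = {key: value for key, value in fields.items() if value is not None}
    missing = [key for key in REQUIRED_FIELDS if key not in body]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    if "max_tokens" in body and body["max_tokens"] <= 0:
        raise ValueError("max_tokens must be positive when provided")
    # Local convention (assumption): top_logprobs only makes sense with logprobs.
    if "top_logprobs" in body and not body.get("logprobs"):
        raise ValueError("top_logprobs requires logprobs to be true")
    return json.dumps(body).encode("utf-8")


payload = chat_body(
    model="GLM-5.1",
    messages=[{"role": "user", "content": "Reply with the single word: ready."}],
    max_tokens=16,
    top_p=None,  # omitted from the body because it is unset
)
```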

Text Completions

Use POST /v1/completions only when you need token-ID prompts or constrained decoding on a supported route:
curl -sS "https://pass.wafer.ai/v1/completions" \
  -H "Authorization: Bearer <YOUR_WAFER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-5.1",
    "prompt": [9703],
    "max_tokens": 2,
    "temperature": 0,
    "ebnf": "root ::= \"A\" | \"B\""
  }'
For the full /v1/completions request shape, streaming example, parameter table, and response shape, see Tokenized Completions and Constrained Decoding.
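
The curl example above escapes the quotes in the grammar by hand. When the body is built in code, a JSON encoder does that for you. The sketch below is a rough Python equivalent of that request. `completion_request` is a hypothetical name, and the request is only constructed, not sent.

```python
import json
import urllib.request


def completion_request(api_key, model, token_ids, ebnf=None, **params):
    """Build (but do not send) a POST /v1/completions request."""
    # The prompt is a list of token IDs rather than a text string.
    body = {"model": model, "prompt": list(token_ids), **params}
    if ebnf is not None:
        body["ebnf"] = ebnf  # json.dumps escapes the embedded quotes
    return urllib.request.Request(
        "https://pass.wafer.ai/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = completion_request(
    "<YOUR_WAFER_API_KEY>",
    "GLM-5.1",
    [9703],
    ebnf='root ::= "A" | "B"',
    max_tokens=2,
    temperature=0,
)
```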

Anthropic Messages

Wafer also exposes an Anthropic-compatible Messages endpoint at https://pass.wafer.ai/v1/messages. Most users reach it through Claude Code or Conductor; see Agent Setup for the required environment variables.