Base URL
| Surface | URL |
|---|---|
| OpenAI-compatible API | https://pass.wafer.ai/v1 |
| Anthropic-compatible Messages API | https://pass.wafer.ai/v1/messages |
List Models
Chat Completions
UsePOST /v1/chat/completions for ordinary text prompts and OpenAI-compatible clients:
Wafer-ZDR: required when the request must only route to ZDR-capable infrastructure:
Streaming
Setstream to true and add -N to receive server-sent events as they arrive:
Chat Request Body
| Field | Type | Required | Notes |
|---|---|---|---|
model | string | Yes | Any Serverless model ID from GET /v1/models, such as GLM-5.1 or Qwen3.5-397B-A17B. |
messages | array | Yes | OpenAI-compatible chat messages with role and content. |
max_tokens | integer | No | Maximum generated tokens. Must be positive when provided. |
temperature | number | No | Sampling temperature. Use 0 for deterministic decoding. |
top_p | number | No | Nucleus sampling cutoff. |
top_k | integer | No | Limits sampling to the top K candidate tokens on supported models. |
min_p | number | No | Minimum probability threshold on supported models. |
frequency_penalty | number | No | Penalizes repeated tokens by frequency. |
presence_penalty | number | No | Penalizes tokens that have already appeared. |
repetition_penalty | number | No | SGLang repetition penalty on supported models. |
stop | string or array | No | Stop sequence or sequences. |
stream | boolean | No | When true, returns streaming chat completion chunks. |
tools | array | No | OpenAI-compatible tool definitions on models that support tool calling. |
tool_choice | string or object | No | Controls tool selection for compatible models. |
response_format | object | No | Use JSON mode or structured outputs on compatible models. |
logprobs | boolean | No | Request token log probabilities on compatible models. |
top_logprobs | integer | No | Number of log probabilities to include when logprobs is enabled. |
Text Completions
UsePOST /v1/completions only when you need token-ID prompts or constrained decoding on a supported route:
/v1/completions request shape, streaming example, parameter table, and response shape, see Tokenized Completions and Constrained Decoding.
Anthropic Messages
Wafer also exposes an Anthropic-compatible Messages endpoint athttps://pass.wafer.ai/v1/messages. Most users reach it through Claude Code or Conductor; see Agent Setup for the required environment variables.