Documentation Index
Fetch the complete documentation index at: https://docs.wafer.ai/llms.txt
Use this file to discover all available pages before exploring further.
Qwen3.5-397B-A17B, GLM-5.1, Qwen3.6-35B-A3B, Kimi-K2.6, and more — served at multiples of the speed of generic inference providers.
Wafer Serverless is built for Claude Code, Codex, Conductor, OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and other agent harnesses. Wafer exposes both an OpenAI-compatible endpoint and an Anthropic-compatible Messages endpoint, so tools like Claude Code work out of the box.
Get a Wafer Serverless API key for fast open-source models through a standard API endpoint.
Get started: https://www.wafer.ai/serverless/checkout
Connection Details
Use your Wafer Serverless API key with these values:| OpenAI-compatible endpoint | https://pass.wafer.ai/v1 |
| Anthropic-compatible endpoint | https://pass.wafer.ai/v1/messages |
| Send your API key as | Authorization: Bearer <key> (wfr_… Serverless keys work here) |
| Request-scoped ZDR | Add Wafer-ZDR: required on direct API calls |
model strings to pass on the OpenAI-compatible endpoint.
ANTHROPIC_BASE_URL=https://pass.wafer.ai and ANTHROPIC_API_KEY to your Wafer key — Claude Code will hit /v1/messages automatically. To make sure Claude Code talks to a Wafer model, set ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, ANTHROPIC_DEFAULT_HAIKU_MODEL, and CLAUDE_CODE_SUBAGENT_MODEL to one of the Wafer model IDs in Models (e.g. GLM-5.1). All other harnesses (OpenClaw, Cline, Roo Code, etc.) use the OpenAI-compatible endpoint at https://pass.wafer.ai/v1.What’s Included
With a Wafer Serverless API key you get:- Pay-as-you-go access to Wafer-hosted models, billed per token
- Access through a standard OpenAI-compatible API and an Anthropic-compatible Messages API using your Wafer API key
- Works with Claude Code, Codex, Conductor, OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and other agent harnesses
- New fast models as we release them
Models
model string | Family | Max context tokens | ZDR support | Notes |
|---|---|---|---|---|
Qwen3.5-397B-A17B | Qwen3.5, 397B MoE | 262144 | Yes | Multiples faster than base SGLang on Wafer’s stack |
GLM-5.1 | Z.AI flagship | 202752 | Yes | |
Qwen3.6-35B-A3B | Qwen3.6, 35B MoE | 32768 | Yes | Serverless-only |
Kimi-K2.6 | Moonshot Kimi K2.6 | 262144 | Yes | Serverless-only |
400. If your harness has a Context Window Size setting (Cline, Roo Code, Droid, etc.), set it to the exact integer above. Leave ~2–4k of headroom for the model’s response when filling context.
Pass any model string above to any OpenAI-compatible harness configured against https://pass.wafer.ai/v1. Model names are case-insensitive — GLM-5.1 and glm-5.1 both work. If you pass a model name that doesn’t match any available model, the API returns a 404 with the list of available models.
For Claude Code and other Anthropic-compatible harnesses, set ANTHROPIC_DEFAULT_*_MODEL to one of the model IDs above. See Set Up Claude Code.
API Capabilities
Wafer Serverless supports request-scoped privacy enforcement and advanced completion controls:Getting Started
Pick a plan
Get your API key
Set Up Claude Code
Wafer exposes an Anthropic-compatible Messages endpoint athttps://pass.wafer.ai/v1/messages, so Claude Code can connect directly — no proxy needed.
For Claude Code, set ANTHROPIC_BASE_URL to https://pass.wafer.ai, not https://pass.wafer.ai/v1.
Configure Wafer as the endpoint
~/.zshrc, ~/.bashrc, etc.):~/.claude/settings.json for a persistent, per-user config:YOUR_WAFER_API_KEY with your Wafer Serverless API key.Pin a Wafer model
claude-opus-…, claude-sonnet-…) by default — those don’t match a Wafer model. Pin Claude Code to a Wafer model with these env vars:For Qwen3.5-397B-A17B:GLM-5.1:~/.claude/settings.json:Set Up Conductor
Conductor runs a team of parallel Claude Code agents in isolated Git worktrees on macOS. Because Conductor uses Claude Code under the hood, it picks up Wafer the same way Claude Code does — pin the model to any Wafer model ID from Models.Install Conductor
Configure Wafer as the endpoint
~/.zshrc), set:YOUR_WAFER_API_KEY with your Wafer Serverless API key.Pin a Wafer model
Qwen3.5-397B-A17B:GLM-5.1:Set Up Codex
Codex (the OpenAI Codex CLI) only speaks the OpenAI Responses API (/v1/responses). The legacy wire_api = "chat" setting was deprecated in December 2025 and removed in February 2026, so Codex can no longer talk to a Chat Completions endpoint directly. Wafer Serverless is OpenAI Chat Completions–compatible (/v1/chat/completions), so to use it from Codex you run a tiny local LiteLLM proxy that translates Responses API requests into Chat Completions on the way to Wafer. This is the path OpenAI’s own Codex deprecation notice points users at.
/v1/responses requests to LiteLLM on localhost:4000, LiteLLM rewrites them as /v1/chat/completions against https://pass.wafer.ai/v1, and the response stream is translated back into Responses-API SSE events. You only set this up once.Install Codex
wire_api = "responses" is the only supported value as of 0.92.0+):Create a LiteLLM proxy config
litellm_config.yaml in a directory of your choice (e.g. ~/.codex/litellm_config.yaml):openai/ prefix tells LiteLLM to call Wafer over the OpenAI Chat Completions wire format. drop_params: true lets LiteLLM silently drop Responses-only fields that don’t have a Chat Completions equivalent.Start the LiteLLM proxy
- Docker (recommended)
- pip
127.0.0.1: prefix on -p keeps the proxy reachable only from your machine. The container itself listens on 0.0.0.0 inside the container so Docker’s port forward works.200 response with an output array means the bridge is healthy.Point Codex at the local proxy
~/.codex/config.toml:env_key tells Codex which env var to read for the bearer token it sends to the local proxy. LiteLLM accepts that token (no proxy-side auth is configured here) and uses its own upstream api_key: os.environ/WAFER_API_KEY from litellm_config.yaml to call Wafer. Both ultimately read the same WAFER_API_KEY env var, so the call to Wafer is authenticated with your Wafer Serverless API key.model = "..." to any model_name from your litellm_config.yaml (e.g. GLM-5.1 or Qwen3.5-397B-A17B).wire_api = "chat" removal means every third-party OpenAI-compatible provider that doesn’t natively expose /v1/responses needs a translation layer right now. We track Wafer adding a native Responses endpoint on the Wafer roadmap; when it ships, you’ll be able to drop the LiteLLM step and point Codex straight at https://pass.wafer.ai/v1.Set Up OpenClaw
GLM-5.1. Swap in any Models ID (Qwen3.5-397B-A17B or GLM-5.1). Applies to every OpenAI-compatible setup section that follows (OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and the generic section at the bottom).Set Up Hermes Agent
Set Up Cline
Install Cline
Configure Wafer as a provider
- Open VS Code and click the Cline icon in the sidebar
- Click the settings gear icon in the Cline panel
- In the API Provider dropdown, select OpenAI Compatible
- Fill in these fields:
- Base URL:
https://pass.wafer.ai/v1 - API Key: your Wafer API key
- Model ID:
Qwen3.5-397B-A17B
Set model info (recommended)
- Context Window Size:
262144 - Max Output Tokens:
32768 - Supports Images: unchecked
Set Up Roo Code
Install Roo Code
Configure Wafer as a provider
- Open VS Code and click the Roo Code icon in the sidebar
- Click the settings gear icon in the Roo Code panel
- In the API Provider dropdown, select OpenAI Compatible
- Fill in these fields:
- Base URL:
https://pass.wafer.ai/v1 - API Key: your Wafer API key
- Model ID:
Qwen3.5-397B-A17B
Set model info (recommended)
- Context Window Size:
262144 - Max Output Tokens:
32768
Set Up Kilo Code
Install Kilo Code
Configure Wafer as a provider
- Open Kilo Code and click the settings gear icon
- Go to the Providers tab
- Click Custom provider at the bottom
- Fill in the dialog:
- Provider ID:
wafer - Display Name:
Wafer - Base URL:
https://pass.wafer.ai/v1 - API Key: your Wafer API key
- Model:
Qwen3.5-397B-A17B
- Click Save
Set Up OpenHands
Install OpenHands
Configure Wafer as the LLM (UI)
- Open the OpenHands UI (usually at
http://localhost:3000) - Click the settings gear icon
- Click Advanced to expand advanced options
- Set these fields:
- Custom Model:
openai/GLM-5.1 - Base URL:
https://pass.wafer.ai/v1 - API Key: your Wafer API key
Alternative: config.toml
config.toml in the project root:Set Up LibreChat
Install LibreChat
Add Wafer as a provider
librechat.yaml in the project root and add a Wafer endpoint:.env file:YOUR_WAFER_API_KEY with your Wafer Serverless API key.default list above exposes all Wafer models in LibreChat’s model picker. Drop any IDs you don’t want to surface.Use Wafer with Other Harnesses
Most agent harnesses only need these settings: OpenAI-compatible harnesses (Cline, Roo Code, Kilo Code, OpenClaw, OpenHands, etc.):- Base URL:
https://pass.wafer.ai/v1 - Model: any ID from Models, e.g.
GLM-5.1orQwen3.5-397B-A17B - Authentication: your Wafer Serverless API key (same token in
Authorization: Bearer …headers or in the client’s API-key field — keys look likewfr_…) - Compatibility mode: OpenAI-compatible / OpenAI API
model string from Models.
Anthropic-compatible harnesses (Claude Code, Conductor, or any tool using the Anthropic Messages API):
- Base URL:
https://pass.wafer.ai(the tool appends/v1/messagesautomatically) - Authentication: your Wafer API key via
ANTHROPIC_API_KEY - Model: pin a Wafer model via
ANTHROPIC_DEFAULT_*_MODEL(the harness’s default Anthropic model strings don’t match a Wafer model)
Wafer. Pass your Wafer Serverless API key wherever the harness expects an Anthropic/API key (ANTHROPIC_API_KEY).
FAQ
What models do I get?
What models do I get?
GET https://pass.wafer.ai/v1/models for live availability and per-model pricing.Can I use Wafer Serverless with any model?
Can I use Wafer Serverless with any model?
https://pass.wafer.ai/v1.How do I get access?
How do I get access?
Do I need a special model ID?
Do I need a special model ID?
model string from Models with https://pass.wafer.ai/v1. Model names are case-insensitive for the IDs above. For Claude Code (Anthropic-compatible), pin a Wafer model via ANTHROPIC_DEFAULT_OPUS_MODEL / ANTHROPIC_DEFAULT_SONNET_MODEL / ANTHROPIC_DEFAULT_HAIKU_MODEL / CLAUDE_CODE_SUBAGENT_MODEL.Why does Codex need a LiteLLM proxy?
Why does Codex need a LiteLLM proxy?
wire_api = "chat" is no longer accepted) and now requires the OpenAI Responses API (/v1/responses). Wafer Serverless exposes Chat Completions (/v1/chat/completions), so the local LiteLLM proxy translates between the two. See Set Up Codex for the config. The proxy is only needed for Codex — every other harness on this page (Claude Code, Conductor, OpenClaw, Cline, Roo Code, Kilo Code, Hermes Agent, OpenHands, LibreChat) talks to Wafer directly.Will more models be added?
Will more models be added?