Documentation Index
Fetch the complete documentation index at: https://docs.wafer.ai/llms.txt
Use this file to discover all available pages before exploring further.
Qwen3.5-397B-A17B and GLM-5.1, served at multiples of the speed of generic inference providers. More models land on the same subscription — no price increase.
Wafer Pass is built for Claude Code, Codex, Conductor, OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and other agent harnesses. Wafer exposes both an OpenAI-compatible endpoint and an Anthropic-compatible Messages endpoint, so tools like Claude Code work out of the box.
Get a Wafer Pass for fast open-source models through a standard API endpoint. Plans start at $10/week.
Get your Wafer Pass: https://www.wafer.ai/pass
Connection Details
Use your Wafer Pass API key with these values:| OpenAI-compatible endpoint | https://pass.wafer.ai/v1 |
| Anthropic-compatible endpoint | https://pass.wafer.ai/v1/messages |
| Send your API key as | Authorization: Bearer <key> (wfr_… Pass keys work here) |
| Request-scoped ZDR | Add Wafer-ZDR: required on direct API calls |
model strings to pass on the OpenAI-compatible endpoint.
ANTHROPIC_BASE_URL=https://pass.wafer.ai and ANTHROPIC_API_KEY to your Wafer key — Claude Code will hit /v1/messages automatically. To make sure Claude Code talks to a Wafer model, set ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, ANTHROPIC_DEFAULT_HAIKU_MODEL, and CLAUDE_CODE_SUBAGENT_MODEL to one of the Wafer model IDs in Models (e.g. GLM-5.1). All other harnesses (OpenClaw, Cline, Roo Code, etc.) use the OpenAI-compatible endpoint at https://pass.wafer.ai/v1.What’s Included
With an active Wafer Pass subscription you get:Qwen3.5-397B-A17BandGLM-5.1requests included in your plan’s 5-hour window limit at zero per-token cost; overage is billed only for models with enabled per-token rates- Access through a standard OpenAI-compatible API and an Anthropic-compatible Messages API using your Wafer API key
- Works with Claude Code, Codex, Conductor, OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and other agent harnesses
- New fast models as we release them — same subscription, no price increase
Models
model string | Family | Max context tokens | ZDR support | Notes |
|---|---|---|---|---|
Qwen3.5-397B-A17B | Qwen3.5, 397B MoE | 262144 | Yes | Multiples faster than base SGLang on Wafer’s stack |
GLM-5.1 | Z.AI flagship | 202752 | Yes |
400. If your harness has a Context Window Size setting (Cline, Roo Code, Droid, etc.), set it to the exact integer above. Leave ~2–4k of headroom for the model’s response when filling context.
Pass any model string above to any OpenAI-compatible harness configured against https://pass.wafer.ai/v1. Model names are case-insensitive — GLM-5.1 and glm-5.1 both work. If you pass a model name that doesn’t match any available model, the API returns a 404 with the list of available models.
For Claude Code and other Anthropic-compatible harnesses, set ANTHROPIC_DEFAULT_*_MODEL to one of the model IDs above. See Set Up Claude Code.
API Capabilities
Wafer Pass supports account-aware privacy enforcement and advanced completion controls:Pricing
Pay weekly, monthly, or save 20% off the weekly rate with yearly billing.Weekly
| Plan | For | Price | Requests / 5hr window | Includes |
|---|---|---|---|---|
| Starter | Solo devs, daily agents | $10/wk | 1,000 | Access to every model Wafer hosts |
| Privacy | Production agents, private workloads | $25/wk | 2,000 | Zero Data Retention |
Monthly
| Plan | For | Price | Requests / 5hr window | Includes |
|---|---|---|---|---|
| Starter | Solo devs, daily agents | $40/mo | 1,000 | Access to every model Wafer hosts |
| Privacy | Production agents, private workloads | $100/mo | 2,000 | Zero Data Retention |
Yearly (20% off the weekly rate)
| Plan | Price | Effective weekly | Requests / 5hr window |
|---|---|---|---|
| Starter | $416/yr | $8/wk | 1,000 |
| Privacy | $1040/yr | $20/wk | 2,000 |
Overage Pricing
Requests beyond your plan’s included 5-hour window limit are billed at per-model API rates for models with overage enabled. All users pay the same overage rate regardless of plan tier.| Model | Input | Output | Cached Input |
|---|---|---|---|
Qwen3.5-397B-A17B | $0.60/M tokens | $3.60/M tokens | $0.06/M tokens |
GLM-5.1 | $1.50/M tokens | $4.50/M tokens | $0.15/M tokens |
Getting Started
Pick a plan
Get your API key
Set Up Claude Code
Wafer exposes an Anthropic-compatible Messages endpoint athttps://pass.wafer.ai/v1/messages, so Claude Code can connect directly — no proxy needed.
For Claude Code, set ANTHROPIC_BASE_URL to https://pass.wafer.ai, not https://pass.wafer.ai/v1.
Configure Wafer as the endpoint
~/.zshrc, ~/.bashrc, etc.):~/.claude/settings.json for a persistent, per-user config:YOUR_WAFER_API_KEY with your Wafer Pass API key.Pin a Wafer model
claude-opus-…, claude-sonnet-…) by default — those don’t match a Wafer model. Pin Claude Code to a Wafer model with these env vars:For Qwen3.5-397B-A17B:GLM-5.1:~/.claude/settings.json:Set Up Conductor
Conductor runs a team of parallel Claude Code agents in isolated Git worktrees on macOS. Because Conductor uses Claude Code under the hood, it picks up Wafer the same way Claude Code does — pin the model to any Wafer Pass ID from Models.Install Conductor
Configure Wafer as the endpoint
~/.zshrc), set:YOUR_WAFER_API_KEY with your Wafer Pass API key.Pin a Wafer model
Qwen3.5-397B-A17B:GLM-5.1:Set Up Codex
Codex (the OpenAI Codex CLI) only speaks the OpenAI Responses API (/v1/responses). The legacy wire_api = "chat" setting was deprecated in December 2025 and removed in February 2026, so Codex can no longer talk to a Chat Completions endpoint directly. Wafer Pass is OpenAI Chat Completions–compatible (/v1/chat/completions), so to use it from Codex you run a tiny local LiteLLM proxy that translates Responses API requests into Chat Completions on the way to Wafer. This is the path OpenAI’s own Codex deprecation notice points users at.
/v1/responses requests to LiteLLM on localhost:4000, LiteLLM rewrites them as /v1/chat/completions against https://pass.wafer.ai/v1, and the response stream is translated back into Responses-API SSE events. You only set this up once.Install Codex
wire_api = "responses" is the only supported value as of 0.92.0+):Create a LiteLLM proxy config
litellm_config.yaml in a directory of your choice (e.g. ~/.codex/litellm_config.yaml):openai/ prefix tells LiteLLM to call Wafer over the OpenAI Chat Completions wire format. drop_params: true lets LiteLLM silently drop Responses-only fields that don’t have a Chat Completions equivalent.Start the LiteLLM proxy
- Docker (recommended)
- pip
127.0.0.1: prefix on -p keeps the proxy reachable only from your machine. The container itself listens on 0.0.0.0 inside the container so Docker’s port forward works.200 response with an output array means the bridge is healthy.Point Codex at the local proxy
~/.codex/config.toml:env_key tells Codex which env var to read for the bearer token it sends to the local proxy. LiteLLM accepts that token (no proxy-side auth is configured here) and uses its own upstream api_key: os.environ/WAFER_API_KEY from litellm_config.yaml to call Wafer. Both ultimately read the same WAFER_API_KEY env var, so the call to Wafer is authenticated with your Wafer Pass key.model = "..." to any model_name from your litellm_config.yaml (e.g. GLM-5.1 or Qwen3.5-397B-A17B).wire_api = "chat" removal means every third-party OpenAI-compatible provider that doesn’t natively expose /v1/responses needs a translation layer right now. We track Wafer adding a native Responses endpoint on the Wafer Pass roadmap; when it ships, you’ll be able to drop the LiteLLM step and point Codex straight at https://pass.wafer.ai/v1.Set Up OpenClaw
GLM-5.1. Swap in any Models ID (Qwen3.5-397B-A17B or GLM-5.1). Applies to every OpenAI-compatible setup section that follows (OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and the generic section at the bottom).Set Up Hermes Agent
Set Up Cline
Install Cline
Configure Wafer as a provider
- Open VS Code and click the Cline icon in the sidebar
- Click the settings gear icon in the Cline panel
- In the API Provider dropdown, select OpenAI Compatible
- Fill in these fields:
- Base URL:
https://pass.wafer.ai/v1 - API Key: your Wafer API key
- Model ID:
Qwen3.5-397B-A17B
Set model info (recommended)
- Context Window Size:
262144 - Max Output Tokens:
32768 - Supports Images: unchecked
Set Up Roo Code
Install Roo Code
Configure Wafer as a provider
- Open VS Code and click the Roo Code icon in the sidebar
- Click the settings gear icon in the Roo Code panel
- In the API Provider dropdown, select OpenAI Compatible
- Fill in these fields:
- Base URL:
https://pass.wafer.ai/v1 - API Key: your Wafer API key
- Model ID:
Qwen3.5-397B-A17B
Set model info (recommended)
- Context Window Size:
262144 - Max Output Tokens:
32768
Set Up Kilo Code
Install Kilo Code
Configure Wafer as a provider
- Open Kilo Code and click the settings gear icon
- Go to the Providers tab
- Click Custom provider at the bottom
- Fill in the dialog:
- Provider ID:
wafer - Display Name:
Wafer - Base URL:
https://pass.wafer.ai/v1 - API Key: your Wafer API key
- Model:
Qwen3.5-397B-A17B
- Click Save
Set Up OpenHands
Install OpenHands
Configure Wafer as the LLM (UI)
- Open the OpenHands UI (usually at
http://localhost:3000) - Click the settings gear icon
- Click Advanced to expand advanced options
- Set these fields:
- Custom Model:
openai/GLM-5.1 - Base URL:
https://pass.wafer.ai/v1 - API Key: your Wafer API key
Alternative: config.toml
config.toml in the project root:Set Up LibreChat
Install LibreChat
Add Wafer as a provider
librechat.yaml in the project root and add a Wafer endpoint:.env file:YOUR_WAFER_API_KEY with your Wafer Pass API key.default list above exposes all Wafer models in LibreChat’s model picker. Drop any IDs you don’t want to surface.Use Wafer with Other Harnesses
Most agent harnesses only need these settings: OpenAI-compatible harnesses (Cline, Roo Code, Kilo Code, OpenClaw, OpenHands, etc.):- Base URL:
https://pass.wafer.ai/v1 - Model: any ID from Models, e.g.
GLM-5.1orQwen3.5-397B-A17B - Authentication: your Wafer Pass key (same token in
Authorization: Bearer …headers or in the client’s API-key field — keys look likewfr_…) - Compatibility mode: OpenAI-compatible / OpenAI API
model string from Models.
Anthropic-compatible harnesses (Claude Code, Conductor, or any tool using the Anthropic Messages API):
- Base URL:
https://pass.wafer.ai(the tool appends/v1/messagesautomatically) - Authentication: your Wafer API key via
ANTHROPIC_API_KEY - Model: pin a Wafer model via
ANTHROPIC_DEFAULT_*_MODEL(the harness’s default Anthropic model strings don’t match a Wafer model)
Wafer. Pass your Wafer Pass key wherever the harness expects an Anthropic/API key (ANTHROPIC_API_KEY).
Terms of Use
Wafer Pass is intended for personal agentic coding use only. By purchasing Wafer Pass you agree to the following:- Allowed: Personal development, experimentation, and coding with agentic harnesses (Claude Code, OpenCode, Cline, Kilo Code, OpenClaw, LangChain Deep Agents, and similar tools).
- Prohibited: Production workloads, team or shared usage (one account per person, max 3 concurrent in-flight requests), reselling or pooling access, and any use that violates the Wafer Terms of Service.
FAQ
What models do I get?
What models do I get?
GLM-5.1 and Qwen3.5-397B-A17B. More models ship on the same subscription as we add them.Can I use Wafer Pass with any model?
Can I use Wafer Pass with any model?
https://pass.wafer.ai/v1.Can I share my subscription?
Can I share my subscription?
How do I get access?
How do I get access?
Do I need a special model ID?
Do I need a special model ID?
model string from Models with https://pass.wafer.ai/v1. Model names are case-insensitive for the IDs above. For Claude Code (Anthropic-compatible), pin a Wafer model via ANTHROPIC_DEFAULT_OPUS_MODEL / ANTHROPIC_DEFAULT_SONNET_MODEL / ANTHROPIC_DEFAULT_HAIKU_MODEL / CLAUDE_CODE_SUBAGENT_MODEL.Why does Codex need a LiteLLM proxy?
Why does Codex need a LiteLLM proxy?
wire_api = "chat" is no longer accepted) and now requires the OpenAI Responses API (/v1/responses). Wafer Pass exposes Chat Completions (/v1/chat/completions), so the local LiteLLM proxy translates between the two. See Set Up Codex for the config. The proxy is only needed for Codex — every other harness on this page (Claude Code, Conductor, OpenClaw, Cline, Roo Code, Kilo Code, Hermes Agent, OpenHands, LibreChat) talks to Wafer directly.Will more models be added?
Will more models be added?