Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.wafer.ai/llms.txt

Use this file to discover all available pages before exploring further.

Features, model availability, rate limits, and pricing may change as we iterate. Questions? Email emilio@wafer.ai.
Set up with: Claude Code · Conductor · Codex · OpenClaw · Hermes Agent · Cline · Roo Code · Kilo Code · OpenHands · LibreChat Wafer builds AI that optimizes AI. We take open models and make them dramatically faster. Wafer Pass gives you Qwen3.5-397B-A17B and GLM-5.1, served at multiples of the speed of generic inference providers. More models land on the same subscription — no price increase. Wafer Pass is built for Claude Code, Codex, Conductor, OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and other agent harnesses. Wafer exposes both an OpenAI-compatible endpoint and an Anthropic-compatible Messages endpoint, so tools like Claude Code work out of the box. Get a Wafer Pass for fast open-source models through a standard API endpoint. Plans start at $10/week.
Get your Wafer Pass: https://www.wafer.ai/pass

Connection Details

Use your Wafer Pass API key with these values:
OpenAI-compatible endpointhttps://pass.wafer.ai/v1
Anthropic-compatible endpointhttps://pass.wafer.ai/v1/messages
Send your API key asAuthorization: Bearer <key> (wfr_… Pass keys work here)
Request-scoped ZDRAdd Wafer-ZDR: required on direct API calls
See Models below for the model strings to pass on the OpenAI-compatible endpoint.
Claude Code uses the Anthropic Messages endpoint. Set ANTHROPIC_BASE_URL=https://pass.wafer.ai and ANTHROPIC_API_KEY to your Wafer key — Claude Code will hit /v1/messages automatically. To make sure Claude Code talks to a Wafer model, set ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, ANTHROPIC_DEFAULT_HAIKU_MODEL, and CLAUDE_CODE_SUBAGENT_MODEL to one of the Wafer model IDs in Models (e.g. GLM-5.1). All other harnesses (OpenClaw, Cline, Roo Code, etc.) use the OpenAI-compatible endpoint at https://pass.wafer.ai/v1.

What’s Included

With an active Wafer Pass subscription you get:
  • Qwen3.5-397B-A17B and GLM-5.1 requests included in your plan’s 5-hour window limit at zero per-token cost; overage is billed only for models with enabled per-token rates
  • Access through a standard OpenAI-compatible API and an Anthropic-compatible Messages API using your Wafer API key
  • Works with Claude Code, Codex, Conductor, OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and other agent harnesses
  • New fast models as we release them — same subscription, no price increase

Models

model stringFamilyMax context tokensZDR supportNotes
Qwen3.5-397B-A17BQwen3.5, 397B MoE262144YesMultiples faster than base SGLang on Wafer’s stack
GLM-5.1Z.AI flagship202752Yes
The Max context tokens value is the hard cap enforced by the backend — requests where prompt tokens exceed this value return a 400. If your harness has a Context Window Size setting (Cline, Roo Code, Droid, etc.), set it to the exact integer above. Leave ~2–4k of headroom for the model’s response when filling context. Pass any model string above to any OpenAI-compatible harness configured against https://pass.wafer.ai/v1. Model names are case-insensitiveGLM-5.1 and glm-5.1 both work. If you pass a model name that doesn’t match any available model, the API returns a 404 with the list of available models. For Claude Code and other Anthropic-compatible harnesses, set ANTHROPIC_DEFAULT_*_MODEL to one of the model IDs above. See Set Up Claude Code.

API Capabilities

Wafer Pass supports account-aware privacy enforcement and advanced completion controls:

Pricing

Pay weekly, monthly, or save 20% off the weekly rate with yearly billing.

Weekly

PlanForPriceRequests / 5hr windowIncludes
StarterSolo devs, daily agents$10/wk1,000Access to every model Wafer hosts
PrivacyProduction agents, private workloads$25/wk2,000Zero Data Retention

Monthly

PlanForPriceRequests / 5hr windowIncludes
StarterSolo devs, daily agents$40/mo1,000Access to every model Wafer hosts
PrivacyProduction agents, private workloads$100/mo2,000Zero Data Retention

Yearly (20% off the weekly rate)

PlanPriceEffective weeklyRequests / 5hr window
Starter$416/yr$8/wk1,000
Privacy$1040/yr$20/wk2,000

Overage Pricing

Requests beyond your plan’s included 5-hour window limit are billed at per-model API rates for models with overage enabled. All users pay the same overage rate regardless of plan tier.
ModelInputOutputCached Input
Qwen3.5-397B-A17B$0.60/M tokens$3.60/M tokens$0.06/M tokens
GLM-5.1$1.50/M tokens$4.50/M tokens$0.15/M tokens
Overage charges are calculated per 5-hour window and added to your next invoice.

Getting Started

1

Pick a plan

Go to wafer.ai/pass and choose your plan and billing interval. Checkout is self-serve and instant.
2

Get your API key

Your API key is shown on the success page right after checkout. We also email a backup copy to the address you used.
3

Start coding

Use the key in Claude Code, Codex, Conductor, OpenClaw, Cline, Roo Code, Kilo Code, Hermes Agent, OpenHands, or any other supported harness.

Set Up Claude Code

Wafer exposes an Anthropic-compatible Messages endpoint at https://pass.wafer.ai/v1/messages, so Claude Code can connect directly — no proxy needed. For Claude Code, set ANTHROPIC_BASE_URL to https://pass.wafer.ai, not https://pass.wafer.ai/v1.
1

Install Claude Code

npm install -g @anthropic-ai/claude-code
2

Configure Wafer as the endpoint

Set these environment variables in your shell profile (~/.zshrc, ~/.bashrc, etc.):
export ANTHROPIC_BASE_URL=https://pass.wafer.ai
export ANTHROPIC_API_KEY=YOUR_WAFER_API_KEY
Or add them to ~/.claude/settings.json for a persistent, per-user config:
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://pass.wafer.ai",
    "ANTHROPIC_API_KEY": "YOUR_WAFER_API_KEY"
  }
}
Replace YOUR_WAFER_API_KEY with your Wafer Pass API key.
Do not share your API key or commit it to version control.
3

Pin a Wafer model

Claude Code sends Anthropic model strings (claude-opus-…, claude-sonnet-…) by default — those don’t match a Wafer model. Pin Claude Code to a Wafer model with these env vars:For Qwen3.5-397B-A17B:
export ANTHROPIC_DEFAULT_OPUS_MODEL="Qwen3.5-397B-A17B"
export ANTHROPIC_DEFAULT_SONNET_MODEL="Qwen3.5-397B-A17B"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="Qwen3.5-397B-A17B"
export CLAUDE_CODE_SUBAGENT_MODEL="Qwen3.5-397B-A17B"
For GLM-5.1:
export ANTHROPIC_DEFAULT_OPUS_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_SONNET_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="GLM-5.1"
export CLAUDE_CODE_SUBAGENT_MODEL="GLM-5.1"
Or in ~/.claude/settings.json:
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://pass.wafer.ai",
    "ANTHROPIC_API_KEY": "YOUR_WAFER_API_KEY",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "GLM-5.1",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "GLM-5.1",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "GLM-5.1",
    "CLAUDE_CODE_SUBAGENT_MODEL": "GLM-5.1"
  }
}
These env vars follow the same pattern as OpenRouter’s Claude Code integration. They override the model Claude Code sends for opus, sonnet, haiku, and subagent calls.
4

Start Claude Code

claude
Claude Code now routes requests through the Wafer endpoint.

Set Up Conductor

Conductor runs a team of parallel Claude Code agents in isolated Git worktrees on macOS. Because Conductor uses Claude Code under the hood, it picks up Wafer the same way Claude Code does — pin the model to any Wafer Pass ID from Models.
1

Install Conductor

Download the macOS app from conductor.build and launch it.
2

Configure Wafer as the endpoint

In Conductor’s Settings → Environment (or your shell profile, e.g. ~/.zshrc), set:
export ANTHROPIC_BASE_URL=https://pass.wafer.ai
export ANTHROPIC_API_KEY=YOUR_WAFER_API_KEY
Replace YOUR_WAFER_API_KEY with your Wafer Pass API key.
Do not share your API key or commit it to version control.
3

Pin a Wafer model

Conductor spawns Claude Code, which sends Anthropic model strings by default — those don’t match a Wafer model. Pin Claude Code (and therefore Conductor) to a Wafer model with these env vars in the same Environment section:For Qwen3.5-397B-A17B:
export ANTHROPIC_DEFAULT_OPUS_MODEL="Qwen3.5-397B-A17B"
export ANTHROPIC_DEFAULT_SONNET_MODEL="Qwen3.5-397B-A17B"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="Qwen3.5-397B-A17B"
export CLAUDE_CODE_SUBAGENT_MODEL="Qwen3.5-397B-A17B"
For GLM-5.1:
export ANTHROPIC_DEFAULT_OPUS_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_SONNET_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="GLM-5.1"
export CLAUDE_CODE_SUBAGENT_MODEL="GLM-5.1"
This follows the pattern OpenRouter documents in its Claude Code integration guide. Because Conductor spawns Claude Code subprocesses, the same env vars override opus, sonnet, haiku, and subagent calls.
4

Start a run

Create a workspace in Conductor, pick a repo, and kick off an agent. Requests now route through Wafer.

Set Up Codex

Codex (the OpenAI Codex CLI) only speaks the OpenAI Responses API (/v1/responses). The legacy wire_api = "chat" setting was deprecated in December 2025 and removed in February 2026, so Codex can no longer talk to a Chat Completions endpoint directly. Wafer Pass is OpenAI Chat Completions–compatible (/v1/chat/completions), so to use it from Codex you run a tiny local LiteLLM proxy that translates Responses API requests into Chat Completions on the way to Wafer. This is the path OpenAI’s own Codex deprecation notice points users at.
Once the proxy is running, Codex sends /v1/responses requests to LiteLLM on localhost:4000, LiteLLM rewrites them as /v1/chat/completions against https://pass.wafer.ai/v1, and the response stream is translated back into Responses-API SSE events. You only set this up once.
1

Install Codex

npm install -g @openai/codex
Verify Codex picks up the Responses API (wire_api = "responses" is the only supported value as of 0.92.0+):
codex --version
2

Create a LiteLLM proxy config

Create litellm_config.yaml in a directory of your choice (e.g. ~/.codex/litellm_config.yaml):
model_list:
  - model_name: Qwen3.5-397B-A17B
    litellm_params:
      model: openai/Qwen3.5-397B-A17B
      api_base: https://pass.wafer.ai/v1
      api_key: os.environ/WAFER_API_KEY
  - model_name: GLM-5.1
    litellm_params:
      model: openai/GLM-5.1
      api_base: https://pass.wafer.ai/v1
      api_key: os.environ/WAFER_API_KEY
litellm_settings:
  drop_params: true
The openai/ prefix tells LiteLLM to call Wafer over the OpenAI Chat Completions wire format. drop_params: true lets LiteLLM silently drop Responses-only fields that don’t have a Chat Completions equivalent.
Do not share your API key or commit it to version control. The config above reads it from the WAFER_API_KEY env var.
3

Start the LiteLLM proxy

The local proxy intentionally has no auth configured (it’s only meant to be reached by Codex on the same machine). Always bind it to 127.0.0.1 as shown above. If you need the proxy reachable over a network, set a general_settings.master_key in litellm_config.yaml and require it on every inbound request — see LiteLLM virtual keys.
Confirm it’s translating correctly:
curl -sS http://127.0.0.1:4000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_WAFER_API_KEY>" \
  -d '{
    "model": "GLM-5.1",
    "input": "Reply with the single word: ready."
  }'
A 200 response with an output array means the bridge is healthy.
4

Point Codex at the local proxy

Add a Wafer model provider to ~/.codex/config.toml:
model = "GLM-5.1"
model_provider = "wafer"

[model_providers.wafer]
name = "Wafer (via LiteLLM)"
base_url = "http://127.0.0.1:4000/v1"
env_key = "WAFER_API_KEY"
wire_api = "responses"
env_key tells Codex which env var to read for the bearer token it sends to the local proxy. LiteLLM accepts that token (no proxy-side auth is configured here) and uses its own upstream api_key: os.environ/WAFER_API_KEY from litellm_config.yaml to call Wafer. Both ultimately read the same WAFER_API_KEY env var, so the call to Wafer is authenticated with your Wafer Pass key.
Switch models by changing the top-level model = "..." to any model_name from your litellm_config.yaml (e.g. GLM-5.1 or Qwen3.5-397B-A17B).
5

Run Codex

export WAFER_API_KEY=YOUR_WAFER_API_KEY
codex
Codex now sends Responses-API traffic to LiteLLM, which forwards it to Wafer. The footer in the TUI should show the Wafer model id (e.g. GLM-5.1).
Why this isn’t just two env vars (yet): Codex’s wire_api = "chat" removal means every third-party OpenAI-compatible provider that doesn’t natively expose /v1/responses needs a translation layer right now. We track Wafer adding a native Responses endpoint on the Wafer Pass roadmap; when it ships, you’ll be able to drop the LiteLLM step and point Codex straight at https://pass.wafer.ai/v1.

Set Up OpenClaw

Model string: the examples below use GLM-5.1. Swap in any Models ID (Qwen3.5-397B-A17B or GLM-5.1). Applies to every OpenAI-compatible setup section that follows (OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and the generic section at the bottom).
1

Install OpenClaw

curl -fsSL https://openclaw.ai/install.sh | bash
2

Run setup

openclaw setup
3

Add Wafer as a provider

Replace YOUR_WAFER_API_KEY with your Wafer Pass API key:
openclaw config set models.providers.wafer "$(cat <<'EOF'
{
  "baseUrl": "https://pass.wafer.ai/v1",
  "api": "openai-completions",
  "auth": "api-key",
  "apiKey": "YOUR_WAFER_API_KEY",
  "models": [
    { "id": "Qwen3.5-397B-A17B", "name": "Qwen 3.5 397B" },
    { "id": "GLM-5.1", "name": "GLM 5.1" }
  ]
}
EOF
)"
openclaw models set wafer/GLM-5.1
Do not share your API key or commit it to version control.
4

Test it

openclaw agent --local --session-id wafer-test --message "Hello"

Set Up Hermes Agent

1

Install Hermes Agent

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc   # or source ~/.zshrc
2

Point Hermes at Wafer

Replace YOUR_WAFER_API_KEY with your Wafer Pass API key:
hermes config set OPENAI_BASE_URL https://pass.wafer.ai/v1
hermes config set OPENAI_API_KEY YOUR_WAFER_API_KEY
hermes config set model Qwen3.5-397B-A17B
3

Start a session

hermes
Hermes now uses Qwen3.5-397B-A17B through the Wafer endpoint by default.

Set Up Cline

1

Install Cline

Install the Cline extension from the VS Code marketplace, or search “Cline” in VS Code Extensions.
2

Configure Wafer as a provider

  1. Open VS Code and click the Cline icon in the sidebar
  2. Click the settings gear icon in the Cline panel
  3. In the API Provider dropdown, select OpenAI Compatible
  4. Fill in these fields:
  • Base URL: https://pass.wafer.ai/v1
  • API Key: your Wafer API key
  • Model ID: Qwen3.5-397B-A17B
Do not include /chat/completions in the Base URL — Cline appends that automatically.
3

Set model info (recommended)

Expand Model Configuration and set:
  • Context Window Size: 262144
  • Max Output Tokens: 32768
  • Supports Images: unchecked
4

Verify the connection

Send a message in the Cline panel. If Cline responds, you’re connected.

Set Up Roo Code

1

Install Roo Code

Install the Roo Code extension from the VS Code marketplace, or search “Roo Code” in VS Code Extensions.
2

Configure Wafer as a provider

  1. Open VS Code and click the Roo Code icon in the sidebar
  2. Click the settings gear icon in the Roo Code panel
  3. In the API Provider dropdown, select OpenAI Compatible
  4. Fill in these fields:
  • Base URL: https://pass.wafer.ai/v1
  • API Key: your Wafer API key
  • Model ID: Qwen3.5-397B-A17B
3

Set model info (recommended)

Optionally configure:
  • Context Window Size: 262144
  • Max Output Tokens: 32768
4

Start coding

Send a message in the Roo Code panel to confirm the connection.

Set Up Kilo Code

1

Install Kilo Code

Install the Kilo Code extension from the VS Code marketplace, or search “Kilo Code” in VS Code Extensions.
2

Configure Wafer as a provider

  1. Open Kilo Code and click the settings gear icon
  2. Go to the Providers tab
  3. Click Custom provider at the bottom
  4. Fill in the dialog:
  • Provider ID: wafer
  • Display Name: Wafer
  • Base URL: https://pass.wafer.ai/v1
  • API Key: your Wafer API key
  • Model: Qwen3.5-397B-A17B
  1. Click Save
If you’re on an older version of Kilo Code without the Providers tab, select OpenAI Compatible from the API Provider dropdown and enter the same Base URL, API key, and Model ID.
3

Start coding

Send a message in the Kilo Code panel to confirm the connection.

Set Up OpenHands

1

Install OpenHands

Follow the OpenHands installation guide. The quickest way:
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik
docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker.all-hands.dev/all-hands-ai/openhands:0.44
2

Configure Wafer as the LLM (UI)

  1. Open the OpenHands UI (usually at http://localhost:3000)
  2. Click the settings gear icon
  3. Click Advanced to expand advanced options
  4. Set these fields:
  • Custom Model: openai/GLM-5.1
  • Base URL: https://pass.wafer.ai/v1
  • API Key: your Wafer API key
The openai/ prefix is required. OpenHands uses litellm under the hood, and this prefix tells it to use the OpenAI-compatible completion path.
3

Alternative: config.toml

If you prefer file-based config, create or edit config.toml in the project root:
[llm]
model = "openai/GLM-5.1"
api_key = "YOUR_WAFER_API_KEY"
base_url = "https://pass.wafer.ai/v1"
4

Start coding

Open a conversation in the OpenHands UI to confirm the connection.

Set Up LibreChat

1

Install LibreChat

Follow the LibreChat installation guide. The quickest way is Docker:
git clone https://github.com/danny-avila/LibreChat.git
cd LibreChat
cp .env.example .env
docker compose up -d
2

Add Wafer as a provider

Edit librechat.yaml in the project root and add a Wafer endpoint:
endpoints:
  custom:
    - name: "Wafer"
      baseURL: "https://pass.wafer.ai/v1"
      apiKey: "${WAFER_API_KEY}"
      iconURL: "https://avatars.githubusercontent.com/u/213847495?s=200&v=4"
      models:
        default:
          - "Qwen3.5-397B-A17B"
          - "GLM-5.1"
        fetch: false
      titleConvo: true
      modelDisplayLabel: "Wafer"
Set your API key in your .env file:
WAFER_API_KEY=YOUR_WAFER_API_KEY
Replace YOUR_WAFER_API_KEY with your Wafer Pass API key.
Do not share your API key or commit it to version control.
The default list above exposes all Wafer models in LibreChat’s model picker. Drop any IDs you don’t want to surface.
3

Restart and verify

docker compose restart
Open LibreChat in your browser, select Wafer from the endpoint dropdown, and send a message.

Use Wafer with Other Harnesses

Most agent harnesses only need these settings: OpenAI-compatible harnesses (Cline, Roo Code, Kilo Code, OpenClaw, OpenHands, etc.):
  • Base URL: https://pass.wafer.ai/v1
  • Model: any ID from Models, e.g. GLM-5.1 or Qwen3.5-397B-A17B
  • Authentication: your Wafer Pass key (same token in Authorization: Bearer … headers or in the client’s API-key field — keys look like wfr_…)
  • Compatibility mode: OpenAI-compatible / OpenAI API
{
  "baseUrl": "https://pass.wafer.ai/v1",
  "apiKey": "YOUR_WAFER_API_KEY",
  "model": "GLM-5.1"
}
Pick any model string from Models. Anthropic-compatible harnesses (Claude Code, Conductor, or any tool using the Anthropic Messages API):
  • Base URL: https://pass.wafer.ai (the tool appends /v1/messages automatically)
  • Authentication: your Wafer API key via ANTHROPIC_API_KEY
  • Model: pin a Wafer model via ANTHROPIC_DEFAULT_*_MODEL (the harness’s default Anthropic model strings don’t match a Wafer model)
export ANTHROPIC_BASE_URL=https://pass.wafer.ai
export ANTHROPIC_API_KEY=YOUR_WAFER_API_KEY
export ANTHROPIC_DEFAULT_OPUS_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_SONNET_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="GLM-5.1"
export CLAUDE_CODE_SUBAGENT_MODEL="GLM-5.1"
If the harness asks for a provider name, you can label it Wafer. Pass your Wafer Pass key wherever the harness expects an Anthropic/API key (ANTHROPIC_API_KEY).

Terms of Use

Wafer Pass is intended for personal agentic coding use only. By purchasing Wafer Pass you agree to the following: Violations of these terms may result in pass revocation without refund. See the full Wafer Terms of Service for details.

FAQ

Wafer Pass includes GLM-5.1 and Qwen3.5-397B-A17B. More models ship on the same subscription as we add them.
Use the Models IDs with https://pass.wafer.ai/v1.
No. Wafer Pass is licensed for personal use by a single individual — one account per person, with a concurrency limit of 3 in-flight requests at any given time. Keep your API key private and use it only for your own workflows. Sharing accounts or API keys, reselling access, or pooling one subscription across multiple people is not permitted, and we reserve the right to suspend or ban accounts we detect operating in violation of these limits. Teams that need shared access should email emilio@wafer.ai about a dedicated or enterprise plan.
Sign up at wafer.ai/pass. Checkout is self-serve and your API key is shown immediately on the success page (and emailed as backup).
For OpenAI-compatible harnesses, use a model string from Models with https://pass.wafer.ai/v1. Model names are case-insensitive for the IDs above. For Claude Code (Anthropic-compatible), pin a Wafer model via ANTHROPIC_DEFAULT_OPUS_MODEL / ANTHROPIC_DEFAULT_SONNET_MODEL / ANTHROPIC_DEFAULT_HAIKU_MODEL / CLAUDE_CODE_SUBAGENT_MODEL.
Codex CLI removed support for the OpenAI Chat Completions wire format in February 2026 (wire_api = "chat" is no longer accepted) and now requires the OpenAI Responses API (/v1/responses). Wafer Pass exposes Chat Completions (/v1/chat/completions), so the local LiteLLM proxy translates between the two. See Set Up Codex for the config. The proxy is only needed for Codex — every other harness on this page (Claude Code, Conductor, OpenClaw, Cline, Roo Code, Kilo Code, Hermes Agent, OpenHands, LibreChat) talks to Wafer directly.
Yes. We’re optimizing the best coding models and adding them to the plan. Price stays the same.