[Chart: Wafer Qwen3.5-397B-A17B vs base SGLang throughput]

Wafer builds AI that optimizes AI. We take open models and make them dramatically faster. Our first Wafer Pass release is Qwen3.5-397B-A17B, an optimized Qwen3.5 397B model running at 1.5-4x the speed of generic inference providers. More fast open models are coming, and they’ll join the same plan as they launch.

Wafer Pass is built for Claude Code, OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and other agent harnesses. Wafer exposes both an OpenAI-compatible endpoint and an Anthropic-compatible Messages endpoint, so tools like Claude Code work out of the box.

Get a Wafer Pass for fast open-source models (starting with Qwen3.5-397B-A17B) through a standard API endpoint. Plans start at $40/month.
Get your Wafer Pass: https://www.wafer.ai/pass
Wafer Pass is in early access. We’re onboarding developers in small batches. Features, availability, and pricing are subject to change.

Connection Details

Use the credentials from your Wafer access email with these values:
OpenAI-compatible endpoint: https://pass.wafer.ai/v1
Anthropic-compatible endpoint: https://pass.wafer.ai/v1/messages
Model: Qwen3.5-397B-A17B
Authentication: API key
Concurrency: 1 inflight request per user
Claude Code uses the Anthropic Messages endpoint: set ANTHROPIC_BASE_URL=https://pass.wafer.ai and set ANTHROPIC_API_KEY to your Wafer key. Claude Code will hit /v1/messages automatically, and Wafer routes every request to the fastest available model regardless of the model name Claude Code sends. All other harnesses (OpenClaw, Cline, Roo Code, etc.) use the OpenAI-compatible endpoint at https://pass.wafer.ai/v1.
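For scripts or quick checks outside a harness, the OpenAI-compatible endpoint accepts standard chat-completions requests. Here is a minimal sketch using only the Python standard library; the WAFER_API_KEY environment-variable name is our own convention for this example, not something the access email mandates:

```python
# Build (and optionally send) a chat-completions request to Wafer's
# OpenAI-compatible endpoint using only the standard library.
import json
import os
import urllib.request

BASE_URL = "https://pass.wafer.ai/v1"
MODEL = "Qwen3.5-397B-A17B"

def build_request(prompt: str) -> urllib.request.Request:
    """Assemble the POST /chat/completions request without sending it."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('WAFER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

def send(prompt: str) -> str:
    """Send the request and return the reply (needs network and a valid key)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Remember the 1-inflight-request limit: issue calls sequentially rather than in parallel.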

What’s Included

With an active Wafer Pass subscription you get:
  • Qwen3.5-397B-A17B requests with zero per-token costs (optimized by Wafer Inference Engine, 128K context window)
  • Access through a standard OpenAI-compatible API and an Anthropic-compatible Messages API using your Wafer API key
  • Works with Claude Code, OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and other agent harnesses
  • 1 concurrent request today, with higher inflight limits coming soon
  • New fast models as we release them — same subscription, no price increase

Pricing

Plan     Price     Requests / 5-hr window   Overage (input)   Overage (output)
Starter  $40/mo    1,000                    $0.60/M tokens    $4.00/M tokens
Pro      $100/mo   5,000                    $0.40/M tokens    $2.60/M tokens
Max      $250/mo   20,000                   $0.30/M tokens    $2.00/M tokens
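As a worked example of the overage columns, here is a hypothetical helper (rates copied from the pricing table; not an official billing formula):

```python
# Overage rates in USD per million tokens, copied from the pricing table.
RATES = {
    "Starter": {"input": 0.60, "output": 4.00},
    "Pro":     {"input": 0.40, "output": 2.60},
    "Max":     {"input": 0.30, "output": 2.00},
}

def overage_cost(plan: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of token overage beyond the plan's request quota."""
    r = RATES[plan]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000
```

For example, on Pro, 2M overage input tokens plus 1M overage output tokens cost 2 × $0.40 + 1 × $2.60 = $3.40.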

Getting Started

1. Apply for access
Go to wafer.ai/pass and pick your plan. We’re onboarding in small batches and will notify you when your spot opens.
2. Receive your access email
Once you’re approved, we’ll send you your Wafer endpoint, model ID, and API key.
3. Start coding
Use those credentials in Claude Code, OpenClaw, Cline, Roo Code, Kilo Code, Hermes Agent, OpenHands, or any other supported harness.

Set Up Claude Code

Wafer exposes an Anthropic-compatible Messages endpoint at https://pass.wafer.ai/v1/messages, so Claude Code can connect directly — no proxy needed.
1. Install Claude Code
npm install -g @anthropic-ai/claude-code
2. Configure Wafer as the endpoint
Set these environment variables in your shell profile (~/.zshrc, ~/.bashrc, etc.):
export ANTHROPIC_BASE_URL=https://pass.wafer.ai
export ANTHROPIC_API_KEY=YOUR_WAFER_API_KEY
Or add them to ~/.claude/settings.json for a persistent, per-user config:
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://pass.wafer.ai",
    "ANTHROPIC_API_KEY": "YOUR_WAFER_API_KEY"
  }
}
Replace YOUR_WAFER_API_KEY with the key from your Wafer access email.
Do not share your API key or commit it to version control.
3. Start Claude Code
claude
Claude Code now routes requests through the Wafer endpoint. You don’t need to configure a model — Wafer automatically routes all requests to the fastest available model (currently Qwen3.5-397B-A17B) regardless of the model name Claude Code sends.

Set Up OpenClaw

1. Install OpenClaw
curl -fsSL https://openclaw.ai/install.sh | bash
2. Run setup
openclaw setup
3. Add Wafer as a provider
Replace YOUR_WAFER_API_KEY with the key from your Wafer access email:
openclaw config set models.providers.wafer --batch-file /dev/stdin <<'EOF'
{
  "baseUrl": "https://pass.wafer.ai/v1",
  "api": "openai-completions",
  "auth": "api-key",
  "apiKey": "YOUR_WAFER_API_KEY",
  "models": [{ "id": "Qwen3.5-397B-A17B", "name": "Qwen 3.5 397B" }]
}
EOF
openclaw models set wafer/Qwen3.5-397B-A17B
Do not share your API key or commit it to version control.
4. Test it
openclaw agent --local --session-id wafer-test --message "Hello"

Set Up Hermes Agent

1. Install Hermes Agent
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc   # or source ~/.zshrc
2. Point Hermes at Wafer
Replace YOUR_WAFER_API_KEY with the key from your Wafer access email:
hermes config set OPENAI_BASE_URL https://pass.wafer.ai/v1
hermes config set OPENAI_API_KEY YOUR_WAFER_API_KEY
hermes config set model Qwen3.5-397B-A17B
3. Start a session
hermes
Hermes now uses Qwen3.5-397B-A17B through the Wafer endpoint by default.

Set Up Cline

1. Install Cline
Install the Cline extension from the VS Code marketplace, or search “Cline” in VS Code Extensions.
2. Configure Wafer as a provider
  1. Open VS Code and click the Cline icon in the sidebar
  2. Click the settings gear icon in the Cline panel
  3. In the API Provider dropdown, select OpenAI Compatible
  4. Fill in these fields:
  • Base URL: https://pass.wafer.ai/v1
  • API Key: your Wafer API key
  • Model ID: Qwen3.5-397B-A17B
Do not include /chat/completions in the Base URL — Cline appends that automatically.
3. Set model info (recommended)
Expand Model Configuration and set:
  • Context Window Size: 131072
  • Max Output Tokens: 32768
  • Supports Images: unchecked
4. Verify the connection
Send a message in the Cline panel. If Cline responds, you’re connected.

Set Up Roo Code

1. Install Roo Code
Install the Roo Code extension from the VS Code marketplace, or search “Roo Code” in VS Code Extensions.
2. Configure Wafer as a provider
  1. Open VS Code and click the Roo Code icon in the sidebar
  2. Click the settings gear icon in the Roo Code panel
  3. In the API Provider dropdown, select OpenAI Compatible
  4. Fill in these fields:
  • Base URL: https://pass.wafer.ai/v1
  • API Key: your Wafer API key
  • Model ID: Qwen3.5-397B-A17B
3. Set model info (recommended)
Optionally configure:
  • Context Window Size: 131072
  • Max Output Tokens: 32768
4. Start coding
Send a message in the Roo Code panel to confirm the connection.

Set Up Kilo Code

1. Install Kilo Code
Install the Kilo Code extension from the VS Code marketplace, or search “Kilo Code” in VS Code Extensions.
2. Configure Wafer as a provider
  1. Open Kilo Code and click the settings gear icon
  2. Go to the Providers tab
  3. Click Custom provider at the bottom
  4. Fill in the dialog:
  • Provider ID: wafer
  • Display Name: Wafer
  • Base URL: https://pass.wafer.ai/v1
  • API Key: your Wafer API key
  • Model: Qwen3.5-397B-A17B
  5. Click Save
If you’re on an older version of Kilo Code without the Providers tab, select OpenAI Compatible from the API Provider dropdown and enter the same Base URL, API key, and Model ID.
3. Start coding
Send a message in the Kilo Code panel to confirm the connection.

Set Up OpenHands

1. Install OpenHands
Follow the OpenHands installation guide. The quickest way:
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik
docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker.all-hands.dev/all-hands-ai/openhands:0.44
2. Configure Wafer as the LLM (UI)
  1. Open the OpenHands UI (usually at http://localhost:3000)
  2. Click the settings gear icon
  3. Click Advanced to expand advanced options
  4. Set these fields:
  • Custom Model: openai/Qwen3.5-397B-A17B
  • Base URL: https://pass.wafer.ai/v1
  • API Key: your Wafer API key
The openai/ prefix is required. OpenHands uses litellm under the hood, and this prefix tells it to use the OpenAI-compatible completion path.
3. Alternative: config.toml
If you prefer file-based config, create or edit config.toml in the project root:
[llm]
model = "openai/Qwen3.5-397B-A17B"
api_key = "YOUR_WAFER_API_KEY"
base_url = "https://pass.wafer.ai/v1"
4. Start coding
Open a conversation in the OpenHands UI to confirm the connection.

Use Wafer with Other Harnesses

Most agent harnesses need only these settings.

OpenAI-compatible harnesses (Cline, Roo Code, Kilo Code, OpenClaw, OpenHands, etc.):
  • Base URL: https://pass.wafer.ai/v1
  • Model: Qwen3.5-397B-A17B
  • Authentication: your Wafer API key
  • Compatibility mode: OpenAI-compatible / OpenAI API
In JSON-style provider configs, these settings typically look like:
{
  "baseUrl": "https://pass.wafer.ai/v1",
  "apiKey": "YOUR_WAFER_API_KEY",
  "model": "Qwen3.5-397B-A17B"
}
Anthropic-compatible harnesses (Claude Code, or any tool using the Anthropic Messages API):
  • Base URL: https://pass.wafer.ai (the tool appends /v1/messages automatically)
  • Authentication: your Wafer API key via ANTHROPIC_API_KEY
  • Model: no override needed — Wafer routes all requests to the fastest available model
export ANTHROPIC_BASE_URL=https://pass.wafer.ai
export ANTHROPIC_API_KEY=YOUR_WAFER_API_KEY
If the harness asks for a provider name, you can label it Wafer. If it asks whether your key is a bearer token or an API key, use the same Wafer key from your access email.
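For reference, the Anthropic-style request a harness sends to /v1/messages looks roughly like this. The sketch below only builds the request with the Python standard library; the anthropic-version value follows Anthropic's public API convention, and the wafer-auto model name is a placeholder, since Wafer routes regardless of the model field:

```python
# Build an Anthropic-compatible /v1/messages request (without sending it).
import json
import os
import urllib.request

def build_messages_request(prompt: str) -> urllib.request.Request:
    payload = {
        "model": "wafer-auto",   # placeholder; Wafer ignores the model name
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://pass.wafer.ai/v1/messages",
        data=json.dumps(payload).encode(),
        headers={
            "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
```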

FAQ

Which models are included?
Wafer Pass currently covers Qwen3.5-397B-A17B, served through the Wafer endpoint. More fast models are coming soon; same subscription, no price increase.

Can I share my Wafer Pass or API key?
No. Keep your API key private and use it only for your own workflows.

How do I get access?
Apply at wafer.ai/unlimited. We’re onboarding in small batches.

Can I call the API directly from my own code?
Yes. Use Qwen3.5-397B-A17B with the https://pass.wafer.ai/v1 endpoint.

How many concurrent requests do I get?
Today each user gets 1 concurrent request. We expect to raise that limit over time.

Will you add more models?
Yes. We’re optimizing the best coding models and adding them to the plan. The price stays the same.