Agent Setup - Wafer

Features, model availability, rate limits, and pricing may change as we iterate. Questions? Email emilio@wafer.ai.

Set up with: Linzumi · Claude Code · Conductor · Codex · OpenClaw · Hermes Agent · Cline · Roo Code · Kilo Code · OpenHands · LibreChat Wafer builds AI that optimizes AI. We take open models and make them dramatically faster. Wafer Serverless gives you pay-as-you-go access to fast open models — including GLM-5.2, glm5.2-fast, GLM-5.1, Kimi-K2.6, MiniMax-M3, and more — served at multiples of the speed of generic inference providers. Wafer Serverless is built for Linzumi, Claude Code, Codex, Conductor, OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and other agent harnesses. Wafer exposes both an OpenAI-compatible endpoint and an Anthropic-compatible Messages endpoint, so tools like Claude Code work out of the box. Linzumi also exposes Wafer directly inside its coding-job flow. Get a Wafer Serverless API key for fast open-source models through a standard API endpoint.

Get started: https://www.wafer.ai/serverless/checkout

Connection Details

Use your Wafer Serverless API key with these values:


OpenAI-compatible endpoint	`https://pass.wafer.ai/v1`
Anthropic-compatible endpoint	`https://pass.wafer.ai/v1/messages`
Send your API key as	`Authorization: Bearer <key>` (`wfr_…` Serverless keys work here)
Request-scoped ZDR	Add `Wafer-ZDR: required` on direct API calls

See Models below for the model strings to pass on the OpenAI-compatible endpoint.

Claude Code uses the Anthropic Messages endpoint. Set ANTHROPIC_BASE_URL=https://pass.wafer.ai and ANTHROPIC_API_KEY to your Wafer key — Claude Code will hit /v1/messages automatically. To make sure Claude Code talks to a Wafer model, set ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, ANTHROPIC_DEFAULT_HAIKU_MODEL, and CLAUDE_CODE_SUBAGENT_MODEL to one of the Wafer model IDs in Models (e.g. GLM-5.1). Linzumi is built in: select Wafer in Linzumi’s Start coding job screen instead of pasting an endpoint or API key. All other harnesses (OpenClaw, Cline, Roo Code, etc.) use the OpenAI-compatible endpoint at https://pass.wafer.ai/v1.

For raw curl requests, request fields, and direct API usage, see API Reference.

What’s Included

With a Wafer Serverless API key you get:

Pay-as-you-go access to Wafer-hosted models, billed per token
Access through a standard OpenAI-compatible API and an Anthropic-compatible Messages API using your Wafer API key
Works with Linzumi, Claude Code, Codex, Conductor, OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and other agent harnesses
Reasoning-capable models support explicit thinking on/off controls in Chat Completions
New fast models as we release them

Models

`model` string	Family	Max context tokens	ZDR support	Reasoning	Notes
`GLM-5.2`	Z.AI flagship	`1048576`	Yes	Yes	1M-context general model
`glm5.2-fast`	Z.AI fast tier	`1048576`	Yes	Yes	Low-latency GLM-5.2 tier
`GLM-5.1`	Z.AI flagship	`202752`	Yes	Yes
`Kimi-K2.6`	Moonshot Kimi K2.6	`262144`	No	Yes	Serverless-only
`MiniMax-M3`	MiniMax M3	`1048576`	No	Yes	Vision-capable; long-context pricing can apply
`Qwen3.5-397B-A17B`	Qwen3.5, 397B MoE	`262144`	No	See models API	Serverless-only

The Max context tokens value is the hard cap enforced by the backend — requests where prompt tokens exceed this value return a 400. If your harness has a Context Window Size setting (Cline, Roo Code, Droid, etc.), set it to the exact integer above. Leave ~2–4k of headroom for the model’s response when filling context. Pass any model string above to any OpenAI-compatible harness configured against https://pass.wafer.ai/v1. Model names are case-insensitive — GLM-5.1 and glm-5.1 both work. If you pass a model name that doesn’t match any available model, the API returns a 404 with the list of available models. Call GET https://pass.wafer.ai/v1/models for the live catalog, capabilities, and pricing before hard-coding model-specific behavior. For Claude Code and other Anthropic-compatible harnesses, set ANTHROPIC_DEFAULT_*_MODEL to one of the model IDs above. See Set Up Claude Code.

API Capabilities

Wafer Serverless supports request-scoped privacy enforcement and advanced completion controls:

Getting Started

Pick a plan

Go to wafer.ai/serverless/checkout to create a Serverless account. Checkout is self-serve and instant.

Get your API key

Your API key is shown in the app after signup. You can also mint a key from https://app.wafer.ai without a card. You can create multiple named keys per account — see API Keys.

Start coding

Select Wafer directly inside Linzumi, or use the key in Claude Code, Codex, Conductor, OpenClaw, Cline, Roo Code, Kilo Code, Hermes Agent, OpenHands, or any other supported harness.

Set Up Linzumi

Linzumi has a built-in Wafer agent provider for coding jobs. You do not need to paste https://pass.wafer.ai/v1, configure WAFER_API_KEY, or add a custom OpenAI-compatible provider inside Linzumi.

Install Linzumi and connect a computer

Install the macOS app from linzumi.com, sign in to your workspace, and make sure your machine appears under Computers in the sidebar.Linzumi runs coding jobs on a connected computer, so pick the runner that has access to the repo you want the agent to work in.

Start a coding job

Click Start coding job in the workspace sidebar.In the setup screen, select the connected computer and repository folder for the job.

Select Wafer

Under Agent, choose Wafer. The card is labeled Open models.The setup screen shows your available Linzumi Wafer/free-token balance before you send the job.

Choose model and thinking

Linzumi’s Wafer model picker currently includes:

GLM-5.2
glm5.2-fast
GLM-5.1
Kimi K2.7 Code
Kimi K2.6
Qwen 3.5 397B

Use GLM-5.2 for the default coding-agent path. The Thinking picker supports default, Low, Medium, High, and XHigh.

Set permissions and send

Pick approval and filesystem permissions before sending:

Approval: On request or On failure
Permissions: Read only, Workspace write, or Full access

Use Read only for exploration, Workspace write for normal repo edits, and reserve Full access for jobs that truly need to operate outside the selected workspace.

Quick smoke test: start a read-only Linzumi job with Wafer selected and ask, “Please do not edit files. Confirm which agent provider/model this Linzumi job is using, then stop.” If it responds in the Linzumi thread, the Wafer integration is working.

Set Up Claude Code

Wafer exposes an Anthropic-compatible Messages endpoint at https://pass.wafer.ai/v1/messages, so Claude Code can connect directly — no proxy needed. For Claude Code, set ANTHROPIC_BASE_URL to https://pass.wafer.ai, not https://pass.wafer.ai/v1.

Install Claude Code

npm install -g @anthropic-ai/claude-code

Configure Wafer as the endpoint

Set these environment variables in your shell profile (~/.zshrc, ~/.bashrc, etc.):

export ANTHROPIC_BASE_URL=https://pass.wafer.ai
export ANTHROPIC_API_KEY=YOUR_WAFER_API_KEY

Or add them to ~/.claude/settings.json for a persistent, per-user config:

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://pass.wafer.ai",
    "ANTHROPIC_API_KEY": "YOUR_WAFER_API_KEY"
  }
}

Replace YOUR_WAFER_API_KEY with your Wafer Serverless API key.

Do not share your API key or commit it to version control.

Pin a Wafer model

Claude Code sends Anthropic model strings (claude-opus-…, claude-sonnet-…) by default — those don’t match a Wafer model. Pin Claude Code to a Wafer model with these env vars:For Qwen3.5-397B-A17B:

export ANTHROPIC_DEFAULT_OPUS_MODEL="Qwen3.5-397B-A17B"
export ANTHROPIC_DEFAULT_SONNET_MODEL="Qwen3.5-397B-A17B"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="Qwen3.5-397B-A17B"
export CLAUDE_CODE_SUBAGENT_MODEL="Qwen3.5-397B-A17B"

For GLM-5.1:

export ANTHROPIC_DEFAULT_OPUS_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_SONNET_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="GLM-5.1"
export CLAUDE_CODE_SUBAGENT_MODEL="GLM-5.1"

Or in ~/.claude/settings.json:

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://pass.wafer.ai",
    "ANTHROPIC_API_KEY": "YOUR_WAFER_API_KEY",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "GLM-5.1",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "GLM-5.1",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "GLM-5.1",
    "CLAUDE_CODE_SUBAGENT_MODEL": "GLM-5.1"
  }
}

These env vars follow the same pattern as OpenRouter’s Claude Code integration. They override the model Claude Code sends for opus, sonnet, haiku, and subagent calls.

Start Claude Code

claude

Claude Code now routes requests through the Wafer endpoint.

Set Up Conductor

Conductor runs a team of parallel Claude Code agents in isolated Git worktrees on macOS. Because Conductor uses Claude Code under the hood, it picks up Wafer the same way Claude Code does — pin the model to any Wafer model ID from Models.

Install Conductor

Download the macOS app from conductor.build and launch it.

Configure Wafer as the endpoint

In Conductor’s Settings → Environment (or your shell profile, e.g. ~/.zshrc), set:

export ANTHROPIC_BASE_URL=https://pass.wafer.ai
export ANTHROPIC_API_KEY=YOUR_WAFER_API_KEY

Replace YOUR_WAFER_API_KEY with your Wafer Serverless API key.

Do not share your API key or commit it to version control.

Pin a Wafer model

Conductor spawns Claude Code, which sends Anthropic model strings by default — those don’t match a Wafer model. Pin Claude Code (and therefore Conductor) to a Wafer model with these env vars in the same Environment section:For Qwen3.5-397B-A17B:

export ANTHROPIC_DEFAULT_OPUS_MODEL="Qwen3.5-397B-A17B"
export ANTHROPIC_DEFAULT_SONNET_MODEL="Qwen3.5-397B-A17B"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="Qwen3.5-397B-A17B"
export CLAUDE_CODE_SUBAGENT_MODEL="Qwen3.5-397B-A17B"

For GLM-5.1:

export ANTHROPIC_DEFAULT_OPUS_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_SONNET_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="GLM-5.1"
export CLAUDE_CODE_SUBAGENT_MODEL="GLM-5.1"

This follows the pattern OpenRouter documents in its Claude Code integration guide. Because Conductor spawns Claude Code subprocesses, the same env vars override opus, sonnet, haiku, and subagent calls.

Start a run

Create a workspace in Conductor, pick a repo, and kick off an agent. Requests now route through Wafer.

Set Up Codex

Codex speaks the OpenAI Responses API (/v1/responses). Wafer Serverless exposes a native Responses-compatible endpoint at https://pass.wafer.ai/v1/responses, so current Codex clients can point directly at Wafer without a local proxy.

Install Codex

npm install -g @openai/codex

Verify Codex picks up the Responses API (wire_api = "responses" is the only supported value as of 0.92.0+):

codex --version

Verify Wafer's Responses endpoint

Export your Wafer Serverless key and make a direct Responses API request:

export WAFER_API_KEY=YOUR_WAFER_API_KEY

curl -sS https://pass.wafer.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAFER_API_KEY" \
  -d '{
    "model": "GLM-5.2",
    "input": "Reply with the single word: ready."
  }'

A 200 response with an output array means the Wafer Responses endpoint is healthy.

Point Codex at Wafer

Add a Wafer model provider to ~/.codex/config.toml:

model = "GLM-5.2"
model_provider = "wafer"

[model_providers.wafer]
name = "Wafer"
base_url = "https://pass.wafer.ai/v1"
env_key = "WAFER_API_KEY"
wire_api = "responses"

env_key tells Codex which env var to read for the bearer token it sends to Wafer.

Switch models by changing the top-level model = "..." to any model ID from Models, such as GLM-5.2, glm5.2-fast, or GLM-5.1.

Run Codex

export WAFER_API_KEY=YOUR_WAFER_API_KEY
codex

Codex now sends Responses API traffic directly to Wafer. The footer in the TUI should show the Wafer model ID, for example GLM-5.2.

Older local proxies or private Wafer deployments that do not expose /v1/responses still need a translation layer such as LiteLLM. The public Serverless endpoint at pass.wafer.ai does not.

Set Up OpenClaw

Model string: the examples below use GLM-5.1. Swap in any Models ID (Qwen3.5-397B-A17B or GLM-5.1). Applies to every OpenAI-compatible setup section that follows (OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and the generic section at the bottom).

Install OpenClaw

macOS / Linux
Windows (PowerShell)

curl -fsSL https://openclaw.ai/install.sh | bash

iwr -useb https://openclaw.ai/install.ps1 | iex

Run setup

openclaw setup

Add Wafer as a provider

Replace YOUR_WAFER_API_KEY with your Wafer Serverless API key:

openclaw config set models.providers.wafer "$(cat <<'EOF'
{
  "baseUrl": "https://pass.wafer.ai/v1",
  "api": "openai-completions",
  "auth": "api-key",
  "apiKey": "YOUR_WAFER_API_KEY",
  "models": [
    { "id": "Qwen3.5-397B-A17B", "name": "Qwen 3.5 397B" },
    { "id": "GLM-5.1", "name": "GLM 5.1" }
  ]
}
EOF
)"
openclaw models set wafer/GLM-5.1

Do not share your API key or commit it to version control.

Test it

openclaw agent --local --session-id wafer-test --message "Hello"

Set Up Hermes Agent

Install Hermes Agent

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc   # or source ~/.zshrc

Point Hermes at Wafer

Replace YOUR_WAFER_API_KEY with your Wafer Serverless API key:

hermes config set model.default Qwen3.5-397B-A17B
hermes config set model.provider custom
hermes config set model.base_url https://pass.wafer.ai/v1
hermes config set model.api_mode chat_completions
hermes config set OPENAI_API_KEY YOUR_WAFER_API_KEY

Start a session

hermes

Hermes now uses Qwen3.5-397B-A17B through the Wafer endpoint by default.

Set Up Cline

Install Cline

Install the Cline extension from the VS Code marketplace, or search “Cline” in VS Code Extensions.

Configure Wafer as a provider

Open VS Code and click the Cline icon in the sidebar
Click the settings gear icon in the Cline panel
In the API Provider dropdown, select OpenAI Compatible
Fill in these fields:

Base URL: https://pass.wafer.ai/v1
API Key: your Wafer API key
Model ID: Qwen3.5-397B-A17B

Do not include /chat/completions in the Base URL — Cline appends that automatically.

Set model info (recommended)

Expand Model Configuration and set:

Context Window Size: 262144
Max Output Tokens: 32768
Supports Images: unchecked

Verify the connection

Send a message in the Cline panel. If Cline responds, you’re connected.

Set Up Roo Code

Install Roo Code

Install the Roo Code extension from the VS Code marketplace, or search “Roo Code” in VS Code Extensions.

Configure Wafer as a provider

Open VS Code and click the Roo Code icon in the sidebar
Click the settings gear icon in the Roo Code panel
In the API Provider dropdown, select OpenAI Compatible
Fill in these fields:

Base URL: https://pass.wafer.ai/v1
API Key: your Wafer API key
Model ID: Qwen3.5-397B-A17B

Set model info (recommended)

Optionally configure:

Context Window Size: 262144
Max Output Tokens: 32768

Start coding

Send a message in the Roo Code panel to confirm the connection.

Set Up Kilo Code

Install Kilo Code

Install the Kilo Code extension from the VS Code marketplace, or search “Kilo Code” in VS Code Extensions.

Configure Wafer as a provider

Open Kilo Code and click the settings gear icon
Go to the Providers tab
Click Custom provider at the bottom
Fill in the dialog:

Provider ID: wafer
Display Name: Wafer
Base URL: https://pass.wafer.ai/v1
API Key: your Wafer API key
Model: Qwen3.5-397B-A17B

Click Save

If you’re on an older version of Kilo Code without the Providers tab, select OpenAI Compatible from the API Provider dropdown and enter the same Base URL, API key, and Model ID.

Start coding

Send a message in the Kilo Code panel to confirm the connection.

Set Up OpenHands

Install OpenHands

Follow the OpenHands installation guide. The quickest way:

docker pull docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik
docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker.all-hands.dev/all-hands-ai/openhands:0.44

Configure Wafer as the LLM (UI)

Open the OpenHands UI (usually at http://localhost:3000)
Click the settings gear icon
Click Advanced to expand advanced options
Set these fields:

Custom Model: openai/GLM-5.1
Base URL: https://pass.wafer.ai/v1
API Key: your Wafer API key

The openai/ prefix is required. OpenHands uses litellm under the hood, and this prefix tells it to use the OpenAI-compatible completion path.

Alternative: config.toml

If you prefer file-based config, create or edit config.toml in the project root:

[llm]
model = "openai/GLM-5.1"
api_key = "YOUR_WAFER_API_KEY"
base_url = "https://pass.wafer.ai/v1"

Start coding

Open a conversation in the OpenHands UI to confirm the connection.

Set Up LibreChat

Install LibreChat

Follow the LibreChat installation guide. The quickest way is Docker:

git clone https://github.com/danny-avila/LibreChat.git
cd LibreChat
cp .env.example .env
docker compose up -d

Add Wafer as a provider

Edit librechat.yaml in the project root and add a Wafer endpoint:

endpoints:
  custom:
    - name: "Wafer"
      baseURL: "https://pass.wafer.ai/v1"
      apiKey: "${WAFER_API_KEY}"
      iconURL: "https://avatars.githubusercontent.com/u/213847495?s=200&v=4"
      models:
        default:
          - "Qwen3.5-397B-A17B"
          - "GLM-5.1"
        fetch: false
      titleConvo: true
      modelDisplayLabel: "Wafer"

Set your API key in your .env file:

WAFER_API_KEY=YOUR_WAFER_API_KEY

Replace YOUR_WAFER_API_KEY with your Wafer Serverless API key.

Do not share your API key or commit it to version control.

The default list above exposes all Wafer models in LibreChat’s model picker. Drop any IDs you don’t want to surface.

Restart and verify

docker compose restart

Open LibreChat in your browser, select Wafer from the endpoint dropdown, and send a message.

Use Wafer with Other Harnesses

Most agent harnesses only need these settings: OpenAI-compatible harnesses (Cline, Roo Code, Kilo Code, OpenClaw, OpenHands, etc.):

Base URL: https://pass.wafer.ai/v1
Model: any ID from Models, e.g. GLM-5.1 or Qwen3.5-397B-A17B
Authentication: your Wafer Serverless API key (same token in Authorization: Bearer … headers or in the client’s API-key field — keys look like wfr_…)
Compatibility mode: OpenAI-compatible / OpenAI API

{
  "baseUrl": "https://pass.wafer.ai/v1",
  "apiKey": "YOUR_WAFER_API_KEY",
  "model": "GLM-5.1"
}

Pick any model string from Models. Anthropic-compatible harnesses (Claude Code, Conductor, or any tool using the Anthropic Messages API):

Base URL: https://pass.wafer.ai (the tool appends /v1/messages automatically)
Authentication: your Wafer API key via ANTHROPIC_API_KEY
Model: pin a Wafer model via ANTHROPIC_DEFAULT_*_MODEL (the harness’s default Anthropic model strings don’t match a Wafer model)

export ANTHROPIC_BASE_URL=https://pass.wafer.ai
export ANTHROPIC_API_KEY=YOUR_WAFER_API_KEY
export ANTHROPIC_DEFAULT_OPUS_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_SONNET_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="GLM-5.1"
export CLAUDE_CODE_SUBAGENT_MODEL="GLM-5.1"

If the harness asks for a provider name, you can label it Wafer. Pass your Wafer Serverless API key wherever the harness expects an Anthropic/API key (ANTHROPIC_API_KEY).

FAQ

What models do I get?

See Models for the current public catalog. Call GET https://pass.wafer.ai/v1/models for live availability and per-model pricing.

Can I use Wafer Serverless with any model?

Use the Models IDs with https://pass.wafer.ai/v1.

How do I get access?

Do I need a special model ID?

For OpenAI-compatible harnesses, use a model string from Models with https://pass.wafer.ai/v1. Model names are case-insensitive for the IDs above. For Claude Code (Anthropic-compatible), pin a Wafer model via ANTHROPIC_DEFAULT_OPUS_MODEL / ANTHROPIC_DEFAULT_SONNET_MODEL / ANTHROPIC_DEFAULT_HAIKU_MODEL / CLAUDE_CODE_SUBAGENT_MODEL.

Does Linzumi need a Wafer API key or base URL?

No. Linzumi exposes Wafer directly in the Start coding job flow. Choose Wafer under Agent, pick a model, and send the job. Use https://pass.wafer.ai/v1 and a Wafer API key only for the other OpenAI-compatible harnesses on this page.

Why does Codex need a LiteLLM proxy?

It doesn’t for the public Serverless endpoint. Codex requires the OpenAI Responses API (/v1/responses), and Wafer exposes that endpoint natively at https://pass.wafer.ai/v1/responses. A LiteLLM proxy is only needed for older private deployments or custom gateways that do not expose /v1/responses.

Will more models be added?

Yes. We’re optimizing the best coding models and adding them to the Serverless catalog.

​Connection Details

​What’s Included

​Models

​API Capabilities

​Getting Started

​Set Up Linzumi

​Set Up Claude Code

​Set Up Conductor

​Set Up Codex

​Set Up OpenClaw

​Set Up Hermes Agent

​Set Up Cline

​Set Up Roo Code

​Set Up Kilo Code

​Set Up OpenHands

​Set Up LibreChat

​Use Wafer with Other Harnesses

​FAQ

Connection Details

What’s Included

Models

API Capabilities

Getting Started

Set Up Linzumi

Set Up Claude Code

Set Up Conductor

Set Up Codex

Set Up OpenClaw

Set Up Hermes Agent

Set Up Cline

Set Up Roo Code

Set Up Kilo Code

Set Up OpenHands

Set Up LibreChat

Use Wafer with Other Harnesses

FAQ