Dedicated endpoints expose OpenAI-compatible inference at https://<ENDPOINT_HOST>/v1. On supported routes such as GLM-5.1, you can send a pre-tokenized prompt to /v1/completions and constrain decoding with an SGLang/XGrammar-compatible EBNF grammar, passed in the ebnf field. The request below sends a single token ID and restricts the output to the literal A or B:
curl -sS "https://<ENDPOINT_HOST>/v1/completions" \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-5.1",
    "prompt": [9703],
    "max_tokens": 2,
    "temperature": 0,
    "ebnf": "root ::= \"A\" | \"B\""
  }'
For a single request, prompt may be an array of token IDs, as in [9703] above. The grammar passed in ebnf must be compatible with SGLang/XGrammar EBNF. Use the model IDs and capabilities configured for your dedicated endpoint; GLM-5.1 is shown here only as an example of a supported route. If a model route on your endpoint does not support /v1/completions, use the standard chat completions path instead.
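
The same request can be assembled programmatically. The following is a minimal sketch using only the Python standard library, not an official client. The helper name build_completion_request and its parameters are hypothetical, and the integer check on the token IDs is a local sanity check rather than documented server behavior. It builds the request without sending it, so the payload can be inspected first.

```python
import json
import urllib.request


def build_completion_request(host, api_key, token_ids, ebnf,
                             model="GLM-5.1", max_tokens=2):
    """Build (but do not send) a POST request for /v1/completions.

    Hypothetical helper: the name, signature, and defaults are
    illustrative and mirror the curl example, nothing more.
    """
    # Local sanity check (an assumption, not documented server behavior):
    # the prompt for one request is a flat, non-empty list of integer IDs.
    if not token_ids or not all(
        isinstance(t, int) and not isinstance(t, bool) for t in token_ids
    ):
        raise ValueError("token_ids must be a non-empty list of integers")

    payload = {
        "model": model,              # use a model ID configured on your endpoint
        "prompt": list(token_ids),   # pre-tokenized prompt
        "max_tokens": max_tokens,
        "temperature": 0,
        "ebnf": ebnf,                # SGLang/XGrammar-compatible grammar
    }
    return urllib.request.Request(
        url=f"https://{host}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholders as in the curl example; replace before sending.
    req = build_completion_request(
        host="<ENDPOINT_HOST>",
        api_key="<API_KEY>",
        token_ids=[9703],
        ebnf='root ::= "A" | "B"',
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

To send the request, pass the returned object to urllib.request.urlopen and parse the response body as JSON. Writing the grammar as a single-quoted Python string avoids the backslash escaping that the shell version needs for the inner double quotes.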