

GLM-5.1 supports pre-tokenized prompts (arrays of token IDs) on the OpenAI-compatible `/v1/completions` endpoint. You can also constrain decoding by passing an SGLang/XGrammar-compatible EBNF grammar as a string in the `ebnf` field.
```bash
curl -sS "https://pass.wafer.ai/v1/completions" \
  -H "Authorization: Bearer <YOUR_WAFER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-5.1",
    "prompt": [9703],
    "max_tokens": 2,
    "temperature": 0,
    "ebnf": "root ::= \"A\" | \"B\""
  }'
```
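For a single request, `prompt` may be an array of token IDs instead of a string. The `ebnf` grammar must use SGLang/XGrammar-compatible EBNF syntax; in the example above, `root ::= "A" | "B"` restricts the output to either `A` or `B`.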
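If you call the endpoint from a script, it can help to build the JSON body separately from the curl call. The sketch below assumes your key is stored in a `WAFER_API_KEY` environment variable and uses a hypothetical `build_body` helper (neither comes from the Wafer docs). It does no JSON escaping of its own, so the grammar string must already have its quotes escaped as `\"`. When no key is set, it prints the body instead of sending the request.

```bash
#!/usr/bin/env bash
# Sketch only: WAFER_API_KEY and build_body are illustrative assumptions.
set -euo pipefail

# Hypothetical helper: builds a /v1/completions body for GLM-5.1.
#   $1 = comma-separated token IDs, e.g. "9703" or "1,2,3"
#   $2 = EBNF grammar, already JSON-escaped (quotes written as \")
#   $3 = max_tokens (defaults to 2)
# printf's %s inserts arguments verbatim, so no escaping happens here.
build_body() {
  local token_ids="$1" grammar_json="$2" max_tokens="${3:-2}"
  printf '{"model":"GLM-5.1","prompt":[%s],"max_tokens":%s,"temperature":0,"ebnf":"%s"}' \
    "$token_ids" "$max_tokens" "$grammar_json"
}

# Grammar root ::= "yes" | "no", with its quotes pre-escaped for JSON.
GRAMMAR='root ::= \"yes\" | \"no\"'
BODY="$(build_body "9703" "$GRAMMAR" 2)"

# Send the request only when a key is available; otherwise dry-run.
if [ -n "${WAFER_API_KEY:-}" ]; then
  curl -sS "https://pass.wafer.ai/v1/completions" \
    -H "Authorization: Bearer ${WAFER_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$BODY"
else
  printf '%s\n' "$BODY"
fi
```

For grammars with more complex quoting, a JSON tool such as `jq -n --arg` can take care of the escaping instead of relying on a hand-escaped string.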