Dedicated endpoints expose OpenAI-compatible inference at https://<ENDPOINT_HOST>/v1. On supported routes such as GLM-5.1, you can send pre-tokenized prompts to /v1/completions and constrain decoding with SGLang/XGrammar-compatible EBNF by passing ebnf.
prompt may be an array of token IDs for a single request. The grammar passed in ebnf must be valid SGLang/XGrammar EBNF.
Use the model IDs and capabilities configured for your dedicated endpoint. If a model route on your endpoint does not support /v1/completions, use the standard chat completions path instead.
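As a rough illustration of the request described above, the following Python sketch builds a POST to /v1/completions with a token-ID prompt and an ebnf field. Only the path, the token-ID prompt, and the ebnf parameter come from this page. The helper names, the Bearer authorization header, the max_tokens field, the sample grammar, and the response shape read in send_request are assumptions based on common OpenAI-compatible conventions, so check them against your endpoint's configuration.

```python
import json
import urllib.request

# Hypothetical grammar for illustration: restricts output to "yes" or "no".
YES_NO_GRAMMAR = 'root ::= "yes" | "no"'


def build_completion_request(endpoint_host, api_key, model, token_ids, ebnf, max_tokens=8):
    """Build (but do not send) a POST request to /v1/completions.

    Assumptions: Bearer-token auth and the max_tokens field follow the usual
    OpenAI-compatible conventions; they are not confirmed by this page.
    """
    payload = {
        "model": model,             # use a model ID configured for your endpoint
        "prompt": list(token_ids),  # pre-tokenized prompt: one array of token IDs
        "ebnf": ebnf,               # SGLang/XGrammar-compatible EBNF grammar
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"https://{endpoint_host}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_request(request):
    """Send the request and return the first completion's text.

    Assumes the response follows the OpenAI completions shape
    (choices[0].text); adjust if your endpoint returns something else.
    """
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]["text"]


# Example usage (placeholder values, not real token IDs or credentials):
# req = build_completion_request("<ENDPOINT_HOST>", "<API_KEY>", "<MODEL_ID>",
#                                [101, 2023, 2003], YES_NO_GRAMMAR)
# print(send_request(req))
```

The request is built separately from the network call so the payload can be inspected or logged before it is sent. The token IDs must come from the tokenizer that matches the target model.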