Get per-account Serverless SLA metrics for traffic served through pass.wafer.ai. The response includes summary latency, TTFT, tokens-per-second, token counts, error counts, and a time series over the requested lookback window.
Serverless metrics are scoped to the account that owns the Serverless API key. They do not expose fleet-wide backend identities, live backend gauges, or other customers’ traffic.
Run
Parameters
- endpoint (required): pass.wafer.ai
- range_minutes (optional): 5, 15, 30, 60, 360, 1440, 10080, 43200
- model (optional): filter to one resolved model, such as Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
- default range_minutes: 60
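As a minimal sketch of how a client might validate these parameters before sending a request, the helper below encodes them as a query string. The function name build_metrics_query is hypothetical, and passing the parameters as a URL query string is an assumption; the request URL itself is deliberately left out.

```python
from urllib.parse import urlencode

# Lookback windows listed in the Parameters section, in minutes.
ALLOWED_RANGE_MINUTES = (5, 15, 30, 60, 360, 1440, 10080, 43200)
DEFAULT_RANGE_MINUTES = 60


def build_metrics_query(range_minutes=DEFAULT_RANGE_MINUTES, model=None):
    """Return a URL-encoded query string for a Serverless metrics request.

    Hypothetical helper: it only validates and encodes the documented
    parameters. Sending them as a query string is an assumption.
    """
    if range_minutes not in ALLOWED_RANGE_MINUTES:
        raise ValueError(
            f"range_minutes must be one of {ALLOWED_RANGE_MINUTES}, got {range_minutes}"
        )
    params = {"endpoint": "pass.wafer.ai", "range_minutes": range_minutes}
    if model is not None:
        # urlencode percent-escapes the "/" in resolved model names.
        params["model"] = model
    return urlencode(params)
```

Rejecting an unsupported range_minutes locally avoids a round trip that would otherwise end in a 422 response.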
Response Shape
Key Fields
- rps: average requests per second across the full window
- ttft_p50_ms, ttft_p90_ms, ttft_p99_ms: streaming time-to-first-token percentiles
- tps_p10, tps_p50, tps_p90, tps_p99: output tokens-per-second percentiles
- latency_p50_ms, latency_p90_ms, latency_p99_ms: end-to-end latency percentiles
- cache_hit_pct: cache-read prompt tokens divided by total input tokens
- count_2xx, count_4xx, count_5xx: request count by status class
- error_rate_pct: 4xx + 5xx divided by total requests
- concurrent_requests: recent activity count for the scoped account
- timeseries[*]: per-bucket requests, throughput, latency, TTFT, and error count
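The derived fields above can be reproduced from raw counts, which may help when cross-checking a response. The sketch below is illustrative only: it assumes total requests equal the sum of the three status classes, that the pct fields are on a 0 to 100 scale, and that the token-count argument names are hypothetical, since the exact response field names for token counts are not listed here.

```python
def error_rate_pct(count_2xx, count_4xx, count_5xx):
    """4xx + 5xx as a percentage of all requests (assumed: total = 2xx + 4xx + 5xx)."""
    total = count_2xx + count_4xx + count_5xx
    if total == 0:
        return 0.0  # no traffic in the window
    return 100.0 * (count_4xx + count_5xx) / total


def cache_hit_pct(cache_read_tokens, input_tokens):
    """Cache-read prompt tokens as a percentage of total input tokens."""
    if input_tokens == 0:
        return 0.0
    return 100.0 * cache_read_tokens / input_tokens


def average_rps(total_requests, range_minutes):
    """Average requests per second across the full lookback window."""
    return total_requests / (range_minutes * 60)
```

For example, 90 successful requests, 8 client errors, and 2 server errors give an error rate of 10 percent under these assumptions.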
Serverless Scoping
- Only Serverless API keys can query endpoint=pass.wafer.ai.
- Results are scoped to the owning Serverless account for the bearer key.
- model filters the resolved model that served the request.
- backends is empty for Serverless because backend identities are fleet-level data.
- Engine gauge fields and active_accounts are null for Serverless because they are fleet-level metrics.
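Because fleet-level fields come back empty or null, client code should not treat their absence as an error. A minimal sketch follows, assuming backends and active_accounts sit at the top level of the JSON body; the sample fragment is hypothetical and contains only those two fields.

```python
import json

# Hypothetical fragment with only the two fields named above;
# a real response carries many more fields.
SAMPLE = '{"backends": [], "active_accounts": null}'


def fleet_fields_present(payload):
    """Return True if the payload carries any fleet-level data."""
    has_backends = bool(payload.get("backends"))
    has_active_accounts = payload.get("active_accounts") is not None
    return has_backends or has_active_accounts


payload = json.loads(SAMPLE)  # JSON null becomes Python None
```

For a Serverless-scoped response shaped like the sample, fleet_fields_present(payload) is False.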
Errors
- 401: missing or invalid API key
- 403: the API key is not a Serverless key for pass.wafer.ai
- 422: invalid range_minutes, endpoint, or model
- 502: upstream metrics query failed
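One way a client might map these status codes to actionable errors is sketched below. MetricsError and check_status are hypothetical names, and treating only 502 as retryable is an assumption based on it being an upstream failure rather than a problem with the request.

```python
# Messages taken from the Errors list above.
METRICS_ERRORS = {
    401: "missing or invalid API key",
    403: "the API key is not a Serverless key for pass.wafer.ai",
    422: "invalid range_minutes, endpoint, or model",
    502: "upstream metrics query failed",
}

# Assumption: only the upstream failure is worth retrying unchanged.
RETRYABLE_STATUSES = frozenset({502})


class MetricsError(Exception):
    """Hypothetical exception carrying the HTTP status and a retry hint."""

    def __init__(self, status, message, retryable):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.retryable = retryable


def check_status(status):
    """Raise MetricsError for any non-2xx status; return None otherwise."""
    if 200 <= status < 300:
        return None
    message = METRICS_ERRORS.get(status, "unexpected status")
    raise MetricsError(status, message, status in RETRYABLE_STATUSES)
```

A caller can then retry with backoff when retryable is true and surface the message directly for 401, 403, and 422, which require changing the key or the parameters.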