Get endpoint-level summary metrics plus a time series over a lookback window.
Run
Parameters
- endpoint (required): <ENDPOINT_HOST>
- range_minutes (optional): 5, 15, 30, 60, 360, 1440
- default range_minutes: 60
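The parameter rules above can be checked on the client before a request is sent, which avoids a round trip that would end in a 422. The following is a minimal sketch only: the helper name `build_metrics_query` is hypothetical, and sending the parameters as a URL query string is an assumption rather than something this page confirms.

```python
from urllib.parse import urlencode

# Allowed lookback windows, in minutes, as listed under Parameters.
ALLOWED_RANGE_MINUTES = (5, 15, 30, 60, 360, 1440)
DEFAULT_RANGE_MINUTES = 60


def build_metrics_query(endpoint: str, range_minutes: int = DEFAULT_RANGE_MINUTES) -> str:
    """Validate the two documented parameters and URL-encode them.

    Hypothetical helper: the name and the query-string transport are
    assumptions, not part of the documented API.
    """
    if not endpoint:
        raise ValueError("endpoint is required")
    if range_minutes not in ALLOWED_RANGE_MINUTES:
        raise ValueError(
            f"range_minutes must be one of {ALLOWED_RANGE_MINUTES}, got {range_minutes!r}"
        )
    return urlencode({"endpoint": endpoint, "range_minutes": range_minutes})
```

For example, `build_metrics_query("host.example.com", 15)` returns `endpoint=host.example.com&range_minutes=15`, while a value such as `10` raises `ValueError` locally.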
Response Shape
Key Fields
- rps: average requests per second across the full window
- ttft_p50_ms, ttft_p90_ms, ttft_p99_ms: streaming TTFT percentiles
- tps_p50, tps_p90: output tokens-per-second percentiles
- latency_p50_ms, latency_p90_ms, latency_p99_ms: end-to-end latency percentiles
- cache_hit_pct: cache-read prompt tokens divided by total input tokens
- count_2xx, count_4xx, count_5xx: request count by status class
- error_rate_pct: 4xx + 5xx divided by total requests
- concurrent_requests: recent activity count, not an exact inflight gauge
- active_accounts: distinct API keys active in the last 2 minutes
- engine_running_requests: sum of in-flight requests across all backend engines
- engine_queue_depth: sum of queued requests across all backend engines
- engine_kv_cache_hit_rate_pct: average KV cache hit rate across backends (percentage)
- engine_kv_cache_usage_pct: average KV cache utilization across backends (percentage)
- engine_preemptions_in_range: total engine preemptions during the lookback window
- backends[*].active_requests: live wafer-edge backend inflight gauge when emitted by the endpoint
- backends[*].engine_running_requests: live engine running request gauge for that backend/node
- backends[*].engine_queue_depth: live engine queue depth gauge for that backend/node
- timeseries[*]: per-bucket requests, tps_p50, ttft_p50_ms, latency_p50_ms, error_count
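To make the derived fields concrete, here is a hedged sketch of how `error_rate_pct` relates to the status-class counts, and how the per-bucket `requests` and `error_count` values might be scanned for the worst interval. Both function names are hypothetical. The sketch assumes that total requests equals `count_2xx + count_4xx + count_5xx`, and that `timeseries` arrives as a list of dictionaries keyed by the field names above. Neither assumption is confirmed by this page.

```python
from typing import Optional


def error_rate_pct(count_2xx: int, count_4xx: int, count_5xx: int) -> float:
    """Return (4xx + 5xx) / total requests as a percentage.

    Assumption: total requests is the sum of the three status-class counts.
    """
    total = count_2xx + count_4xx + count_5xx
    if total == 0:
        return 0.0  # no traffic in the window, so no errors to report
    return 100.0 * (count_4xx + count_5xx) / total


def worst_bucket(timeseries: list) -> Optional[dict]:
    """Return the bucket with the highest error_count / requests ratio.

    Buckets with zero requests are skipped; returns None if none remain.
    """
    busy = [b for b in timeseries if b.get("requests", 0) > 0]
    if not busy:
        return None
    return max(busy, key=lambda b: b["error_count"] / b["requests"])
```

With 90 successful requests, 5 client errors, and 5 server errors, `error_rate_pct(90, 5, 5)` gives `10.0`.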
Errors
- 401: missing or invalid API key
- 403: API key does not have access to the requested endpoint
- 422: invalid range_minutes or invalid query shape
- 502: upstream metrics query failed
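A client might translate these status codes into a message and a retry decision. The sketch below is illustrative: the helper name `describe_metrics_error` is hypothetical, and retrying only on 502 is an assumption about sensible client behavior, not a documented recommendation.

```python
# Status codes and meanings, copied from the Errors list.
METRICS_ERRORS = {
    401: "missing or invalid API key",
    403: "API key does not have access to the requested endpoint",
    422: "invalid range_minutes or invalid query shape",
    502: "upstream metrics query failed",
}

# Assumption: only the upstream failure may succeed on retry; the other
# three need a change to the key or to the request before trying again.
RETRYABLE_STATUSES = {502}


def describe_metrics_error(status: int) -> tuple:
    """Return (message, should_retry) for a non-success status code."""
    message = METRICS_ERRORS.get(status, "unexpected status")
    return message, status in RETRYABLE_STATUSES
```

Under these assumptions a 422 maps to a non-retryable result, which points the caller back at the parameter values rather than at the service.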
Notes
- Metrics are endpoint-scoped. You must pass an endpoint your key can access.
- For per-request debugging, use Request Inspection.