Serverless Metrics API

Get per-account Serverless SLA metrics for traffic served through pass.wafer.ai. The response includes summary percentiles for end-to-end latency, time to first token (TTFT), and tokens per second, along with token counts, error counts, and a time series over the requested lookback window. Serverless metrics are scoped to the account that owns the Serverless API key. They do not expose fleet-wide backend identities, live backend gauges, or other customers’ traffic.

Run

curl -s "https://api.wafer.ai/v1/endpoints/metrics?endpoint=pass.wafer.ai&range_minutes=60&model=<MODEL_ID>" \
  -H "Authorization: Bearer <SERVERLESS_API_KEY>"

Parameters

endpoint (required): pass.wafer.ai
range_minutes (optional, default 60): one of 5, 15, 30, 60, 360, 1440, 10080, 43200
model (optional): filter to one resolved model, such as Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
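
The parameters above can be assembled into a request from Python. The following is a minimal sketch, not an official client: the helper names build_metrics_url and fetch_metrics are hypothetical, and it assumes the API accepts a standard URL-encoded query string (so the "/" in a model ID is sent as %2F). It checks range_minutes locally against the documented values before sending anything.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.wafer.ai/v1/endpoints/metrics"
# Allowed lookback windows, copied from the Parameters list.
ALLOWED_RANGE_MINUTES = (5, 15, 30, 60, 360, 1440, 10080, 43200)


def build_metrics_url(range_minutes=60, model=None):
    """Build the metrics URL (hypothetical helper, not part of the API)."""
    if range_minutes not in ALLOWED_RANGE_MINUTES:
        raise ValueError(f"range_minutes must be one of {ALLOWED_RANGE_MINUTES}")
    params = {"endpoint": "pass.wafer.ai", "range_minutes": range_minutes}
    if model is not None:
        params["model"] = model
    # Assumption: the server decodes standard percent-encoding.
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_metrics(api_key, range_minutes=60, model=None):
    """Send the request with a bearer key and return the parsed JSON body."""
    request = urllib.request.Request(
        build_metrics_url(range_minutes, model),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling fetch_metrics("<SERVERLESS_API_KEY>", range_minutes=60) would issue the same request as the curl command in the Run section, without the model filter.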

Response Shape

{
  "endpoint": "pass.wafer.ai",
  "range_minutes": 60,
  "queried_at": "2026-05-22T19:55:19+00:00",
  "summary": {
    "total_requests": 42,
    "rps": 0.01,
    "ttft_p50_ms": 410.0,
    "ttft_p90_ms": 1250.0,
    "ttft_p99_ms": 3200.0,
    "tps_p10": 52.0,
    "tps_p50": 92.0,
    "tps_p90": 155.0,
    "tps_p99": 180.0,
    "latency_p50_ms": 2100.0,
    "latency_p90_ms": 6400.0,
    "latency_p99_ms": 12000.0,
    "cache_hit_pct": 72.5,
    "total_input_tokens": 480000,
    "total_output_tokens": 42000,
    "total_cache_read_tokens": 348000,
    "count_2xx": 41,
    "count_4xx": 1,
    "count_5xx": 0,
    "error_rate_pct": 2.4,
    "concurrent_requests": 2,
    "active_accounts": null,
    "engine_running_requests": null,
    "engine_queue_depth": null,
    "engine_kv_cache_hit_rate_pct": null,
    "engine_kv_cache_usage_pct": null,
    "engine_preemptions_in_range": null
  },
  "backends": [],
  "timeseries": [
    {
      "time": "2026-05-22T18:55:00Z",
      "requests": 12,
      "tps_p10": 50.0,
      "tps_p50": 92.0,
      "tps_p90": 150.0,
      "tps_p99": 175.0,
      "ttft_p50_ms": 410.0,
      "ttft_p90_ms": 1200.0,
      "ttft_p99_ms": 3000.0,
      "latency_p50_ms": 2100.0,
      "latency_p90_ms": 6400.0,
      "latency_p99_ms": 12000.0,
      "error_count": 1
    }
  ]
}
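
As an illustration of consuming the timeseries array, the sketch below totals requests and errors across buckets and reports the bucket with the highest latency_p99_ms. The function name summarize_timeseries is hypothetical. It assumes every bucket carries requests and error_count as in the sample above, and it skips buckets whose latency_p99_ms is missing or null, since this page does not state whether empty buckets report percentiles.

```python
def summarize_timeseries(timeseries):
    """Aggregate per-bucket counts (hypothetical helper, not part of the API)."""
    total_requests = sum(bucket["requests"] for bucket in timeseries)
    total_errors = sum(bucket["error_count"] for bucket in timeseries)
    # Assumption: a bucket with no traffic may omit or null its percentiles.
    with_latency = [b for b in timeseries if b.get("latency_p99_ms") is not None]
    slowest = max(with_latency, key=lambda b: b["latency_p99_ms"], default=None)
    return {
        "total_requests": total_requests,
        "total_errors": total_errors,
        "slowest_bucket_time": slowest["time"] if slowest else None,
    }
```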

Key Fields

rps: average requests per second across the full window
ttft_p50_ms, ttft_p90_ms, ttft_p99_ms: streaming time-to-first-token percentiles
tps_p10, tps_p50, tps_p90, tps_p99: output tokens-per-second percentiles
latency_p50_ms, latency_p90_ms, latency_p99_ms: end-to-end latency percentiles
cache_hit_pct: cache-read prompt tokens (total_cache_read_tokens) divided by total input tokens (total_input_tokens), as a percentage
count_2xx, count_4xx, count_5xx: request count by status class
error_rate_pct: count_4xx + count_5xx divided by total_requests, as a percentage
concurrent_requests: recent activity count for the scoped account
timeseries[*]: per-bucket requests, throughput, latency, TTFT, and error count
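
The ratio fields described above can be recomputed from the raw counts in summary as a sanity check. This is a sketch with a hypothetical helper name, derived_metrics. How the API rounds its values and what it returns for an empty window are not documented on this page, so the zero handling here is an assumption.

```python
def derived_metrics(summary, range_minutes):
    """Recompute ratio fields from raw counts (hypothetical helper)."""
    total = summary["total_requests"]
    input_tokens = summary["total_input_tokens"]
    errors = summary["count_4xx"] + summary["count_5xx"]
    return {
        # Average requests per second across the full window.
        "rps": total / (range_minutes * 60),
        # Assumption: report 0.0 when there is nothing to divide by.
        "cache_hit_pct": (
            100.0 * summary["total_cache_read_tokens"] / input_tokens
            if input_tokens
            else 0.0
        ),
        "error_rate_pct": 100.0 * errors / total if total else 0.0,
    }
```

With the sample response above (42 requests over 60 minutes, 348000 cache-read tokens out of 480000 input tokens, one 4xx), the recomputed values round to the reported rps of 0.01, cache_hit_pct of 72.5, and error_rate_pct of 2.4.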

Serverless Scoping

Only Serverless API keys can query endpoint=pass.wafer.ai.
Results are scoped to the owning Serverless account for the bearer key.
model filters results to the resolved model that served each request.
backends is empty for Serverless because backend identities are fleet-level data.
Engine gauge fields and active_accounts are null for Serverless because they are fleet-level metrics.

Errors

401: missing or invalid API key
403: the API key is not a Serverless key for pass.wafer.ai
422: invalid range_minutes, endpoint, or model
502: upstream metrics query failed
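
A client might map these status codes to messages and decide which ones are worth retrying. The sketch below uses hypothetical helper names (describe_error, is_retryable). Treating only 502 as retryable is an assumption made here, on the reasoning that the 401, 403, and 422 cases require a different key or a corrected request rather than a second attempt.

```python
# Status meanings copied from the Errors list.
ERROR_MEANINGS = {
    401: "missing or invalid API key",
    403: "the API key is not a Serverless key for pass.wafer.ai",
    422: "invalid range_minutes, endpoint, or model",
    502: "upstream metrics query failed",
}


def describe_error(status):
    """Return the documented meaning of a status code, if there is one."""
    return ERROR_MEANINGS.get(status, f"unexpected status {status}")


def is_retryable(status):
    """Assumption: only the upstream failure (502) may succeed on retry."""
    return status == 502
```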
