Baseline Discovery
Thewafer baseline command traces PyTorch operations to discover which GPU kernels are dispatched. This helps you understand the performance baseline before writing custom kernels.
Quick Start
Commands
wafer baseline run
Execute an operation and trace the kernel dispatch:| Option | Short | Description |
|---|---|---|
--shape | -s | Tensor shape: name=dim1,dim2,... (repeatable) |
--dtype | -d | Data type for tensors (default: float16) |
--hardware | Hardware name for roofline analysis | |
--target | -t | GPU target for execution |
--workspace | -w | Workspace name |
--warmup | Warmup iterations (default: 10) | |
--runs | Profiling runs (default: 100) | |
--no-cache | Skip cache, always run fresh | |
--json | Output as JSON | |
--verbose | -v | Show verbose output |
--timeout | Timeout in seconds (default: 120) |