Baseline Discovery
The `wafer baseline` command traces PyTorch operations to discover which GPU kernels are dispatched. This helps you understand the performance baseline before writing custom kernels.
Quick Start
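The exact syntax for naming the operation is an assumption here (an inline PyTorch expression over the tensors named by `--shape`); a minimal first run might look like:

```shell
# Trace which kernel PyTorch dispatches for a matmul.
# The quoted expression syntax is an assumption; tensor names
# a and b are defined by the --shape flags.
wafer baseline run "torch.matmul(a, b)" \
  --shape a=4096,4096 \
  --shape b=4096,4096
```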
Commands
wafer baseline run
Execute an operation and trace the kernel dispatch:

| Option | Short | Description |
|---|---|---|
| --shape | -s | Tensor shape: name=dim1,dim2,... (repeatable) |
| --dtype | -d | Data type for tensors (default: float16) |
| --hardware | | Hardware name for roofline analysis |
| --target | -t | GPU target for execution |
| --workspace | -w | Workspace name |
| --warmup | | Warmup iterations (default: 10) |
| --runs | | Profiling runs (default: 100) |
| --no-cache | | Skip cache, always run fresh |
| --json | | Output as JSON |
| --verbose | -v | Show verbose output |
| --timeout | | Timeout in seconds (default: 120) |
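Putting several of the options above together, a hedged sketch (the operation-expression syntax and the hardware name are assumptions, not confirmed by this page):

```shell
# Profile with explicit warmup and run counts, half precision,
# and a roofline comparison against a named hardware target.
# "MI300X" is a placeholder hardware name.
wafer baseline run "torch.matmul(a, b)" \
  --shape a=8192,8192 --shape b=8192,8192 \
  --dtype float16 \
  --hardware MI300X \
  --warmup 10 --runs 100 \
  --json
```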
wafer baseline hardware
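The valid names for --hardware can be discovered with this subcommand:

```shell
# Print the hardware known to the roofline model.
wafer baseline hardware
```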
Lists the hardware names accepted for roofline analysis.

Output
Baseline discovery reports the dispatched kernels and their measured performance.

Use Cases
Understanding PyTorch Dispatch
See what kernel runs for your operation.

Establishing Performance Baseline
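A sketch of a baseline measurement with explicit profiling counts (expression syntax assumed):

```shell
# More runs tighten the timing estimate; warmup excludes
# one-time costs such as compilation and cold caches.
wafer baseline run "torch.nn.functional.silu(x)" \
  --shape x=4096,14336 \
  --warmup 20 --runs 200
```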
Before optimizing, know the current performance.

Comparing Data Types
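One way to compare data types is to repeat the same run while varying only --dtype (dtype names assumed to follow PyTorch's):

```shell
# Same operation and shapes, three precisions.
for dt in float32 float16 bfloat16; do
  wafer baseline run "torch.matmul(a, b)" \
    --shape a=4096,4096 --shape b=4096,4096 \
    --dtype "$dt"
done
```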
Remote Execution
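A sketch of remote execution; the target and workspace names are hypothetical:

```shell
# Execute on a remote GPU instead of the local machine.
wafer baseline run "torch.matmul(a, b)" \
  --shape a=4096,4096 --shape b=4096,4096 \
  --target mi300x-box --workspace dev
```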
Run on remote GPUs with the --target and --workspace options.

Caching
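A sketch of forcing a fresh run with --no-cache:

```shell
# Identical repeated queries normally return the cached result;
# --no-cache forces a fresh profiling run.
wafer baseline run "torch.matmul(a, b)" \
  --shape a=4096,4096 --shape b=4096,4096 \
  --no-cache
```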
Results are cached by default to speed up repeated queries; use --no-cache to force a fresh run.

Tensor Variable Names
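Tensor names passed to --shape are ordinary Python identifiers, so descriptive names work; a sketch (the expression syntax is an assumption):

```shell
# Descriptive identifiers for an attention-style shape.
wafer baseline run "torch.matmul(query, key.transpose(-2, -1))" \
  --shape query=8,512,64 \
  --shape key=8,512,64
```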
Use any valid Python identifier for tensor names.

Next Steps
- Roofline Analysis: deeper performance analysis.
- Kernel Evaluation: test your custom kernel.
- AI Agent: get help optimizing.
- Profiling: profile the dispatched kernel.