Agent Templates
Templates provide pre-configured prompts and tool configurations for specific tasks. They help you get started quickly with common workflows like documentation lookup, trace analysis, and kernel optimization.
Using Templates
Run a template with the -t flag.
Available Templates
ask-docs
Query GPU documentation to answer technical questions.
- Searches through GPU architecture and programming documentation
- Supports CUDA, CUTLASS, HIP, and AMD documentation corpora
- Multi-turn mode for follow-up questions
--corpus: Documentation source to search (cuda, cutlass, hip, amd)
trace-analyze
Analyze GPU performance traces and identify optimization opportunities.
Supported trace formats: NVIDIA Nsight Compute (.ncu-rep), NVIDIA Nsight Systems (.nsys-rep), PyTorch traces (.json), and Perfetto traces (.perfetto, .pftrace).
- Automatic format detection
- Built-in SQL queries for trace analysis
- Identifies bottlenecks and optimization opportunities
- Single-turn mode for one-shot analysis
trace: Path to the trace file
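To make the idea of automatic format detection concrete, here is a minimal Python sketch that guesses a trace format from its file extension, using only the extensions listed for this template. The helper name `guess_trace_format` and the lookup table are hypothetical illustrations, and the tool's actual detection logic is not documented here (it may well inspect file contents rather than names).

```python
from pathlib import Path
from typing import Optional

# Extension-to-format table built from the supported formats listed for
# trace-analyze. This mapping is an illustration, not the tool's own table.
TRACE_FORMATS = {
    ".ncu-rep": "NVIDIA Nsight Compute",
    ".nsys-rep": "NVIDIA Nsight Systems",
    ".json": "PyTorch trace",
    ".perfetto": "Perfetto trace",
    ".pftrace": "Perfetto trace",
}


def guess_trace_format(path: str) -> Optional[str]:
    """Hypothetical helper: guess the trace format from the file extension.

    Returns None when the extension is not one of the listed formats.
    """
    suffix = Path(path).suffix.lower()  # text after the last dot, e.g. ".ncu-rep"
    return TRACE_FORMATS.get(suffix)
```

A name-based check like this is cheap but ambiguous for generic extensions such as .json, which is one reason a real detector might also look inside the file.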
optimize-kernel
Iteratively optimize GPU kernel implementations for maximum performance.
- Full file editing capabilities (read, write, edit)
- Extended thinking for complex optimization reasoning
- Multi-turn mode for iterative optimization cycles
- Focuses on memory access, occupancy, register usage, and warp divergence
kernel: Path to the kernel source file
target: Target GPU (default: H100)
optimize-kernelbench
Optimize KernelBench problems with custom GPU kernels.
- Specialized for KernelBench problem format
- Uses wafer evaluate kernelbench for correctness verification
- Access to profiling tools (NCU, nsys, rocprof)
- Multi-turn mode for iterative optimization
reference: Path to the reference implementation
pool: Target pool for evaluation
Template Variables
Pass variables to templates using --args KEY=VALUE.
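As an illustration of the KEY=VALUE format, the following Python sketch splits such pairs into a dictionary of template variables. It is not the tool's actual parser: the function name `parse_template_args` is hypothetical, and the sketch assumes that each pair is split at the first `=` so that values may themselves contain `=`.

```python
from typing import Dict, List


def parse_template_args(pairs: List[str]) -> Dict[str, str]:
    """Hypothetical helper: turn ["KEY=VALUE", ...] into {"KEY": "VALUE", ...}."""
    variables: Dict[str, str] = {}
    for pair in pairs:
        # Split at the first "=" only; everything after it is the value.
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        variables[key] = value
    return variables


# Example with the optimize-kernel variables; the kernel path is made up.
example = parse_template_args(["kernel=./matmul.cu", "target=H100"])
```

Here `example` holds the `kernel` and `target` variables described for optimize-kernel; a malformed entry such as `target` with no `=` raises `ValueError` instead of being silently dropped.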
Comparison
| Template | Purpose | Mode | Thinking |
|---|---|---|---|
| ask-docs | Documentation queries | Multi-turn | Disabled |
| trace-analyze | Performance trace analysis | Single-turn | Disabled |
| optimize-kernel | Kernel optimization | Multi-turn | Enabled |
| optimize-kernelbench | KernelBench challenges | Multi-turn | Optional |