Agent Templates

Templates provide pre-configured prompts and tool configurations for specific tasks. They help you get started quickly with common workflows like documentation lookup, trace analysis, and kernel optimization.

Using Templates

Run a template with the -t flag, and pass template variables with --args (repeat the flag once per variable):
wafer agent -t <template-name> [--args KEY=VALUE] "Your prompt"
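
If you launch templates from a script, it can help to assemble the command in one place. The sketch below is a hypothetical POSIX shell helper: run_template and the DRY_RUN variable are names made up for this example, not wafer features. It uses only the -t and --args flags shown above, adds one --args flag per KEY=VALUE pair, and with DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the wafer CLI):
#   run_template TEMPLATE PROMPT [KEY=VALUE ...]
# Expands to: wafer agent -t TEMPLATE --args KEY=VALUE ... "PROMPT"
# With DRY_RUN=1 it prints the command instead of running it.
run_template() {
  template=$1
  prompt=$2
  shift 2
  # Rewrite the remaining positional parameters in place, putting
  # "--args" in front of each KEY=VALUE pair (original order is kept).
  for kv in "$@"; do
    shift
    set -- "$@" --args "$kv"
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo wafer agent -t "$template" "$@" "$prompt"
  else
    wafer agent -t "$template" "$@" "$prompt"
  fi
}

# Usage: check the command first, then drop DRY_RUN=1 to run it.
#   DRY_RUN=1 run_template trace-analyze "Where is time being spent?" trace=./app.nsys-rep
```

Passing the prompt as its own argument keeps it quoted as a single string, however many variables follow it.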

Available Templates

ask-docs

Query GPU documentation to answer technical questions.
# General GPU question
wafer agent -t ask-docs "How do bank conflicts occur in shared memory?"

# With specific corpus
wafer agent -t ask-docs --corpus cuda "Explain warp-level primitives"
wafer agent -t ask-docs --corpus cutlass "How do I configure a GEMM tile size?"
Features:
  • Searches through GPU architecture and programming documentation
  • Supports CUDA, CUTLASS, HIP, and AMD documentation corpora
  • Multi-turn mode for follow-up questions
Options:
  • --corpus: Specify documentation source (cuda, cutlass, hip, amd)
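
If you script ask-docs queries, you may want to reject a mistyped corpus before the agent starts. This sketch checks the value against the four corpus names listed above. The function names are hypothetical, and it assumes the names must match exactly as written here (lowercase).

```shell
#!/bin/sh
# Hypothetical pre-flight check: accept only the corpus names listed
# under the ask-docs options (cuda, cutlass, hip, amd).
is_valid_corpus() {
  case "$1" in
    cuda|cutlass|hip|amd) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: validate the corpus, then call the template.
ask_docs() {
  corpus=$1
  question=$2
  if ! is_valid_corpus "$corpus"; then
    echo "unknown corpus: $corpus (expected cuda, cutlass, hip, or amd)" >&2
    return 1
  fi
  wafer agent -t ask-docs --corpus "$corpus" "$question"
}

# Usage:
#   ask_docs cutlass "How do I configure a GEMM tile size?"
```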

trace-analyze

Analyze GPU performance traces and identify optimization opportunities.
# Analyze an NCU report
wafer agent -t trace-analyze --args trace=./profile.ncu-rep "What's the bottleneck?"

# Analyze an Nsys trace
wafer agent -t trace-analyze --args trace=./app.nsys-rep "Where is time being spent?"

# Analyze a Perfetto trace
wafer agent -t trace-analyze --args trace=./trace.perfetto "Find the slowest kernels"
Supported Formats:
  • NVIDIA Nsight Compute (.ncu-rep)
  • NVIDIA Nsight Systems (.nsys-rep)
  • PyTorch traces (.json)
  • Perfetto traces (.perfetto, .pftrace)
Features:
  • Automatic format detection
  • Built-in SQL queries for trace analysis
  • Identifies bottlenecks and optimization opportunities
  • Single-turn mode for one-shot analysis
Required Arguments:
  • trace: Path to the trace file
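
trace-analyze detects the trace format on its own, and these docs don't say how. The sketch below is only an illustration of extension-based detection that mirrors the Supported Formats list; wafer may well inspect file contents instead. You could use something like it in a script to catch an unsupported file before starting a run. trace_format and analyze_trace are hypothetical names.

```shell
#!/bin/sh
# Illustration only: map a trace file name to a format label by extension.
# This is NOT wafer's detection logic; it mirrors the Supported Formats list.
# Note: any .json file is labeled "pytorch", which is an assumption.
trace_format() {
  case "$1" in
    *.ncu-rep)            echo "nsight-compute" ;;
    *.nsys-rep)           echo "nsight-systems" ;;
    *.json)               echo "pytorch" ;;
    *.perfetto|*.pftrace) echo "perfetto" ;;
    *)                    echo "unknown"; return 1 ;;
  esac
}

# Hypothetical wrapper: refuse unknown extensions, otherwise run the
# single-turn analysis with the required trace argument.
analyze_trace() {
  trace=$1
  prompt=$2
  if ! trace_format "$trace" >/dev/null; then
    echo "unsupported trace file: $trace" >&2
    return 1
  fi
  wafer agent -t trace-analyze --args trace="$trace" "$prompt"
}

# Usage:
#   analyze_trace ./profile.ncu-rep "What's the bottleneck?"
```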

optimize-kernel

Iteratively optimize a GPU kernel source file for performance on a target GPU.
# Optimize a CUDA kernel
wafer agent -t optimize-kernel --args kernel=./matmul.cu "Optimize for H100"

# Specify target GPU
wafer agent -t optimize-kernel --args kernel=./conv.cu --args target=A100 "Reduce memory bandwidth"
Features:
  • Full file editing capabilities (read, write, edit)
  • Extended thinking for complex optimization reasoning
  • Multi-turn mode for iterative optimization cycles
  • Focuses on memory access patterns, occupancy, register usage, and warp divergence
Arguments:
  • kernel: Path to the kernel source file
  • target: Target GPU (default: H100)
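
Because target defaults to H100, a script can simply leave it out unless you ask for another GPU. In this sketch, the TARGET environment variable, the run helper, and optimize_kernel are hypothetical names; the wafer command uses only the documented kernel and target arguments.

```shell
#!/bin/sh
# DRY_RUN=1 prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi
}

# Hypothetical wrapper: pass target= only when TARGET is set and non-empty;
# otherwise omit it so the template's documented default (H100) applies.
optimize_kernel() {
  kernel=$1
  prompt=$2
  if [ -n "${TARGET:-}" ]; then
    run wafer agent -t optimize-kernel --args kernel="$kernel" --args target="$TARGET" "$prompt"
  else
    run wafer agent -t optimize-kernel --args kernel="$kernel" "$prompt"
  fi
}

# Usage:
#   optimize_kernel ./matmul.cu "Optimize for H100"
#   TARGET=A100 optimize_kernel ./conv.cu "Reduce memory bandwidth"
```

Omitting the argument, rather than hard-coding H100 in the script, keeps the script in step with whatever default the template uses.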

optimize-kernelbench

Optimize KernelBench problems with custom GPU kernels.
# Optimize a KernelBench problem
wafer agent -t optimize-kernelbench \
  --args reference=./23_Softmax.py \
  --args pool=kernelbench-pool \
  "Optimize the Softmax kernel"
Features:
  • Specialized for the KernelBench problem format
  • Uses wafer evaluate kernelbench for correctness verification
  • Access to profiling tools (NCU, nsys, rocprof)
  • Multi-turn mode for iterative optimization
Required Arguments:
  • reference: Path to the reference implementation
  • pool: Target pool for evaluation
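
Both arguments are required, so a wrapper script can check them before a multi-turn session starts. This is a minimal sketch with hypothetical function names. It only checks that the reference file exists and that the pool name is non-empty; it cannot tell whether the pool itself is valid.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the two required arguments.
check_kernelbench_args() {
  reference=$1
  pool=$2
  if [ ! -f "$reference" ]; then
    echo "reference not found: $reference" >&2
    return 1
  fi
  if [ -z "$pool" ]; then
    echo "pool must not be empty" >&2
    return 1
  fi
  return 0
}

# Hypothetical wrapper: check, then launch the template.
optimize_kernelbench() {
  check_kernelbench_args "$1" "$2" || return 1
  wafer agent -t optimize-kernelbench --args reference="$1" --args pool="$2" "$3"
}

# Usage:
#   optimize_kernelbench ./23_Softmax.py kernelbench-pool "Optimize the Softmax kernel"
```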

Template Variables

Pass variables to templates using --args KEY=VALUE:
wafer agent -t optimize-kernel \
  --args kernel=./kernel.cu \
  --args target=H100 \
  "Your prompt"
To pass more than one variable, repeat --args:
wafer agent -t trace-analyze \
  --args trace=./profile.ncu-rep \
  --args focus=memory \
  "Analyze memory bottlenecks"

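Each --args value is a single KEY=VALUE string. If a script forwards variables supplied by a user, checking that each one has that shape can save a failed run. The sketch below splits at the first "=", the common convention that lets values themselves contain "="; whether wafer splits the same way is an assumption, and split_kv is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: split_kv KEY=VALUE prints KEY and VALUE on two lines.
# Assumes the split happens at the first '=' (values may contain '=').
split_kv() {
  case "$1" in
    *=*) ;;
    *) echo "expected KEY=VALUE, got: $1" >&2; return 1 ;;
  esac
  key=${1%%=*}    # remove longest suffix matching '=*': text before first '='
  value=${1#*=}   # remove shortest prefix matching '*=': text after first '='
  if [ -z "$key" ]; then
    echo "empty key in: $1" >&2
    return 1
  fi
  printf '%s\n%s\n' "$key" "$value"
}

# Usage:
#   split_kv focus=memory    # prints "focus", then "memory"
```
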
Comparison

| Template | Purpose | Mode | Extended thinking |
|---|---|---|---|
| ask-docs | Documentation queries | Multi-turn | Disabled |
| trace-analyze | Performance trace analysis | Single-turn | Disabled |
| optimize-kernel | Kernel optimization | Multi-turn | Enabled |
| optimize-kernelbench | KernelBench challenges | Multi-turn | Optional |

Next Steps