Agent Templates

Templates provide pre-configured prompts and tool configurations for specific tasks. They help you get started quickly with common workflows like documentation lookup, trace analysis, and kernel optimization.

Using Templates

Run a template with the -t flag:

wafer agent -t <template-name> [--args KEY=VALUE] "Your prompt"

Available Templates

ask-docs

Query GPU documentation to answer technical questions.

# General GPU question
wafer agent -t ask-docs "How do bank conflicts occur in shared memory?"

# With specific corpus
wafer agent -t ask-docs --corpus cuda "Explain warp-level primitives"
wafer agent -t ask-docs --corpus cutlass "How do I configure a GEMM tile size?"

Features:

Searches through GPU architecture and programming documentation
Supports CUDA, CUTLASS, HIP, and AMD documentation corpora
Multi-turn mode for follow-up questions

Options:

--corpus: Specify documentation source (cuda, cutlass, hip, amd)

trace-analyze

Analyze GPU performance traces and identify optimization opportunities.

# Analyze an NCU report
wafer agent -t trace-analyze --args trace=./profile.ncu-rep "What's the bottleneck?"

# Analyze an Nsys trace
wafer agent -t trace-analyze --args trace=./app.nsys-rep "Where is time being spent?"

# Analyze a Perfetto trace
wafer agent -t trace-analyze --args trace=./trace.perfetto "Find the slowest kernels"

Supported Formats:

NVIDIA Nsight Compute (.ncu-rep)
NVIDIA Nsight Systems (.nsys-rep)
PyTorch traces (.json)
Perfetto traces (.perfetto, .pftrace)

Features:

Automatic format detection
Built-in SQL queries for trace analysis
Identifies bottlenecks and optimization opportunities
Single-turn mode for one-shot analysis

Required Arguments:

trace: Path to the trace file
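
The template detects the trace format on its own, so a manual check is optional. As a minimal sketch, a hypothetical pre-flight helper (trace_format is not part of the wafer CLI) can map a file name to the formats in the Supported Formats list above before a long session starts. The mapping only mirrors that list; how the template itself detects formats is not described here.

```shell
# Hypothetical helper, not part of the wafer CLI.
# Prints the trace format implied by the file extension, following the
# Supported Formats list. Returns non-zero for anything not on that list.
trace_format() {
  case "$1" in
    *.ncu-rep)            echo "Nsight Compute" ;;
    *.nsys-rep)           echo "Nsight Systems" ;;
    *.perfetto|*.pftrace) echo "Perfetto" ;;
    *.json)               echo "PyTorch" ;;
    *)                    echo "unsupported"; return 1 ;;
  esac
}
```

A script could call trace_format "$file" and skip the wafer agent invocation when it returns non-zero.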

optimize-kernel

Iteratively optimize GPU kernel implementations for maximum performance.

# Optimize a CUDA kernel
wafer agent -t optimize-kernel --args kernel=./matmul.cu "Optimize for H100"

# Specify target GPU
wafer agent -t optimize-kernel --args kernel=./conv.cu --args target=A100 "Reduce memory bandwidth"

Features:

Full file editing capabilities (read, write, edit)
Extended thinking for complex optimization reasoning
Multi-turn mode for iterative optimization cycles
Focuses on memory access, occupancy, register usage, warp divergence

Arguments:

kernel: Path to the kernel source file
target: Target GPU (default: H100)
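
To compare results across GPUs, the same kernel can be submitted once per target. The sketch below assumes a hypothetical helper, sweep_targets, that only prints one command per target and does not run anything, since each optimize-kernel run is a multi-turn session. The prompt text is illustrative, and paths containing spaces would need extra quoting.

```shell
# Hypothetical helper, not part of the wafer CLI.
# Usage: sweep_targets <kernel-file> <target>...
# Prints one optimize-kernel command line per target GPU.
sweep_targets() (
  kernel=$1
  shift
  for target in "$@"; do
    # Printed, not executed: copy the line you want into your shell.
    printf 'wafer agent -t optimize-kernel --args kernel=%s --args target=%s "Optimize for %s"\n' \
      "$kernel" "$target" "$target"
  done
)
```

For example, sweep_targets ./matmul.cu A100 H100 prints two command lines, one with target=A100 and one with target=H100.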

optimize-kernelbench

Optimize KernelBench problems with custom GPU kernels.

# Optimize a KernelBench problem
wafer agent -t optimize-kernelbench \
  --args reference=./23_Softmax.py \
  --args pool=kernelbench-pool \
  "Optimize the Softmax kernel"

Features:

Specialized for KernelBench problem format
Uses wafer evaluate kernelbench for correctness verification
Access to profiling tools (NCU, nsys, rocprof)
Multi-turn mode for iterative optimization

Required Arguments:

reference: Path to the reference implementation
pool: Target pool for evaluation

Template Variables

Pass variables to templates using --args KEY=VALUE:

wafer agent -t optimize-kernel \
  --args kernel=./kernel.cu \
  --args target=H100 \
  "Your prompt"

Repeat --args once for each variable:

wafer agent -t trace-analyze \
  --args trace=./profile.ncu-rep \
  --args focus=memory \
  "Analyze memory bottlenecks"
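
When the variables come from a script, the repeated --args flags can be assembled in a loop. The following is a minimal sketch, assuming a hypothetical wrapper named run_template and a hypothetical WAFER environment variable for dry runs. Neither is part of the wafer CLI, and the wrapper uses only the -t and --args flags shown above.

```shell
# Hypothetical wrapper, not part of the wafer CLI.
# Usage: run_template <template> <prompt> [KEY=VALUE ...]
# Set WAFER="echo wafer" to print the command instead of running it.
# The body runs in a subshell so its variables do not leak into the caller.
run_template() (
  template=$1
  prompt=$2
  shift 2
  n=$#
  # Rotate through the KEY=VALUE pairs, re-appending each one
  # behind its own --args flag.
  while [ "$n" -gt 0 ]; do
    kv=$1
    shift
    set -- "$@" --args "$kv"
    n=$((n - 1))
  done
  # Left unquoted on purpose so WAFER="echo wafer" splits into two words.
  ${WAFER:-wafer} agent -t "$template" "$@" "$prompt"
)
```

Calling run_template trace-analyze "Analyze memory bottlenecks" trace=./profile.ncu-rep focus=memory builds the same command as the example above.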

Comparison

Template	Purpose	Mode	Thinking
`ask-docs`	Documentation queries	Multi-turn	Disabled
`trace-analyze`	Performance trace analysis	Single-turn	Disabled
`optimize-kernel`	Kernel optimization	Multi-turn	Enabled
`optimize-kernelbench`	KernelBench challenges	Multi-turn	Optional

Next Steps

Agent Overview

Learn more about the AI agent.

Sessions

Resume and manage sessions.

Kernel Evaluation

Verify kernel correctness.

Profiling Tools

Learn about profiling.
