Wafer Demos

Try Wafer’s features with pre-built demos. These interactive examples let you explore profiling, documentation queries, and kernel evaluation without setting up your own code.

Quick Start

# Set up demo data
wafer demo setup

# List available demo traces
wafer demo traces

# Try a documentation query
wafer demo docs

# Analyze a sample trace
wafer demo trace

# Run kernel evaluation demo
wafer demo eval
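
The non-interactive steps above can be chained in a small wrapper. This is only a sketch: the function name `run_quick_start` is hypothetical, and it assumes that `wafer` returns a non-zero exit status on failure, which this page does not state. It skips `wafer demo docs` (interactive) and `wafer demo eval` (uses a cloud GPU).

```shell
# Hypothetical helper: run the non-interactive Quick Start steps in order.
# Assumes wafer exits non-zero on failure, so && stops at the first failing step.
run_quick_start() {
  if ! command -v wafer >/dev/null 2>&1; then
    echo "wafer CLI not found on PATH" >&2
    return 1
  fi
  wafer demo setup && wafer demo traces && wafer demo trace
}
```

Call `run_quick_start` from the same shell session after defining it.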

Commands

wafer demo setup

Download sample data for demos:
wafer demo setup
This downloads:
  • Sample NCU reports
  • Sample nsys traces
  • Example kernel implementations
  • Test datasets
Output:
Downloading demo data...
  ✓ NCU reports (3 files)
  ✓ Nsys traces (2 files)
  ✓ Kernel examples (5 files)
  ✓ Test data

Demo data ready! Try:
  wafer demo docs    - Query GPU documentation
  wafer demo trace   - Analyze a sample trace
  wafer demo eval    - Run kernel evaluation

wafer demo traces

List available demo traces:
wafer demo traces
Output:
Available Demo Traces:

  NCU Reports:
    matmul-h100.ncu-rep     Matrix multiplication on H100
    attention-a100.ncu-rep  Attention kernel on A100
    conv2d-4090.ncu-rep     Convolution on RTX 4090

  Nsys Traces:
    training-loop.nsys-rep  PyTorch training iteration
    inference.nsys-rep      Model inference pipeline

wafer demo docs

Interactive documentation query demo:
wafer demo docs
This launches an interactive session where you can ask questions about GPU programming:
Welcome to the Wafer Documentation Demo!

Ask questions about CUDA, GPU architecture, or kernel optimization.
Type 'quit' to exit.

You: How do I use shared memory?

Agent: Shared memory in CUDA is declared using __shared__...

wafer demo trace

Analyze a sample performance trace:
wafer demo trace
This analyzes a pre-captured trace and shows:
  • Kernel summary
  • Performance metrics
  • Bottleneck analysis
  • Optimization suggestions
Output:
Analyzing demo trace: matmul-h100.ncu-rep

Kernel: ampere_h16816gemm_256x128_nn
Duration: 0.142ms

Performance Summary:
  Compute Throughput: 78.2%
  Memory Throughput: 45.6%
  Occupancy: 50%

Bottleneck: Memory bandwidth
Suggestion: Consider tiling to improve data reuse...
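
To compare runs, the percentage fields can be pulled out of saved output with standard tools. The sketch below works on text copied from the sample above and assumes the real output uses the same `Label: NN%` layout. The `metric` helper is illustrative and not part of Wafer.

```shell
# Sample text copied from the output above. With real output, this would be
# the saved text of a run, assuming it follows the same layout.
sample_output='Performance Summary:
  Compute Throughput: 78.2%
  Memory Throughput: 45.6%
  Occupancy: 50%'

# Illustrative helper: print the numeric value of a "Label: NN%" line.
metric() {
  printf '%s\n' "$sample_output" | awk -F': ' -v label="$1" '
    { name = $1; sub(/^[ \t]+/, "", name) }          # strip leading indentation
    name == label { sub(/%$/, "", $2); print $2 }    # drop the trailing percent sign
  '
}

compute=$(metric "Compute Throughput")
memory=$(metric "Memory Throughput")
occupancy=$(metric "Occupancy")
echo "compute=${compute} memory=${memory} occupancy=${occupancy}"
```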

wafer demo eval

Run a kernel evaluation demo on a cloud GPU:
wafer demo eval
This runs an example kernel evaluation workflow:
  1. Syncs the demo kernel to a cloud GPU
  2. Runs correctness tests
  3. Benchmarks performance
  4. Shows results
Output:
Running kernel evaluation demo...

Using: wafer-demo-workspace (H100)

Testing: softmax kernel
  ✓ Correctness: PASSED (all 5 test cases)

Benchmarking:
  Reference: 1.23ms
  Custom: 0.45ms
  Speedup: 2.73x

Demo complete! View results at: ./demo-results/
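
The speedup figure in the sample output is the reference time divided by the custom kernel time. A quick check of that arithmetic, using the two times from the sample (the variable names are illustrative):

```shell
# Times copied from the sample output above, in milliseconds.
reference_ms=1.23
custom_ms=0.45

# Speedup = reference time / custom kernel time, rounded to two decimals.
speedup=$(awk -v ref="$reference_ms" -v cur="$custom_ms" 'BEGIN { printf "%.2f", ref / cur }')
echo "Speedup: ${speedup}x"
```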

Demo Data Location

Demo data is stored in:
  Platform   Location
  macOS      ~/.wafer/demo/
  Linux      ~/.wafer/demo/
  Windows    %USERPROFILE%\.wafer\demo\
Clear demo data (macOS and Linux):
rm -rf ~/.wafer/demo/
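
A slightly more defensive variant checks that the directory exists before removing it. This is a minimal sketch for macOS and Linux shells. The function name `clear_demo_data` and its optional path argument are illustrative additions, not part of the Wafer CLI.

```shell
# Illustrative helper: remove the demo data directory if it exists.
# The optional first argument overrides the default location listed above.
clear_demo_data() {
  demo_dir="${1:-$HOME/.wafer/demo}"
  if [ -d "$demo_dir" ]; then
    rm -rf -- "$demo_dir"
    echo "Removed $demo_dir"
  else
    echo "No demo data found at $demo_dir"
  fi
}
```

Run `clear_demo_data` with no arguments to clear the default location.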

What You’ll Learn

Each demo teaches specific concepts:
  Demo    Concepts
  docs    Using AI for GPU programming questions
  trace   Reading NCU reports, identifying bottlenecks
  eval    Testing kernel correctness, benchmarking

After the Demo

Ready to work with your own code: