
NVIDIA Profiling

Wafer integrates several NVIDIA profiling tools for analyzing and optimizing GPU performance. Together they cover everything from per-kernel hardware metrics to system-wide timelines of CPU and GPU activity.

Available Tools

Choosing a Tool

| Tool | Best for | Granularity |
| --- | --- | --- |
| NCU (Nsight Compute) | Kernel optimization, memory analysis, occupancy | Per-kernel |
| Nsys (Nsight Systems) | Timeline analysis, API calls, CPU-GPU synchronization | System-wide |
| TraceLens | Multi-run comparison, collective operations | Aggregate |
| Perfetto | Visual timeline, custom queries | System-wide |

Quick Commands

Analyze an NCU report:
wafer nvidia ncu analyze ./profile.ncu-rep
Profile with Nsight Systems:
wafer nvidia nsys profile "python train.py"
Generate a TraceLens report:
wafer nvidia tracelens report ./trace.nsys-rep
Compare two traces:
wafer nvidia tracelens compare ./baseline.nsys-rep ./optimized.nsys-rep

Requirements

| Tool | Requirement |
| --- | --- |
| NCU | NVIDIA Nsight Compute installed (ncu on PATH) |
| Nsys | NVIDIA Nsight Systems installed (nsys on PATH) |
| TraceLens | Python package (installed with wafer-cli) |
| Perfetto | No local installation required |
You do not need Nsight Compute installed locally to analyze NCU reports: upload your .ncu-rep file and Wafer analyzes it server-side. Collecting the report in the first place still requires ncu on the GPU machine.
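Before profiling locally, a quick preflight check can confirm that both NVIDIA CLIs are on PATH. The sketch below is a hypothetical helper (check_nvidia_tools is not a wafer-cli command); it relies only on the shell builtin command -v and returns a non-zero status if either tool is missing.

```shell
#!/usr/bin/env bash
# Sketch: check whether the Nsight Compute (ncu) and Nsight Systems (nsys)
# CLIs are on PATH. check_nvidia_tools is a hypothetical helper, not part
# of wafer-cli.

check_nvidia_tools() {
  local missing=0 tool
  for tool in ncu nsys; do
    if command -v "$tool" >/dev/null 2>&1; then
      # command -v prints the resolved path of the executable
      echo "$tool: $(command -v "$tool")"
    else
      echo "$tool: not found on PATH"
      missing=1
    fi
  done
  # Non-zero exit status if at least one tool is missing
  return "$missing"
}
```

A non-zero status does not block NCU analysis, since uploaded .ncu-rep files can still be analyzed server-side; it only means local collection with the missing tool is unavailable.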

Typical Workflow

1. System-Level Profile

Start with Nsight Systems to see how time is spent across CPU code, CUDA API calls, and GPU kernels:
wafer nvidia nsys profile "python train.py"

2. Identify Hot Kernels

Use TraceLens on the Nsight Systems report to find the kernels that dominate GPU time:
wafer nvidia tracelens report ./profile.nsys-rep

3. Kernel Deep-Dive

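Profile the hot kernels with NCU, then analyze the report. As written, this command collects the full metric set for every kernel launch, which can slow the run considerably: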
ncu --set full -o profile python train.py
wafer nvidia ncu analyze ./profile.ncu-rep
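
Because step 2 has already identified the hot kernels, collection can usually be narrowed to a few launches of those kernels. In the sketch below, --kernel-name (with the regex: prefix), --launch-skip, and --launch-count are standard ncu options; the profile_kernel helper, the DRY_RUN switch, and the skip/count values are illustrative assumptions, not Wafer defaults.

```shell
#!/usr/bin/env bash
# Sketch: profile only selected launches of one kernel with Nsight Compute.
# profile_kernel, DRY_RUN, and the skip/count values are hypothetical choices.
set -euo pipefail

profile_kernel() {
  local kernel_regex="$1"
  shift  # the remaining arguments are the application command
  local -a cmd
  cmd=(ncu --set full
    --kernel-name "regex:${kernel_regex}"  # only kernels whose names match
    --launch-skip 10                       # skip the first 10 matching launches (warm-up)
    --launch-count 5                       # then profile the next 5 matching launches
    -o profile                             # ncu writes profile.ncu-rep
    "$@")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"  # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

For example, `profile_kernel 'gemm' python train.py` (gemm is a placeholder pattern) writes profile.ncu-rep, which `wafer nvidia ncu analyze ./profile.ncu-rep` can then analyze. Skipping warm-up launches avoids measuring one-time effects such as cold caches.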

4. Iterate

Apply an optimization, profile again, and compare the new trace with the baseline:
wafer nvidia tracelens compare ./baseline.nsys-rep ./optimized.nsys-rep
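
The compare step needs two reports with predictable names. If you prefer to capture them with the nsys CLI directly, a sketch like the following should work, assuming a recent Nsight Systems release that writes the .nsys-rep extension. The capture_trace helper and the DRY_RUN switch are hypothetical; --trace, --force-overwrite, and -o are standard nsys profile options.

```shell
#!/usr/bin/env bash
# Sketch: capture Nsight Systems reports under predictable names.
# capture_trace and DRY_RUN are hypothetical helpers, not wafer-cli features.
set -euo pipefail

capture_trace() {
  local name="$1"
  shift  # the remaining arguments are the application command
  local -a cmd
  cmd=(nsys profile
    --trace cuda,nvtx,osrt   # CUDA API and kernels, NVTX ranges, OS runtime calls
    --force-overwrite true   # replace an existing report with the same name
    -o "$name"               # nsys appends the .nsys-rep extension
    "$@")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"  # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

For example, `capture_trace baseline python train.py` followed by `capture_trace optimized python train_optimized.py` (a hypothetical modified script) produces baseline.nsys-rep and optimized.nsys-rep for the compare command. Using identical trace options for both runs keeps the comparison fair.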

Next Steps