NVIDIA Profiling

Wafer integrates several NVIDIA profiling tools to help you analyze and optimize GPU performance. Together they cover everything from per-kernel metrics to system-wide traces of your application’s behavior.

Available Tools

NCU Profiler

NVIDIA Nsight Compute for detailed kernel metrics, roofline analysis, and source-level profiling.

Nsight Systems

System-wide profiling for CPU-GPU interaction, kernel launches, and memory transfers.

TraceLens

Performance reports, trace comparison, and collective communication analysis.

Perfetto

Chrome-style trace visualization with SQL query support.

Choosing a Tool

Tool	Best For	Granularity
NCU	Kernel optimization, memory analysis, occupancy	Per-kernel
Nsys	Timeline analysis, API calls, CPU-GPU sync	System-wide
TraceLens	Multi-run comparison, collective ops	Aggregate
Perfetto	Visual timeline, custom queries	System-wide

Quick Commands

Analyze an NCU report:

wafer nvidia ncu analyze ./profile.ncu-rep

Profile with Nsight Systems:

wafer nvidia nsys profile "python train.py"

Generate TraceLens report:

wafer nvidia tracelens report ./trace.nsys-rep

Compare traces:

wafer nvidia tracelens compare ./baseline.nsys-rep ./optimized.nsys-rep

Requirements

Tool	Requirement
NCU	NVIDIA Nsight Compute installed (`ncu` on PATH)
Nsys	NVIDIA Nsight Systems installed (`nsys` on PATH)
TraceLens	Python package (installed with wafer-cli)
Perfetto	No local installation required

NCU analysis does not require a local Nsight Compute installation: upload your .ncu-rep file and Wafer analyzes it server-side.
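The PATH requirements above can be checked before you start profiling. The following is a minimal sketch and not part of Wafer: `check_tools` is a hypothetical helper name, and it only reports whether each executable can be found.

```shell
# Report which of the given executables are available on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# ncu and nsys are needed only for local profiling.
check_tools ncu nsys || echo "Some local profiling tools are not installed."
```

A missing `ncu` does not block analysis of an existing .ncu-rep file, since that can be done server-side.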

Typical Workflow

System-Level Profile

Start with Nsight Systems to understand overall behavior:

wafer nvidia nsys profile "python train.py"

Identify Hot Kernels

Use TraceLens to find performance bottlenecks:

wafer nvidia tracelens report ./profile.nsys-rep

Kernel Deep-Dive

Collect detailed metrics for the hot kernels with NCU, then analyze the report:

ncu --set full -o profile python train.py
wafer nvidia ncu analyze ./profile.ncu-rep
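As written, the `ncu` command above collects the full metric set for every kernel launch, which can be slow for a training script. The sketch below shows one way to narrow the collection, assuming the standard Nsight Compute options `--kernel-name` and `--launch-count`. `ncu_kernel_cmd` and `my_gemm_kernel` are hypothetical names used only for illustration. The function prints the command rather than running it, so you can review it first.

```shell
# Build (but do not run) an ncu command line restricted to one kernel.
# Usage: ncu_kernel_cmd <kernel-name> <launch-count> <command> [args...]
ncu_kernel_cmd() {
  local kernel="$1" launches="$2"
  shift 2
  echo ncu --set full --kernel-name "$kernel" --launch-count "$launches" -o profile "$@"
}

# my_gemm_kernel is a placeholder; use a kernel name taken from your own trace.
ncu_kernel_cmd my_gemm_kernel 5 python train.py
# prints: ncu --set full --kernel-name my_gemm_kernel --launch-count 5 -o profile python train.py
```

The resulting `profile.ncu-rep` can then be passed to `wafer nvidia ncu analyze` as in the step above.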

Iterate

Apply your optimizations, profile again, and compare the new trace against the baseline:

wafer nvidia tracelens compare ./baseline.nsys-rep ./optimized.nsys-rep
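The first three steps above can be chained in a small script. This is a sketch under assumptions: `run` and `profile_workflow` are hypothetical helpers, and the report paths `./profile.nsys-rep` and `./profile.ncu-rep` are copied from the steps above and may differ from what your tools actually write. With `DRY_RUN=1` (the default here) each command is printed instead of executed.

```shell
# Dry-run sketch of the profiling workflow.
# DRY_RUN=1 (default) prints commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: profile_workflow <command> [args...]
# Runs, in order: the system-level profile, the TraceLens report,
# the NCU collection, and the NCU analysis. Stops at the first failure.
# The wafer nsys step receives the application as a single string ("$*"),
# while ncu receives it as separate arguments ("$@").
profile_workflow() {
  run wafer nvidia nsys profile "$*" &&
  run wafer nvidia tracelens report ./profile.nsys-rep &&
  run ncu --set full -o profile "$@" &&
  run wafer nvidia ncu analyze ./profile.ncu-rep
}

profile_workflow python train.py
```

Printing the commands first makes it easy to confirm the report paths before committing to a long profiling run.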

Next Steps

NCU Profiler

Deep kernel-level profiling.

Nsight Systems

System-wide profiling.

TraceLens

Trace comparison and reports.

Perfetto

Visual trace analysis.
