NVIDIA Profiling
Wafer integrates multiple NVIDIA profiling tools to help you analyze and optimize GPU performance. From kernel-level metrics to system-wide traces, these tools provide comprehensive visibility into your application’s behavior.
Available Tools
NCU Profiler
NVIDIA Nsight Compute for detailed kernel metrics, roofline analysis, and source-level profiling.
Nsight Systems
System-wide profiling for CPU-GPU interaction, kernel launches, and memory transfers.
TraceLens
Performance reports, trace comparison, and collective communication analysis.
Perfetto
Chrome-style trace visualization with SQL query support.
Choosing a Tool
| Tool | Best For | Granularity |
|---|---|---|
| NCU | Kernel optimization, memory analysis, occupancy | Per-kernel |
| Nsys | Timeline analysis, API calls, CPU-GPU sync | System-wide |
| TraceLens | Multi-run comparison, collective ops | Aggregate |
| Perfetto | Visual timeline, custom queries | System-wide |
Quick Commands
Analyze an NCU report:
Requirements
| Tool | Requirement |
|---|---|
| NCU | NVIDIA Nsight Compute installed (ncu on PATH) |
| Nsys | NVIDIA Nsight Systems installed (nsys on PATH) |
| TraceLens | Python package (installed with wafer-cli) |
| Perfetto | No local installation required |
For NCU analysis, you can use Wafer’s server-side analysis without local installation. Just upload your .ncu-rep file.
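The local requirements in the table above can be checked before profiling. The following is a minimal sketch, not part of Wafer: it assumes only that the executables are named `ncu` and `nsys`, as the table states, and the helper name `find_profilers` is hypothetical.

```python
import shutil

# Executable names taken from the requirements table. TraceLens and Perfetto
# are omitted because the table lists no PATH binary for them.
REQUIRED_BINARIES = {"NCU": "ncu", "Nsys": "nsys"}


def find_profilers(binaries=REQUIRED_BINARIES):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(exe) for tool, exe in binaries.items()}


if __name__ == "__main__":
    for tool, path in find_profilers().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

If `ncu` is reported as missing, the server-side analysis route described above may still apply, since it needs only an existing .ncu-rep file.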