Creating NCU Profiles

When NVIDIA Nsight Compute is installed on your system, you can run profiling directly from Wafer and view the results immediately.
This feature requires NCU (ncu CLI) to be installed and available on your PATH. Without NCU installed, you can still analyze existing .ncu-rep files.

Requirements

Before you can create profiles, ensure:
  1. NVIDIA Nsight Compute is installed
  2. The ncu command is available in your terminal
  3. You have a compiled CUDA executable to profile

Installing NCU

NCU is typically included with the CUDA Toolkit. After installing CUDA:
# Verify NCU is available
ncu --version
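
If the version check fails, it helps to see whether the shell can find `ncu` at all. The sketch below is one way to check; the `check_ncu` helper is a hypothetical name, and the suggested `/usr/local/cuda/bin` path assumes a default Linux CUDA Toolkit install, so adjust it for your system.

```shell
#!/usr/bin/env bash
# Report whether the ncu CLI can be found on PATH.
check_ncu() {
  if command -v ncu >/dev/null 2>&1; then
    echo "ncu found at: $(command -v ncu)"
    return 0
  fi
  echo "ncu not found on PATH"
  return 1
}

# Assumption: a default Linux CUDA Toolkit install under /usr/local/cuda.
check_ncu || echo "Try: export PATH=/usr/local/cuda/bin:\$PATH"
```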

Running a Profile

When NCU is detected, the NCU Profiler shows additional configuration options:
1. Configure the executable

Set the Run Command to your compiled CUDA executable:
./a.out
Or with arguments:
./my_kernel --size 1024
2. Set output options

Configure where the profile report will be saved:
  • Output File: Name for the report (without extension)
  • Output Directory: Where to save (default: .wafer/ncu-tool)
3. Run the profile

Click Profile to execute NCU with your configuration. The tool will:
  1. Run NCU with your executable
  2. Generate an .ncu-rep file
  3. Automatically load the results for analysis

Configuration Options

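| Option       | Description                       | Default         |
|--------------|-----------------------------------|-----------------|
| Run Command  | The executable to profile         | ./a.out         |
| Program Args | Arguments to pass to your program | (empty)         |
| Output File  | Name for the report file          | profile         |
| Output Dir   | Directory for reports             | .wafer/ncu-tool |
| Extra Args   | Additional NCU flags              | (empty)         |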

Extra NCU Arguments

You can pass additional flags to NCU via the Extra Args field. Common options:
# Profile specific kernels by name
--kernel-name myKernel

# Collect the full section set (more metrics, slower profiling)
--set full

# Limit profiling to first N kernel launches
--launch-count 10
See the NCU documentation for all available options.

Generated Command

Wafer shows you the exact ncu command that will be executed. You can:
  • Copy the command to run it manually in a terminal
  • Modify extra args to customize the profiling
Example generated command:
ncu -o .wafer/ncu-tool/profile.ncu-rep ./a.out --size 1024
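
To make the mapping from fields to command line concrete, here is a minimal sketch that assembles a command of the same shape from the five options. It is not Wafer's actual implementation: the `build_ncu_command` helper is hypothetical, and the placement of Extra Args is an assumption based on the fact that `ncu` expects its own options before the program being profiled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble an ncu command line from the five fields.
# Usage: build_ncu_command RUN_COMMAND PROGRAM_ARGS OUTPUT_FILE OUTPUT_DIR EXTRA_ARGS
build_ncu_command() {
  local run_command="$1"
  local program_args="$2"
  local output_file="$3"
  local output_dir="$4"
  local extra_args="$5"

  local cmd="ncu -o ${output_dir}/${output_file}.ncu-rep"
  # ncu options must come before the program being profiled.
  if [ -n "$extra_args" ]; then
    cmd="$cmd $extra_args"
  fi
  cmd="$cmd $run_command"
  # Everything after the program is passed to the program itself.
  if [ -n "$program_args" ]; then
    cmd="$cmd $program_args"
  fi
  echo "$cmd"
}

build_ncu_command "./a.out" "--size 1024" "profile" ".wafer/ncu-tool" ""
```

Called with the defaults and `--size 1024` as the program argument, this prints the example command shown above.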

Viewing Results

After profiling completes:
  1. The report automatically loads in the analysis view
  2. Browse kernel metrics and diagnostics as described in Analyzing Reports
  3. The .ncu-rep file is saved to your configured output directory
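
Saved reports can also be located from a terminal. The sketch below finds the most recently written report; `latest_report` is a hypothetical helper, and the default directory is the one listed under Configuration Options.

```shell
#!/usr/bin/env bash
# Print the most recently modified .ncu-rep file in a directory.
# Usage: latest_report [DIR]   (DIR defaults to .wafer/ncu-tool)
latest_report() {
  local dir="${1:-.wafer/ncu-tool}"
  # ls -t sorts newest first; errors are silenced when no report exists yet.
  ls -t "$dir"/*.ncu-rep 2>/dev/null | head -n 1
}

report=$(latest_report)
if [ -n "$report" ]; then
  echo "Latest report: $report"
  # With ncu installed, the report can also be printed in the terminal:
  #   ncu --import "$report" --page details
else
  echo "No .ncu-rep files found yet"
fi
```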

Troubleshooting

If NCU is installed but not detected:
  1. Ensure ncu is in your PATH
  2. Restart VS Code after installing NCU
  3. Try running ncu --version in VS Code’s integrated terminal
NCU may require elevated permissions on some systems, for example to access GPU performance counters.
If the profile fails to run, common causes include:
  • The executable path is incorrect
  • The executable crashes before kernel launch
  • Insufficient GPU memory
Try running your executable directly first to verify it works.
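
One way to do that check is sketched below. The `verify_executable` helper is a hypothetical name; it runs the program outside of NCU and reports the exit status, which separates problems in the program itself from problems with the profiler.

```shell
#!/usr/bin/env bash
# Run a program outside of NCU and report how it exited.
# Usage: verify_executable ./a.out [args...]
verify_executable() {
  local exe="$1"
  if [ ! -x "$exe" ]; then
    echo "Not found or not executable: $exe"
    return 127
  fi
  "$@"
  local status=$?
  if [ "$status" -eq 0 ]; then
    echo "OK: $exe exited with status 0"
  else
    echo "FAILED: $exe exited with status $status"
  fi
  return "$status"
}

verify_executable ./a.out --size 1024 || echo "Fix the executable before profiling it"
```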

Best Practices

Profile Release Builds

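Always profile optimized builds. Debug builds (for example, device code compiled with nvcc -G) turn off optimizations, so their timings and metrics do not reflect production performance.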
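
As a rough illustration, a release-style build with `nvcc` might look like the sketch below. The file names `my_kernel.cu` and `my_kernel` are placeholders, and the flag choice is a suggestion rather than a Wafer requirement.

```shell
#!/usr/bin/env bash
# Hypothetical source and output names; replace with your own.
SRC=my_kernel.cu
OUT=my_kernel

# -O3 optimizes host code; -lineinfo keeps source line information for
# device code without disabling optimizations.
# Avoid -G when profiling: it turns off device code optimizations.
RELEASE_FLAGS="-O3 -lineinfo"

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
  nvcc $RELEASE_FLAGS -o "$OUT" "$SRC"
else
  echo "Would run: nvcc $RELEASE_FLAGS -o $OUT $SRC"
fi
```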

Warm Up First

Run your kernel once before profiling to warm up GPU caches and avoid cold-start overhead.
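
If the program already launches its kernel more than once, an alternative to a separate warm-up run is to have NCU skip the first launch. The sketch below shows a possible Extra Args value. It assumes at least two kernel launches (otherwise nothing is profiled), and where Wafer places the extra flags in the generated command is also an assumption.

```shell
#!/usr/bin/env bash
# Value for the Extra Args field: skip the first kernel launch, then profile one.
EXTRA_ARGS="--launch-skip 1 --launch-count 1"

# Assumed placement: extra flags go before the program, as ncu requires.
GENERATED="ncu -o .wafer/ncu-tool/profile.ncu-rep $EXTRA_ARGS ./a.out"
echo "$GENERATED"
```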

Profile Representative Workloads

Use realistic input sizes and data patterns that match your production use case.

Iterate Incrementally

Profile → Optimize → Profile again. Small changes can have unexpected effects.

Next Steps