
Profiling

Create profiling data using wafer-cli on a machine with ROCm installed.

Requirements

  • ROCm 7.0+ with rocprof-compute 3.2+
  • AMD GPU with ROCm support (MI300X, MI250, MI210, MI100)
  • HIP application to profile
Check installation:
wafer rocprof-compute check
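A script that profiles on a shared machine can stop early if a required tool is missing. The sketch below assumes a POSIX shell. `require_tools` is a hypothetical helper and is not part of wafer-cli. It only checks that each command is on PATH and leaves the real validation to `wafer rocprof-compute check`.

```shell
# Hypothetical preflight helper (not part of wafer-cli): report every
# command from the argument list that is not found on PATH.
# Returns 0 only if all commands were found.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (on the ROCm machine):
#   require_tools wafer rocprof-compute && wafer rocprof-compute check
```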

Profile an Application

wafer rocprof-compute profile --name my_kernel -- './my_hip_app'
This creates a workloads/my_kernel/ directory with profiling data.
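In a script, you may want to confirm that the workload directory was created after profiling. This is a minimal sketch. It assumes the default `workloads/` output location (no `--path`) and that the application command is passed as one quoted string, as in the example above. `profile_and_verify` is a hypothetical name.

```shell
# Hypothetical wrapper (not part of wafer-cli): profile an application and
# confirm that the expected workload directory exists afterwards.
# Assumes the default output location, workloads/.
profile_and_verify() {
    name="$1"   # value passed to --name
    app="$2"    # application command, passed as one quoted string
    if ! wafer rocprof-compute profile --name "$name" -- "$app"; then
        echo "profiling failed for $name" >&2
        return 1
    fi
    if [ ! -d "workloads/$name" ]; then
        echo "expected workloads/$name/ but it was not created" >&2
        return 1
    fi
    echo "workloads/$name"
}

# Usage:
#   profile_and_verify my_kernel './my_hip_app'
```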

Analyze Results

wafer rocprof-compute analyze workloads/my_kernel

Launch GUI

wafer rocprof-compute analyze workloads/my_kernel --gui
Or open the workload folder in VS Code using the ROCprofiler Compute tool.

Command Options

Profile Options

Option                Description
--name NAME           Workload directory name (required)
--no-roof             Skip roofline analysis (faster)
--roof-only           Roofline only, no detailed counters (fastest)
--block SQ,TCC,...    Profile specific hardware blocks
--kernel K1,K2,...    Filter by kernel name
--dispatch 0,1,2,...  Filter by dispatch ID
--path DIR            Output directory (default: workloads/)
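The --block, --kernel, and --dispatch options take comma-separated lists. When a script generates those values, a small helper can build the list. This is a sketch: `join_by_comma` is a hypothetical name, and the dispatch IDs in the usage comment are placeholders.

```shell
# Hypothetical helper: join all arguments with commas, for example to build
# the value for --dispatch, --kernel, or --block.
# The ( ... ) body runs in a subshell, so changing IFS does not leak out.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Usage:
#   wafer rocprof-compute profile --name test \
#       --dispatch "$(join_by_comma 0 1 2)" -- './app'
```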

Analyze Options

Option        Description
--gui         Launch interactive GUI
--port PORT   GUI server port (default: 8050)
--json        Output as JSON
--list-stats  List all kernels and dispatches

Examples

# Fast profiling without roofline
wafer rocprof-compute profile --name test --no-roof -- './app'

# Profile specific hardware blocks
wafer rocprof-compute profile --name test --block SQ,TCC -- './app'

# Profile specific kernel dispatches
wafer rocprof-compute profile --name test --dispatch 0,1,2 -- './app'

# JSON output for scripting
wafer rocprof-compute analyze workloads/test --json
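When scripting, you can save the JSON report to a file and check that it parses before passing it to another tool. This sketch assumes that `--json` writes the report to standard output and that `python3` is installed. It makes no assumptions about the JSON schema. Both helper names are hypothetical.

```shell
# Hypothetical helper: succeed only if FILE contains well-formed JSON.
# `python3 -m json.tool FILE` exits non-zero when parsing fails.
is_valid_json() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}

# Hypothetical helper: save the analysis of WORKLOAD_DIR to OUT_FILE.
# Assumes `analyze --json` prints the report on stdout.
save_analysis_json() {
    workload_dir="$1"
    out_file="$2"
    wafer rocprof-compute analyze "$workload_dir" --json >"$out_file" || return 1
    is_valid_json "$out_file" || { echo "not valid JSON: $out_file" >&2; return 1; }
}

# Usage:
#   save_analysis_json workloads/test test_analysis.json
```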

Output Structure

workloads/my_kernel/
└── MI300X_A1/
    ├── sysinfo.csv           # System/GPU info
    ├── roofline.csv          # Roofline data
    ├── pmc_perf.csv          # Performance counters
    ├── pmc_kernel_top.csv    # Kernel summary
    └── SQ_*.csv, TCC_*.csv   # Per-block metrics
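The per-GPU subdirectory name (MI300X_A1 in the tree above) depends on the hardware, so a script should find it rather than hard-code it. This sketch assumes the two-level layout shown, and `find_kernel_summaries` is a hypothetical helper. Kernel summaries are produced by the analyze step, so an empty result may just mean that `analyze` has not been run yet.

```shell
# Hypothetical helper: print the path of every pmc_kernel_top.csv found
# exactly one directory below WORKLOAD_DIR (one per GPU subdirectory).
find_kernel_summaries() {
    find "$1" -mindepth 2 -maxdepth 2 -type f -name pmc_kernel_top.csv | sort
}

# Usage: show the first rows of each kernel summary.
#   find_kernel_summaries workloads/my_kernel | while read -r f; do
#       echo "== $f"; head -n 5 "$f"
#   done
```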

Troubleshooting

If rocprof-compute is not found, ensure ROCm is installed and that rocprof-compute is on your PATH:
export PATH=/opt/rocm/bin:$PATH
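If you add this export to a shell startup file, you can guard it so that repeated shells do not add the same entry more than once. This sketch assumes ROCm is installed in the default /opt/rocm prefix. `path_prepend` is a hypothetical helper.

```shell
# Hypothetical helper: prepend DIR to PATH only if it is not already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already present: leave PATH unchanged
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

path_prepend /opt/rocm/bin
command -v rocprof-compute >/dev/null 2>&1 || echo "rocprof-compute still not on PATH" >&2
```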
If profiling produces no kernel data, check that your application actually launches GPU kernels. Running rocprof-compute directly shows verbose output:
rocprof-compute profile --name test -- './app'
If kernel summaries are missing, note that they are generated by the analyze step. Run:
wafer rocprof-compute analyze workloads/my_kernel
The Wafer analyzer falls back to timing data if summaries aren’t available.
If the GUI does not start, check whether port 8050 (the default) is already in use, and try a different port:
wafer rocprof-compute analyze workloads/my_kernel --gui --port 9000
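Instead of guessing a replacement port, a script can probe for one. This sketch relies on bash's /dev/tcp redirection, so it needs bash, and it checks only the loopback interface. `find_free_port` is a hypothetical helper, and the search range of 100 ports is an arbitrary choice.

```shell
# Hypothetical helper (bash): print the first TCP port from START upward
# that has no listener on 127.0.0.1, searching at most 100 ports past START.
find_free_port() {
    local port="${1:-8050}"
    local last=$((port + 100))
    while [ "$port" -le "$last" ]; do
        # A successful connect means something is already listening there.
        if ! (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
            echo "$port"
            return 0
        fi
        port=$((port + 1))
    done
    echo "no free port in range" >&2
    return 1
}

# Usage:
#   wafer rocprof-compute analyze workloads/my_kernel --gui --port "$(find_free_port 8050)"
```

Another process can still take the port between the probe and the GUI launch, so treat the result as a best guess rather than a reservation.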
