Analyzing Results
Opening a Workload
- Select ROCprofiler Compute from the Wafer tools menu
- Click Browse and select your workload folder
- The folder must contain `sysinfo.csv` (generated by `rocprof-compute`)
- Click Launch GUI
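If the GUI refuses to open a folder, it can help to confirm that the required file is present before browsing to it. The following is a minimal sketch rather than part of the tool: the helper name `is_valid_workload` is hypothetical, and it checks only for `sysinfo.csv`, the one requirement stated above.

```python
from pathlib import Path


def is_valid_workload(folder):
    """Return True if `folder` is a directory that contains sysinfo.csv.

    Hypothetical helper: it checks only the requirement documented above,
    not any other file the GUI may read.
    """
    path = Path(folder)
    return path.is_dir() and (path / "sysinfo.csv").is_file()
```

Calling `is_valid_workload("path/to/workload")` on your own workload folder returns `False` when the folder is missing or when `sysinfo.csv` is not directly inside it.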
GUI Panels
Architecture Diagram
Shows the GPU hardware pipeline with metrics from your run:
- Instruction Buffer: Wave occupancy, wave life cycles
- Instruction Dispatch: SALU, SMEM, VALU, MFMA, VMEM, LDS, GWS utilization
- Compute Units: Active CUs, VGPRs, SGPRs, LDS allocation
- Caches: L1 vector, L1 scalar, L1 instruction, L2 hit rates and bandwidth
- Memory: HBM read/write bandwidth, fabric utilization
Roofline Analysis
Plots your kernels on a roofline chart:
- X-axis: Arithmetic intensity (FLOPs/Byte)
- Y-axis: Performance (GFLOPS)
- Ceilings: L1, L2, HBM bandwidth limits and peak compute
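To make the two axes concrete, the sketch below computes where a single kernel would land on the chart from three quantities: floating-point operations executed, bytes moved, and run time. The function name and the example numbers are hypothetical, and the GUI may derive these values from hardware counters in a different way.

```python
def roofline_point(flops, bytes_moved, duration_s):
    """Return (arithmetic_intensity, gflops) for one kernel.

    arithmetic_intensity is in FLOPs/Byte (x-axis);
    gflops is achieved performance in GFLOPS (y-axis).
    """
    if bytes_moved <= 0 or duration_s <= 0:
        raise ValueError("bytes_moved and duration_s must be positive")
    arithmetic_intensity = flops / bytes_moved
    gflops = flops / duration_s / 1e9
    return arithmetic_intensity, gflops


# Hypothetical kernel: 2e12 FLOPs, 4e11 bytes moved, 0.5 s run time.
x, y = roofline_point(2e12, 4e11, 0.5)
print(f"x = {x} FLOPs/Byte, y = {y} GFLOPS")
```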
Kernel Statistics
Two tables:
| Table | Contents |
|---|---|
| Top Kernels | Kernel name, dispatch count, total/mean/median duration |
| Dispatch List | Individual dispatches with GPU ID and timing |
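The relationship between the two tables can be sketched in a few lines: grouping the dispatch list by kernel name yields the per-kernel count and durations. This is an illustration under assumptions, not the GUI's implementation. The input shape (pairs of kernel name and duration in nanoseconds), the output field names, and the sort by total duration are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def top_kernels(dispatches):
    """Aggregate (kernel_name, duration_ns) pairs into per-kernel rows.

    Rows are sorted by total duration, largest first (an assumed ordering).
    """
    durations = defaultdict(list)
    for name, duration_ns in dispatches:
        durations[name].append(duration_ns)

    rows = [
        {
            "kernel": name,
            "count": len(values),
            "total_ns": sum(values),
            "mean_ns": mean(values),
            "median_ns": median(values),
        }
        for name, values in durations.items()
    ]
    rows.sort(key=lambda row: row["total_ns"], reverse=True)
    return rows


# Hypothetical dispatch list.
dispatches = [("gemm", 100), ("copy", 10), ("gemm", 300), ("copy", 30), ("copy", 20)]
for row in top_kernels(dispatches):
    print(row)
```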
System Info
Hardware configuration table:
- GPU architecture (gfx942, etc.)
- Compute units, SIMDs, shader engines
- Memory size, bandwidth, clock speeds
- ROCm and driver versions
Filtering Options
| Filter | Description |
|---|---|
| Normalization | Per-wave, per-cycle, per-second, or per-kernel |
| GCD | Filter by Graphics Compute Die (for MI300X) |
| Dispatch Filter | Show specific dispatch IDs |
| Kernels | Filter by kernel name |
| Top N | Limit to N slowest kernels |
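If you reproduce these views in a script, the Kernels and Top N filters amount to a name match followed by a sort and a cut. The sketch below assumes each row is a dictionary with hypothetical `kernel` and `total_ns` fields, and it treats "slowest" as largest total duration. The GUI's exact matching and ranking rules may differ.

```python
def filter_kernels(rows, name_contains=None, top_n=None):
    """Keep rows whose kernel name contains a substring, then keep the N slowest.

    Passing None for either argument disables that filter.
    """
    selected = [
        row for row in rows
        if name_contains is None or name_contains in row["kernel"]
    ]
    # Slowest first; sorted() leaves the caller's list untouched.
    selected = sorted(selected, key=lambda row: row["total_ns"], reverse=True)
    return selected if top_n is None else selected[:top_n]


# Hypothetical per-kernel rows.
rows = [
    {"kernel": "gemm_a", "total_ns": 400},
    {"kernel": "copy", "total_ns": 60},
    {"kernel": "gemm_b", "total_ns": 900},
]
print(filter_kernels(rows, name_contains="gemm", top_n=1))
```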
Tips
How do I identify memory-bound kernels?
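Check the roofline plot. Kernels that fall on or near a memory ceiling (L1, L2, HBM) are memory-bound. Arithmetic intensity is a useful second check: as a rough rule of thumb, values below 10 FLOPs/Byte typically indicate memory-bound behavior, though the exact crossover depends on the GPU's peak compute and on the bandwidth of the ceiling in question.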
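In a standard roofline model, the crossover between memory-bound and compute-bound is the ridge point: the arithmetic intensity at which a bandwidth ceiling meets the peak-compute ceiling. The sketch below shows that calculation. The function names and hardware numbers are hypothetical placeholders, so substitute the peak and bandwidth values reported for your own GPU.

```python
def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOPs/Byte) where a bandwidth ceiling meets peak compute."""
    if bandwidth_gbs <= 0:
        raise ValueError("bandwidth_gbs must be positive")
    return peak_gflops / bandwidth_gbs


def is_memory_bound(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """True if the kernel sits left of the ridge point for the given ceiling."""
    return arithmetic_intensity < ridge_point(peak_gflops, bandwidth_gbs)


# Placeholder hardware numbers, not taken from any specific GPU.
PEAK_GFLOPS = 10_000.0
BANDWIDTH_GBS = 1_000.0

print(ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS))
print(is_memory_bound(5.0, PEAK_GFLOPS, BANDWIDTH_GBS))
print(is_memory_bound(20.0, PEAK_GFLOPS, BANDWIDTH_GBS))
```

With these placeholder values the ridge point is 10 FLOPs/Byte, so a kernel at 5 FLOPs/Byte is classified as memory-bound and one at 20 FLOPs/Byte is not.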
What does low wave occupancy mean?
Low wave occupancy in the architecture diagram means the GPU isn’t fully utilized. Common causes: high register usage (VGPRs), large LDS allocation, or insufficient parallelism in your kernel launch configuration.
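To see why register usage caps occupancy, consider a simplified model in which each SIMD has a fixed register budget shared by all resident waves. The sketch below is an illustration only. The function name is hypothetical, the numeric values are placeholders rather than figures for any particular GPU, and real hardware also applies allocation granularity, SGPR limits, and LDS limits that this model ignores.

```python
def vgpr_limited_waves(vgprs_per_wave, vgprs_per_simd, max_waves_per_simd):
    """Estimate how many waves fit on one SIMD when only VGPRs are limiting.

    Simplified model: ignores allocation granularity, SGPRs, and LDS.
    """
    if vgprs_per_wave <= 0:
        raise ValueError("vgprs_per_wave must be positive")
    return min(max_waves_per_simd, vgprs_per_simd // vgprs_per_wave)


# Placeholder budget: 512 registers per SIMD, hardware cap of 8 waves.
print(vgpr_limited_waves(128, 512, 8))  # heavy register use
print(vgpr_limited_waves(32, 512, 8))   # light register use, capped by max waves
```

Under these placeholder numbers, a kernel using 128 VGPRs fits 4 waves per SIMD, while one using 32 VGPRs reaches the cap of 8.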
Can I compare multiple workloads?
Currently, the GUI shows one workload at a time. To compare, open multiple VS Code windows with different workloads, or export metrics to JSON for scripting.
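Once metrics are exported, a short script can do the comparison. The layout of the exported JSON is not described here, so the sketch below assumes a hypothetical flat mapping from metric name to numeric value. Adapt the loading step to whatever structure your export actually has.

```python
import json


def compare_metrics(baseline, candidate):
    """Return {metric: relative change} for metrics present in both runs.

    A value of -0.25 means the candidate is 25% lower than the baseline.
    Metrics with a zero baseline are skipped to avoid dividing by zero.
    """
    changes = {}
    for name in baseline.keys() & candidate.keys():
        if baseline[name] != 0:
            changes[name] = (candidate[name] - baseline[name]) / baseline[name]
    return changes


# Hypothetical exports; in practice, read these from your exported files.
baseline = json.loads('{"gemm_mean_ns": 200.0, "copy_mean_ns": 20.0}')
candidate = json.loads('{"gemm_mean_ns": 150.0, "copy_mean_ns": 25.0}')

for name, change in sorted(compare_metrics(baseline, candidate).items()):
    print(f"{name}: {change:+.0%}")
```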
Profiling
Learn how to create profiling data →