Analyzing Results

Opening a Workload

  1. Select ROCprofiler Compute from the Wafer tools menu
  2. Click Browse and select your workload folder. The folder must contain sysinfo.csv, which rocprof-compute generates during profiling.
  3. Click Launch GUI
The GUI opens in a new panel with your profiling data loaded.
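
If the GUI does not accept a folder, it can help to check the sysinfo.csv requirement outside the tool first. The following is a minimal sketch, not part of the product: the helper name is_workload_folder is hypothetical, and it only tests for the file's presence, not its contents.

```python
from pathlib import Path


def is_workload_folder(folder):
    """Return True if `folder` is a directory that directly contains sysinfo.csv.

    Hypothetical helper for illustration; it checks presence only and does
    not validate what rocprof-compute wrote into the file.
    """
    path = Path(folder)
    return path.is_dir() and (path / "sysinfo.csv").is_file()
```

Calling is_workload_folder on the path you intend to browse to returns False for a missing folder or a folder without sysinfo.csv.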

GUI Panels

Architecture Diagram

Shows the GPU hardware pipeline with metrics from your run:
  • Instruction Buffer — Wave occupancy, wave life cycles
  • Instruction Dispatch — SALU, SMEM, VALU, MFMA, VMEM, LDS, GWS utilization
  • Compute Units — Active CUs, VGPRs, SGPRs, LDS allocation
  • Caches — L1 vector, L1 scalar, L1 instruction, L2 hit rates and bandwidth
  • Memory — HBM read/write bandwidth, fabric utilization
Yellow/orange highlighting indicates high utilization or potential bottlenecks.

Roofline Analysis

Plots your kernels on a roofline chart:
  • X-axis — Arithmetic intensity (FLOPs/Byte)
  • Y-axis — Performance (GFLOPS)
  • Ceilings — L1, L2, HBM bandwidth limits and peak compute
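A kernel that sits close to a ceiling is limited by that resource. Memory-bound kernels have low arithmetic intensity and appear on the left, under the sloped bandwidth ceilings; compute-bound kernels appear on the right, under the flat peak-compute ceiling.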
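
The relationship behind the chart can be sketched in a few lines. This is the standard roofline model, not code taken from the tool: attainable performance is the smaller of peak compute and arithmetic intensity times bandwidth, and the two ceilings meet at the ridge point. The function names and the numbers in the usage note are illustrative assumptions, not specifications of any GPU.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(peak compute, intensity * memory bandwidth).

    intensity is in FLOPs/Byte, bandwidth in GB/s, result in GFLOPS.
    """
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which the bandwidth and compute ceilings meet."""
    return peak_gflops / bandwidth_gbs


def classify(intensity, peak_gflops, bandwidth_gbs):
    """Label a kernel by which side of the ridge point it falls on."""
    if intensity < ridge_point(peak_gflops, bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"
```

With placeholder values of 1000 GFLOPS peak and 100 GB/s bandwidth, the ridge point is 10 FLOPs/Byte, so a kernel at 2 FLOPs/Byte is capped at 200 GFLOPS and classified as memory-bound. Repeating the calculation with the L1, L2, and HBM bandwidths gives one ceiling per memory level.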

Kernel Statistics

Two tables:
| Table | Contents |
| --- | --- |
| Top Kernels | Kernel name, dispatch count, total/mean/median duration |
| Dispatch List | Individual dispatches with GPU ID and timing |
Use the Kernels dropdown to filter specific kernels.
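
To make the link between the two tables concrete, the sketch below aggregates a dispatch list into per-kernel rows with the same columns as Top Kernels. It is an illustration under assumptions: the input is a plain list of (kernel name, duration) pairs, the function name top_kernels is hypothetical, and sorting by total duration is a choice made here, not necessarily the ordering the GUI uses.

```python
from collections import defaultdict
from statistics import mean, median


def top_kernels(dispatches):
    """Aggregate (kernel_name, duration) pairs into per-kernel statistics.

    Returns a list of dicts sorted by total duration, largest first.
    Durations can be in any unit as long as it is consistent.
    """
    by_kernel = defaultdict(list)
    for name, duration in dispatches:
        by_kernel[name].append(duration)

    rows = [
        {
            "kernel": name,
            "count": len(durations),
            "total": sum(durations),
            "mean": mean(durations),
            "median": median(durations),
        }
        for name, durations in by_kernel.items()
    ]
    rows.sort(key=lambda row: row["total"], reverse=True)
    return rows
```

Comparing mean and median in the result is a quick way to spot a kernel whose total is dominated by a few slow dispatches.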

System Info

Hardware configuration table:
  • GPU architecture (gfx942, etc.)
  • Compute units, SIMDs, shader engines
  • Memory size, bandwidth, clock speeds
  • ROCm and driver versions

Filtering Options

| Filter | Description |
| --- | --- |
| Normalization | Per-wave, per-cycle, per-second, or per-kernel |
| GCD | Filter by Graphics Compute Die (for MI300X) |
| Dispatch Filter | Show specific dispatch IDs |
| Kernels | Filter by kernel name |
| Top N | Limit to N slowest kernels |

Tips

Check the roofline plot. Kernels that fall on or near a memory ceiling (L1, L2, HBM) are memory-bound. Look at the arithmetic intensity: values below 10 FLOPs/Byte typically indicate memory-bound behavior, although the exact crossover depends on the GPU's peak compute and memory bandwidth.
Low wave occupancy in the architecture diagram means the GPU isn’t fully utilized. Common causes: high register usage (VGPRs), large LDS allocation, or insufficient parallelism in your kernel launch configuration.
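
As a rough illustration of why register usage limits occupancy, the sketch below estimates how many waves fit on one SIMD from a per-wave VGPR count. It is a simplified model under stated assumptions: the register budget and wave limit are parameters you supply, the numbers in the usage note are placeholders and not specifications of any GPU, and real hardware allocates registers in granules, which this ignores.

```python
def waves_per_simd(vgprs_per_wave, vgpr_budget, max_waves):
    """Estimate resident waves per SIMD when VGPRs are the limiting resource.

    vgprs_per_wave: registers each wave needs (taken from your kernel).
    vgpr_budget:    registers available per SIMD lane (hardware-specific,
                    supplied by the caller; an assumption of this sketch).
    max_waves:      hardware cap on resident waves per SIMD (also supplied).
    """
    if vgprs_per_wave <= 0:
        return max_waves
    return min(max_waves, vgpr_budget // vgprs_per_wave)
```

With a placeholder budget of 512 registers and a cap of 8 waves, a kernel using 64 VGPRs reaches the cap, while one using 128 VGPRs fits only 4 waves. The same reasoning applies to LDS: dividing the LDS available per compute unit by the allocation per workgroup bounds how many workgroups can be resident at once.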
Currently, the GUI shows one workload at a time. To compare, open multiple VS Code windows with different workloads, or export metrics to JSON for scripting.
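
For the scripting route, a comparison can be as small as the sketch below. The layout of the exported JSON is not described here, so this assumes, purely for illustration, that each export has been reduced to a flat object mapping metric names to numbers; the function names are hypothetical and you would need to adapt the loading step to the real file layout.

```python
import json


def load_metrics(path):
    """Load one exported metrics file (assumed: a flat JSON object of numbers)."""
    with open(path) as handle:
        return json.load(handle)


def compare(baseline, candidate):
    """Compare the metrics present in both runs.

    Returns {metric: (baseline_value, candidate_value, relative_change)}.
    relative_change is None when the baseline value is zero.
    """
    result = {}
    for name in sorted(baseline.keys() & candidate.keys()):
        old, new = baseline[name], candidate[name]
        change = (new - old) / old if old else None
        result[name] = (old, new, change)
    return result
```

Metrics that appear in only one of the two runs are skipped, which keeps the output focused on values that can be compared directly.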