Analyzing Results

Opening a Workload

1. Select ROCprofiler Compute from the Wafer tools menu.
2. Click Browse and select your workload folder. The folder must contain sysinfo.csv (generated by rocprof-compute).
3. Click Launch GUI.

The GUI opens in a new panel with your profiling data loaded.
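
If the GUI does not load a folder, it can help to confirm the sysinfo.csv requirement from a script first. The following is a minimal sketch; the function name `looks_like_workload` is hypothetical and is not part of Wafer or rocprof-compute.

```python
from pathlib import Path


def looks_like_workload(folder: str) -> bool:
    """Return True if `folder` is a directory that contains sysinfo.csv.

    Hypothetical helper: it only checks the one requirement stated above,
    not the rest of the workload folder's contents.
    """
    path = Path(folder)
    return path.is_dir() and (path / "sysinfo.csv").is_file()
```

Calling `looks_like_workload("path/to/workload")` before opening the folder gives a quick yes/no answer; the path shown is a placeholder.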

GUI Panels

Architecture Diagram

Shows the GPU hardware pipeline with metrics from your run:

Instruction Buffer — Wave occupancy, wave life cycles
Instruction Dispatch — SALU, SMEM, VALU, MFMA, VMEM, LDS, GWS utilization
Compute Units — Active CUs, VGPRs, SGPRs, LDS allocation
Caches — L1 vector, L1 scalar, L1 instruction, L2 hit rates and bandwidth
Memory — HBM read/write bandwidth, fabric utilization

Yellow/orange highlighting indicates high utilization or potential bottlenecks.

Roofline Analysis

Plots your kernels on a roofline chart:

X-axis — Arithmetic intensity (FLOPs/Byte)
Y-axis — Performance (GFLOPS)
Ceilings — L1, L2, HBM bandwidth limits and peak compute

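A kernel's position relative to the ceilings shows what limits it. Kernels close to a sloped bandwidth ceiling (L1, L2, or HBM) are bounded by that memory level, and kernels close to the flat peak-compute ceiling are compute-bound. Memory-bound kernels appear toward the left of the chart (low arithmetic intensity); compute-bound kernels appear toward the right.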
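
The ceilings on the chart follow the standard roofline bound: attainable performance is the smaller of peak compute and arithmetic intensity times bandwidth. The sketch below illustrates that formula only. The peak and bandwidth numbers are made-up placeholders, not measurements for any GPU, and the function names are hypothetical.

```python
# Illustrative ceilings only (assumption); real values come from your own run.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = {"L1": 2000.0, "L2": 500.0, "HBM": 100.0}


def attainable_gflops(intensity: float, bandwidth_gbs: float,
                      peak_gflops: float = PEAK_GFLOPS) -> float:
    """Roofline bound: min(peak compute, intensity * bandwidth)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def ceilings_at(intensity: float) -> dict:
    """Attainable GFLOPS under each memory ceiling at one arithmetic intensity."""
    return {
        level: attainable_gflops(intensity, bandwidth)
        for level, bandwidth in BANDWIDTH_GBS.items()
    }
```

With these placeholder numbers, a kernel at 1 FLOP/Byte is capped by the HBM ceiling well below peak compute, which is what a point near the sloped part of the chart represents.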

Kernel Statistics

Two tables:

Table	Contents
Top Kernels	Kernel name, dispatch count, total/mean/median duration
Dispatch List	Individual dispatches with GPU ID and timing

Use the Kernels dropdown to filter specific kernels.
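
The relationship between the two tables can be sketched as an aggregation: each Top Kernels row summarizes the Dispatch List rows that share a kernel name. The example below uses hypothetical dispatch records and field names, and it assumes the Top Kernels table is ordered by total duration; it is not the GUI's own code.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical dispatch records: (kernel name, GPU ID, duration in ns).
dispatches = [
    ("gemm_kernel", 0, 120),
    ("gemm_kernel", 0, 100),
    ("reduce_kernel", 0, 40),
    ("gemm_kernel", 1, 110),
    ("reduce_kernel", 1, 60),
]


def top_kernels(records, name_filter=None):
    """Group dispatches by kernel name and summarize their durations.

    `name_filter` keeps only kernels whose name contains the given substring,
    similar in spirit to picking kernels in the dropdown.
    """
    durations = defaultdict(list)
    for name, _gpu_id, duration in records:
        if name_filter is None or name_filter in name:
            durations[name].append(duration)
    rows = [
        {
            "kernel": name,
            "count": len(values),
            "total": sum(values),
            "mean": mean(values),
            "median": median(values),
        }
        for name, values in durations.items()
    ]
    # Assumption: largest total duration first.
    return sorted(rows, key=lambda row: row["total"], reverse=True)
```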

System Info

Hardware configuration table:

GPU architecture (gfx942, etc.)
Compute units, SIMDs, shader engines
Memory size, bandwidth, clock speeds
ROCm and driver versions

Filtering Options

Filter	Description
Normalization	Per-wave, per-cycle, per-second, or per-kernel
GCD	Filter by Graphics Compute Die (for MI300X)
Dispatch Filter	Show specific dispatch IDs
Kernels	Filter by kernel name
Top N	Limit to N slowest kernels
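
One way to read the Normalization options is as a choice of denominator for a raw counter value. The sketch below assumes exactly that; the unit names, the function, and its arguments are hypothetical, and the GUI may compute normalized values differently.

```python
def normalize(raw_value, unit, *, waves, cycles, seconds, kernels):
    """Divide a raw counter value by the denominator selected by `unit`.

    Assumption: each normalization mode is a plain division by the number of
    waves, cycles, seconds, or kernel dispatches in the profiled run.
    """
    denominators = {
        "per_wave": waves,
        "per_cycle": cycles,
        "per_second": seconds,
        "per_kernel": kernels,
    }
    if unit not in denominators:
        raise ValueError(f"unknown normalization unit: {unit!r}")
    return raw_value / denominators[unit]
```

For example, the same counter of 1000 events reads as 20 per wave over 50 waves, or 250 per kernel over 4 dispatches, so compare values only under the same normalization.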

Tips

How do I identify memory-bound kernels?

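Check the roofline plot. Kernels that fall on or near a memory ceiling (L1, L2, HBM) are memory-bound. Also look at the arithmetic intensity: as a rough rule of thumb, values below about 10 FLOPs/Byte suggest memory-bound behavior, but the actual crossover is the ridge point (peak compute divided by peak bandwidth), which depends on the GPU and data type.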
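
To see why a fixed intensity threshold is only approximate, the crossover between memory-bound and compute-bound can be computed from the ceilings themselves. This is a minimal sketch with hypothetical function names; the peak and bandwidth values in the usage note are placeholders, not figures for a real GPU.

```python
def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOPs/Byte) where the bandwidth ceiling meets peak compute."""
    return peak_gflops / bandwidth_gbs


def classify(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> str:
    """Label a kernel by which side of the ridge point its intensity falls on."""
    if intensity < ridge_point(peak_gflops, bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"
```

With a placeholder peak of 1000 GFLOPS and 100 GB/s of bandwidth, the ridge point is 10 FLOPs/Byte, so a kernel at 25 FLOPs/Byte is compute-bound. Raise the peak to 4000 GFLOPS and the ridge point moves to 40, which makes the same kernel memory-bound.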

What does low wave occupancy mean?

Low wave occupancy in the architecture diagram means few waves are resident on each compute unit, so the GPU has less work available to hide memory and instruction latency. Common causes: high register usage (VGPRs), large LDS allocation, or insufficient parallelism in your kernel launch configuration.
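
The effect of register and LDS usage on occupancy can be estimated with simple division. The sketch below is illustrative only: the default register file size, wave limit, allocation granule, and LDS size are assumptions passed as parameters, so check them against the System Info panel and your GPU's documentation before relying on the numbers.

```python
def waves_per_simd(vgprs_per_wave: int, *, vgpr_file_size: int = 512,
                   max_waves: int = 8, granule: int = 8) -> int:
    """Estimate how many waves fit on one SIMD given per-wave VGPR usage.

    All defaults are illustrative assumptions, not values read from hardware.
    """
    if vgprs_per_wave <= 0:
        raise ValueError("vgprs_per_wave must be positive")
    # Round the request up to the allocation granule.
    allocated = -(-vgprs_per_wave // granule) * granule
    return min(max_waves, vgpr_file_size // allocated)


def workgroups_per_cu_by_lds(lds_bytes_per_workgroup: int, *,
                             lds_bytes_per_cu: int = 65536):
    """Estimate how many workgroups fit on one CU given LDS usage.

    Returns None when the kernel uses no LDS, meaning LDS is not a limit.
    """
    if lds_bytes_per_workgroup <= 0:
        return None
    return lds_bytes_per_cu // lds_bytes_per_workgroup
```

Under these assumed limits, going from 64 to 128 VGPRs per wave halves the estimate from 8 waves to 4, which is the kind of drop that shows up as low occupancy in the diagram.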

Can I compare multiple workloads?

Currently, the GUI shows one workload at a time. To compare, open multiple VS Code windows with different workloads, or export metrics to JSON for scripting.
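
A scripted comparison could look like the sketch below. It assumes a hypothetical export format, a JSON object that maps each kernel name to its mean duration; the real export layout may differ, so adapt `load_metrics` to whatever your files contain.

```python
import json


def load_metrics(path):
    """Load a JSON file assumed to map kernel name -> mean duration (hypothetical format)."""
    with open(path) as handle:
        return json.load(handle)


def compare(baseline, candidate):
    """Report speedup (baseline / candidate) for kernels present in both workloads."""
    rows = {}
    for kernel in sorted(baseline.keys() & candidate.keys()):
        base, cand = baseline[kernel], candidate[kernel]
        rows[kernel] = {"baseline": base, "candidate": cand, "speedup": base / cand}
    return rows
```

Kernels that appear in only one workload are skipped, which keeps the comparison to like-for-like entries; a speedup above 1.0 means the candidate run was faster for that kernel.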
