ROCprofiler SDK

ROCprofiler SDK provides low-level profiling capabilities for AMD GPUs. Use it to collect hardware counters, trace GPU activity, and build custom profiling solutions.

Quick Start

# Check installation
wafer amd rocprof-sdk check

# List available counters
wafer amd rocprof-sdk list-counters

# Profile an application
wafer amd rocprof-sdk profile "python train.py"

# Analyze results
wafer amd rocprof-sdk analyze ./rocprof-output
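
The quick-start steps can also be chained so that a failed step stops the run. The following is a minimal sketch, not part of the tool: `profile_and_analyze` is a hypothetical helper name, and it uses only the subcommands and the `--output` option described on this page.

```shell
# Hypothetical helper: run check, profile, and analyze in order,
# stopping at the first step that fails.
profile_and_analyze() {
  local out_dir="$1"   # directory that receives the profiling results
  local app_cmd="$2"   # application command, passed as one quoted string

  wafer amd rocprof-sdk check || {
    echo "ROCprofiler SDK check failed" >&2
    return 1
  }
  wafer amd rocprof-sdk profile --output "$out_dir" "$app_cmd" || return 1
  wafer amd rocprof-sdk analyze "$out_dir"
}

# Usage (commented out so that sourcing this file has no side effects):
# profile_and_analyze ./rocprof-output "python train.py"
```

Passing the output directory explicitly keeps the profile and analyze steps pointed at the same location.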

Commands

wafer amd rocprof-sdk check

Verify ROCprofiler SDK installation:
wafer amd rocprof-sdk check
Output:
ROCprofiler SDK: installed
Version: 2.0.0
ROCm: 6.0.0
Path: /opt/rocm/lib/librocprofiler-sdk.so

wafer amd rocprof-sdk list-counters

List available hardware counters:
wafer amd rocprof-sdk list-counters [OPTIONS]
Options:
Option        Description
--gpu         GPU target (default: detected)
--category    Filter by category
--search      Search counter names
Example:
wafer amd rocprof-sdk list-counters --category memory
Output:
Memory Counters (gfx942):

  FETCH_SIZE
    Total bytes fetched from video memory
    Unit: bytes

  WRITE_SIZE
    Total bytes written to video memory
    Unit: bytes

  L2_CACHE_HIT
    L2 cache hit count
    Unit: count

  L2_CACHE_MISS
    L2 cache miss count
    Unit: count

  ...

wafer amd rocprof-sdk profile

Profile an application with counter collection:
wafer amd rocprof-sdk profile [OPTIONS] "<command>"
Options:
Option          Description
--output, -o    Output directory (default: ./rocprof-output)
--counters      Comma-separated counter names
--kernel        Filter to specific kernel
--trace         Enable activity tracing
Examples:
# Profile with default counters
wafer amd rocprof-sdk profile "python train.py"

# Profile specific counters
wafer amd rocprof-sdk profile \
  --counters FETCH_SIZE,WRITE_SIZE,GRBM_COUNT \
  "python train.py"

# Profile specific kernel
wafer amd rocprof-sdk profile \
  --kernel "matmul_kernel" \
  "python train.py"

# Profile with activity trace
wafer amd rocprof-sdk profile --trace "python train.py"

wafer amd rocprof-sdk analyze

Analyze profiling results:
wafer amd rocprof-sdk analyze [OPTIONS] <output-dir>
Options:
Option      Description
--format    Output format: text, json, csv
--sort      Sort by: time, calls, name
Example:
wafer amd rocprof-sdk analyze ./rocprof-output
Output:
Profiling Results
=================

Kernel Summary:
  Name                        Calls    Total Time    Avg Time
  matmul_kernel               1000     5.23s         5.23ms
  elementwise_add             2000     1.12s         0.56ms
  reduce_sum                  500      0.89s         1.78ms

Counter Data (matmul_kernel):
  FETCH_SIZE:     2.15 GB total, 2.15 MB/call
  WRITE_SIZE:     1.07 GB total, 1.07 MB/call
  L2_CACHE_HIT:   89.3%
  VALU_UTIL:      78.2%
  VMEM_UTIL:      45.6%
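
The totals in the sample output can be combined into a derived metric. As a rough sketch, the helper below divides total memory traffic by total kernel time to estimate effective bandwidth. The function name `effective_bandwidth_gbs` is hypothetical, and it assumes the inputs are given in GB and seconds as printed in the sample above.

```shell
# Hypothetical helper: effective memory bandwidth in GB/s.
# Arguments: fetched GB, written GB, total kernel time in seconds.
effective_bandwidth_gbs() {
  awk -v f="$1" -v w="$2" -v t="$3" 'BEGIN {
    if (t <= 0) {
      print "error: time must be positive" > "/dev/stderr"
      exit 1
    }
    printf "%.2f\n", (f + w) / t
  }'
}

# matmul_kernel in the sample output: 2.15 GB fetched, 1.07 GB written, 5.23 s total
effective_bandwidth_gbs 2.15 1.07 5.23   # prints 0.62
```

Comparing this figure with the peak bandwidth of the target GPU gives a first indication of whether a kernel is memory bound.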

Counter Categories

Category       Examples                  Use Case
Compute        VALU_UTIL, SALU_UTIL      ALU utilization
Memory         FETCH_SIZE, WRITE_SIZE    Memory traffic
Cache          L2_CACHE_HIT, L1_HIT      Cache efficiency
Occupancy      WAVE_OCCUPANCY            Resource usage
Instruction    INST_COUNT, BRANCH        Instruction mix

Common Counter Sets

Memory Analysis

wafer amd rocprof-sdk profile \
  --counters FETCH_SIZE,WRITE_SIZE,L2_CACHE_HIT,L2_CACHE_MISS \
  "python train.py"
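
With both `L2_CACHE_HIT` and `L2_CACHE_MISS` collected, a hit rate can be derived as hits divided by total accesses. The sketch below shows the arithmetic only. `l2_hit_rate` is a hypothetical helper, and the counts passed to it are made-up illustration values, not real profiler output.

```shell
# Hypothetical helper: L2 hit rate in percent from raw hit and miss counts.
l2_hit_rate() {
  awk -v hit="$1" -v miss="$2" 'BEGIN {
    total = hit + miss
    if (total == 0) { print "n/a"; exit }   # no accesses recorded
    printf "%.1f\n", 100 * hit / total
  }'
}

# Illustrative counts only: 893 hits and 107 misses
l2_hit_rate 893 107   # prints 89.3
```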

Compute Analysis

wafer amd rocprof-sdk profile \
  --counters VALU_UTIL,SALU_UTIL,VALU_ACTIVE_CYCLES,VMEM_ACTIVE_CYCLES \
  "python train.py"

Occupancy Analysis

wafer amd rocprof-sdk profile \
  --counters WAVE_OCCUPANCY,GRBM_COUNT,SQ_WAVES \
  "python train.py"
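
To collect all three counter sets in one go, the commands above can be wrapped in a small script. This is a sketch under assumptions: the function names are hypothetical, and writing each set to its own subdirectory of `./rocprof-output` is a layout chosen for illustration. Only the `--output` and `--counters` options from this page are used.

```shell
# Hypothetical helper: profile one named counter set into its own directory.
profile_counter_set() {
  local name="$1" counters="$2" app_cmd="$3"
  wafer amd rocprof-sdk profile \
    --output "./rocprof-output/$name" \
    --counters "$counters" \
    "$app_cmd"
}

# Run the memory, compute, and occupancy sets in sequence,
# stopping if any profiling run fails.
run_all_counter_sets() {
  local app_cmd="$1"
  profile_counter_set memory \
    "FETCH_SIZE,WRITE_SIZE,L2_CACHE_HIT,L2_CACHE_MISS" "$app_cmd" &&
  profile_counter_set compute \
    "VALU_UTIL,SALU_UTIL,VALU_ACTIVE_CYCLES,VMEM_ACTIVE_CYCLES" "$app_cmd" &&
  profile_counter_set occupancy \
    "WAVE_OCCUPANCY,GRBM_COUNT,SQ_WAVES" "$app_cmd"
}

# Usage (commented out so that sourcing this file has no side effects):
# run_all_counter_sets "python train.py"
```

Each set can then be analyzed separately, for example with `wafer amd rocprof-sdk analyze ./rocprof-output/memory`.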

Multi-Pass Profiling

Some counter combinations require multiple profiling passes:
# Wafer handles multi-pass automatically
wafer amd rocprof-sdk profile \
  --counters FETCH_SIZE,VALU_UTIL,L2_CACHE_HIT \
  "python train.py"
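Multi-pass profiling runs your application multiple times, so counters collected in different passes come from different runs. For the combined results to be accurate, the application must behave consistently from run to run (for example, same inputs and fixed random seeds).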
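
One way to sanity-check run-to-run consistency is to compare the same measurement (a counter total or a kernel time) from two separate runs. The sketch below covers only the comparison. `runs_consistent` is a hypothetical helper, the numbers are illustrative, and how the values are extracted from the profiling results is left open.

```shell
# Hypothetical helper: succeed if `other` is within tol_pct percent of `base`.
# Arguments: base value, other value, tolerance in percent.
runs_consistent() {
  awk -v base="$1" -v other="$2" -v tol="$3" 'BEGIN {
    if (base == 0) exit (other == 0 ? 0 : 1)   # avoid division by zero
    d = (other - base) / base
    if (d < 0) d = -d
    exit (100 * d <= tol ? 0 : 1)
  }'
}

# Illustrative values: the two runs differ by 1%, tolerance is 2%
if runs_consistent 1000 1010 2; then
  echo "consistent"
else
  echo "inconsistent"
fi
```

If repeated runs fall outside the tolerance, multi-pass counter values for that application should be read with caution.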

Troubleshooting

Ensure ROCm is installed and the environment is set:
export PATH=/opt/rocm/bin:$PATH
export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH

Not all counters are available on all GPUs. Check availability:
wafer amd rocprof-sdk list-counters --gpu gfx942

GPU profiling may require elevated privileges:
sudo wafer amd rocprof-sdk profile "python train.py"
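
The environment setup from the first item can be made safe to run repeatedly, for example from a shell profile. The following is a sketch, assuming ROCm lives under `/opt/rocm` as on this page. `prepend_path` and the `ROCM_HOME` variable are names introduced here for illustration.

```shell
# Hypothetical helper: print the path list with `dir` prepended,
# unless `dir` is already present. Arguments: current value, dir.
prepend_path() {
  case ":$1:" in
    *":$2:"*) printf '%s\n' "$1" ;;          # already listed, keep as is
    *)
      if [ -n "$1" ]; then
        printf '%s\n' "$2:$1"
      else
        printf '%s\n' "$2"                   # no trailing colon when empty
      fi
      ;;
  esac
}

# ROCM_HOME is an assumed override; /opt/rocm is the location used on this page.
ROCM_HOME="${ROCM_HOME:-/opt/rocm}"
if [ -d "$ROCM_HOME" ]; then
  PATH="$(prepend_path "$PATH" "$ROCM_HOME/bin")"
  LD_LIBRARY_PATH="$(prepend_path "${LD_LIBRARY_PATH:-}" "$ROCM_HOME/lib")"
  export PATH LD_LIBRARY_PATH
else
  echo "ROCm not found at $ROCM_HOME" >&2
fi
```

The helper avoids duplicate entries and the trailing colon that a plain `dir:$LD_LIBRARY_PATH` leaves behind when the variable was empty. Note that `sudo` is often configured to reset `PATH`, so when running with elevated privileges it may be necessary to pass the environment through explicitly, for example `sudo env "PATH=$PATH" wafer amd rocprof-sdk profile "python train.py"`.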

Next Steps