ROCprofiler SDK
ROCprofiler SDK provides low-level profiling capabilities for AMD GPUs. Use it to collect hardware counters, trace GPU activity, and build custom profiling solutions.
Quick Start
# Check installation
wafer amd rocprof-sdk check
# List available counters
wafer amd rocprof-sdk list-counters
# Profile an application
wafer amd rocprof-sdk profile "python train.py"
# Analyze results
wafer amd rocprof-sdk analyze ./output
Commands
wafer amd rocprof-sdk check
Verify ROCprofiler SDK installation:
wafer amd rocprof-sdk check
Output:
ROCprofiler SDK: installed
Version: 2.0.0
ROCm: 6.0.0
Path: /opt/rocm/lib/librocprofiler-sdk.so
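In a script, it can be useful to stop early when the SDK is missing. The following is a minimal sketch, not part of wafer: the helper name require_rocprof_sdk is hypothetical, and it assumes the report contains the "ROCprofiler SDK: installed" line shown in the sample output above. The exit status of the check command is not documented here, so the sketch matches on the text instead.

```shell
# Hypothetical helper: succeed only if `check` reports the SDK as installed.
# Assumption: the report contains the "ROCprofiler SDK: installed" line
# from the sample output; the command's exit status is not relied upon.
require_rocprof_sdk() {
    report=$(wafer amd rocprof-sdk check 2>&1) || true
    if printf '%s\n' "$report" | grep -q '^[[:space:]]*ROCprofiler SDK: installed'; then
        return 0
    fi
    printf 'ROCprofiler SDK check failed:\n%s\n' "$report" >&2
    return 1
}
```

A script could then call require_rocprof_sdk || exit 1 before starting a long profiling run.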
wafer amd rocprof-sdk list-counters
List available hardware counters:
wafer amd rocprof-sdk list-counters [OPTIONS]
Options:
Option      Description
--gpu       GPU target (default: detected)
--category  Filter by category
--search    Search counter names
Example:
wafer amd rocprof-sdk list-counters --category memory
Output:
Memory Counters (gfx942):
FETCH_SIZE
Total bytes fetched from video memory
Unit: bytes
WRITE_SIZE
Total bytes written to video memory
Unit: bytes
L2_CACHE_HIT
L2 cache hit count
Unit: count
L2_CACHE_MISS
L2 cache miss count
Unit: count
...
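To feed the listing into other commands, the counter names can be filtered out of the text output. This is a sketch under one assumption: counter names are the only lines made up entirely of uppercase letters, digits, and underscores, as in the sample above. The function names counter_names and counter_csv are hypothetical.

```shell
# Print only the counter names from `list-counters` text output (read on stdin).
# Assumption: a name is a line consisting solely of A-Z, 0-9 and underscores;
# the category header, description and "Unit:" lines do not match and are dropped.
counter_names() {
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | LC_ALL=C grep -E '^[A-Z][A-Z0-9_]*$'
}

# Join the names with commas, the form the --counters option expects.
counter_csv() {
    counter_names | paste -sd, -
}
```

For example, wafer amd rocprof-sdk list-counters --category memory | counter_csv would print the memory counters as one comma-separated line. Requesting many counters at once may trigger multi-pass profiling.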
wafer amd rocprof-sdk profile
Profile an application with counter collection:
wafer amd rocprof-sdk profile [OPTIONS] "<command>"
Options:
Option        Description
--output, -o  Output directory (default: ./rocprof-output)
--counters    Comma-separated counter names
--kernel      Filter to specific kernel
--trace       Enable activity tracing
Examples:
# Profile with default counters
wafer amd rocprof-sdk profile "python train.py"
# Profile specific counters
wafer amd rocprof-sdk profile \
--counters FETCH_SIZE,WRITE_SIZE,GRBM_COUNT \
"python train.py"
# Profile specific kernel
wafer amd rocprof-sdk profile \
--kernel "matmul_kernel" \
"python train.py"
# Profile with activity trace
wafer amd rocprof-sdk profile --trace "python train.py"
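When several counter sets are profiled in turn, writing each run to its own directory keeps the results from mixing. The wrapper below is a hypothetical sketch that uses only the --output and --counters options listed above; the directory layout under ./rocprof-output is an assumption, and whether wafer creates missing parent directories is not stated here.

```shell
# Hypothetical wrapper: profile one named counter set into its own directory.
# $1 = label for the run, $2 = comma-separated counters, $3 = command to profile.
# Assumption: results are kept under ./rocprof-output/<label>.
profile_counter_set() {
    set_name=$1
    counters=$2
    app_command=$3
    wafer amd rocprof-sdk profile \
        --output "./rocprof-output/$set_name" \
        --counters "$counters" \
        "$app_command"
}
```

A call such as profile_counter_set memory FETCH_SIZE,WRITE_SIZE "python train.py" passes the quoted application command through as a single argument, matching the examples above.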
wafer amd rocprof-sdk analyze
Analyze profiling results:
wafer amd rocprof-sdk analyze [OPTIONS] <output-dir>
Options:
Option    Description
--format  Output format: text, json, csv
--sort    Sort by: time, calls, name
Example:
wafer amd rocprof-sdk analyze ./rocprof-output
Output:
Profiling Results
=================
Kernel Summary:
Name Calls Total Time Avg Time
matmul_kernel 1000 5.23s 5.23ms
elementwise_add 2000 1.12s 0.56ms
reduce_sum 500 0.89s 1.78ms
Counter Data (matmul_kernel):
FETCH_SIZE: 2.15 GB total, 2.15 MB/call
WRITE_SIZE: 1.07 GB total, 1.07 MB/call
L2_CACHE_HIT: 89.3%
VALU_UTIL: 78.2%
VMEM_UTIL: 45.6%
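The totals in this report can be combined into derived figures. As a rough sketch, the average memory traffic rate of a kernel is (FETCH_SIZE + WRITE_SIZE) divided by its total time. The helper name avg_bandwidth_gbs is hypothetical, and the values are typed in by hand from the report rather than parsed from it.

```shell
# Hypothetical helper: average memory traffic rate in GB/s.
# $1 = fetched GB, $2 = written GB, $3 = total kernel time in seconds.
# Prints the rate with two decimals; fails if the time is not positive.
avg_bandwidth_gbs() {
    awk -v fetch="$1" -v write="$2" -v secs="$3" 'BEGIN {
        if (secs <= 0) { print "time must be positive" > "/dev/stderr"; exit 1 }
        printf "%.2f\n", (fetch + write) / secs
    }'
}
```

With the matmul_kernel figures above, avg_bandwidth_gbs 2.15 1.07 5.23 prints 0.62, that is, about 0.62 GB/s averaged over the kernel's total time.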
Counter Categories
Category     Examples                 Use Case
Compute      VALU_UTIL, SALU_UTIL     ALU utilization
Memory       FETCH_SIZE, WRITE_SIZE   Memory traffic
Cache        L2_CACHE_HIT, L1_HIT     Cache efficiency
Occupancy    WAVE_OCCUPANCY           Resource usage
Instruction  INST_COUNT, BRANCH       Instruction mix
Common Counter Sets
Memory Analysis
wafer amd rocprof-sdk profile \
--counters FETCH_SIZE,WRITE_SIZE,L2_CACHE_HIT,L2_CACHE_MISS \
"python train.py"
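The two cache counters in this set combine into a hit rate: hits / (hits + misses). The sketch below computes it from raw counts; the helper name l2_hit_rate is hypothetical, and the counts are passed in by hand rather than read from wafer's output.

```shell
# Hypothetical helper: L2 hit rate in percent from raw counter values.
# $1 = L2_CACHE_HIT count, $2 = L2_CACHE_MISS count.
# Prints one decimal place; fails if no accesses were recorded.
l2_hit_rate() {
    awk -v hits="$1" -v misses="$2" 'BEGIN {
        total = hits + misses
        if (total <= 0) { print "no cache accesses recorded" > "/dev/stderr"; exit 1 }
        printf "%.1f\n", 100 * hits / total
    }'
}
```

For instance, l2_hit_rate 893 107 prints 89.3.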
Compute Analysis
wafer amd rocprof-sdk profile \
--counters VALU_UTIL,SALU_UTIL,VALU_ACTIVE_CYCLES,VMEM_ACTIVE_CYCLES \
"python train.py"
Occupancy Analysis
wafer amd rocprof-sdk profile \
--counters WAVE_OCCUPANCY,GRBM_COUNT,SQ_WAVES \
"python train.py"
Multi-Pass Profiling
Some counter combinations require multiple profiling passes:
# Wafer handles multi-pass automatically
wafer amd rocprof-sdk profile \
--counters FETCH_SIZE,VALU_UTIL,L2_CACHE_HIT \
"python train.py"
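Multi-pass profiling runs your application once per pass. For accurate results, the application should do the same work on every run (for example, fixed random seeds and identical input data); otherwise counters collected in different passes describe different executions.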
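One way to judge run-to-run consistency is to profile twice into separate output directories and compare the same counter total from both runs. The following sketch covers only the comparison step; the helper name values_agree and the idea of a percentage tolerance are assumptions, not wafer features.

```shell
# Hypothetical helper: succeed if two measurements agree within a tolerance.
# $1, $2 = the same counter total from two runs, $3 = tolerance in percent.
# The difference is compared against the larger of the two magnitudes.
values_agree() {
    awk -v a="$1" -v b="$2" -v pct="$3" 'BEGIN {
        diff = a - b; if (diff < 0) diff = -diff
        ref = (a < 0 ? -a : a); absb = (b < 0 ? -b : b)
        if (absb > ref) ref = absb
        exit !(diff <= ref * pct / 100)
    }'
}
```

For example, values_agree 2.15 2.16 1 succeeds (the totals differ by less than 1%), while values_agree 2.15 3.00 1 fails, which suggests the application did different work in the two runs.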
Troubleshooting
ROCprofiler SDK not found
Ensure ROCm is installed and the environment is set:
export PATH=/opt/rocm/bin:$PATH
export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
Not all counters are available on all GPUs. Check availability:
wafer amd rocprof-sdk list-counters --gpu gfx942
GPU profiling may require elevated privileges:
sudo wafer amd rocprof-sdk profile "python train.py"
Next Steps