AMD Profiling

Wafer integrates AMD’s profiling tools to help you analyze and optimize GPU performance on AMD hardware. The tools range from instruction-level ISA analysis to system-wide tracing of your ROCm applications.

Available Tools

Choosing a Tool

Tool            | Best For                                 | Granularity
ISA Analysis    | Assembly optimization, register analysis | Instruction-level
ROCprof Compute | Kernel metrics, roofline analysis        | Per-kernel
ROCprof SDK     | Custom profiling, counter collection     | Flexible
ROCprof Systems | Timeline analysis, API tracing           | System-wide

Quick Commands

Analyze ISA:
wafer amd isa analyze ./kernel.co
Profile with ROCprof Compute:
wafer amd rocprof-compute profile "python train.py"
System-wide profiling:
wafer amd rocprof-systems run "python train.py"
List available counters:
wafer amd rocprof-sdk list-counters
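
When these commands are driven from a script instead of typed by hand, the quoted application command has to remain a single argument. The following is a minimal Python sketch of that, assuming `wafer` is on your PATH. The helper names `wafer_argv` and `run` are hypothetical and not part of Wafer, and the sketch only prints commands unless `dry_run` is turned off.

```python
import shlex
import subprocess


def wafer_argv(tool, action, target=None):
    """Build the argument list for `wafer amd <tool> <action> [target]`."""
    argv = ["wafer", "amd", tool, action]
    if target is not None:
        # Keep the application command as ONE argument, matching the
        # quoted "python train.py" in the shell form above.
        argv.append(target)
    return argv


def run(argv, dry_run=True):
    """Print the shell-equivalent command; execute it only if dry_run is False."""
    print(shlex.join(argv))
    if dry_run:
        return None
    # check=True raises CalledProcessError if wafer exits with a non-zero status.
    return subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(wafer_argv("isa", "analyze", "./kernel.co"))
    run(wafer_argv("rocprof-compute", "profile", "python train.py"))
    run(wafer_argv("rocprof-sdk", "list-counters"))
```

Passing a list to `subprocess.run` avoids a second round of shell parsing, so the application command reaches `wafer` exactly as one string.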

Requirements

Tool            | Requirement
ISA Analysis    | ROCm installed, or Wafer server analysis
ROCprof Compute | ROCm with rocprofiler-compute
ROCprof SDK     | ROCm with rocprofiler-sdk
ROCprof Systems | ROCm with rocprofiler-systems
ISA analysis can run server-side without local AMD hardware. Upload your .co, .s, .ll, or .ttgir files.
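
Before uploading, a quick local filter can catch files that are not among the listed types. This is a small sketch only: `is_isa_input` is a hypothetical helper, and the per-extension comments reflect common ROCm and Triton usage, not anything Wafer documents.

```python
from pathlib import Path

# File types listed above for ISA analysis uploads.
ISA_EXTENSIONS = {
    ".co",     # compiled code object (assumed meaning)
    ".s",      # assembly source (assumed meaning)
    ".ll",     # LLVM IR in text form (assumed meaning)
    ".ttgir",  # Triton GPU IR (assumed meaning)
}


def is_isa_input(path):
    """Return True if the file's extension is one of the listed ISA input types."""
    # Exact, case-sensitive match: only the spellings listed above are accepted.
    return Path(path).suffix in ISA_EXTENSIONS


if __name__ == "__main__":
    for name in ["kernel.co", "matmul.ttgir", "train.py"]:
        print(name, is_isa_input(name))
```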

Supported Hardware

Wafer supports profiling on:
  • MI300X — AMD Instinct data center GPU
  • MI250X — AMD Instinct with CDNA 2 architecture
  • MI210 — AMD Instinct for HPC
  • MI100 — First-generation CDNA
  • RX 7900 — RDNA 3 consumer GPUs
Check supported targets:
wafer amd isa targets

Typical Workflow

1. System-Level Profile

Start with ROCprof Systems to understand overall behavior:
wafer amd rocprof-systems run "python train.py"

2. Identify Hot Kernels

Analyze the trace to find slow kernels:
wafer amd rocprof-systems analyze ./output

3. Kernel Deep-Dive

Profile specific kernels with ROCprof Compute:
wafer amd rocprof-compute profile --kernel matmul "python train.py"

4. ISA Analysis

Examine generated assembly:
wafer amd isa analyze ./kernel.co --metrics
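
The four steps above can be sketched as an ordered command list that stops at the first failure. This Python sketch uses only the commands shown in the steps. The function names and the default paths `./output` and `./kernel.co` are illustrative assumptions, and in practice the kernel name passed to step 3 comes from reading the output of step 2, so the sketch prints the commands by default and does not run them.

```python
import shlex
import subprocess


def workflow_commands(app_cmd, kernel, trace_dir="./output", code_object="./kernel.co"):
    """Return the four workflow commands, in order, as argument lists."""
    return [
        ["wafer", "amd", "rocprof-systems", "run", app_cmd],
        ["wafer", "amd", "rocprof-systems", "analyze", trace_dir],
        ["wafer", "amd", "rocprof-compute", "profile", "--kernel", kernel, app_cmd],
        ["wafer", "amd", "isa", "analyze", code_object, "--metrics"],
    ]


def run_workflow(commands, dry_run=True):
    """Print each step; when dry_run is False, execute and stop on the first failure."""
    for step, argv in enumerate(commands, start=1):
        print(f"[step {step}] {shlex.join(argv)}")
        if not dry_run:
            # check=True raises CalledProcessError, which ends the loop early.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_workflow(workflow_commands("python train.py", "matmul"))
```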

AMD vs NVIDIA Tool Mapping

Purpose            | AMD Tool        | NVIDIA Tool
Kernel metrics     | ROCprof Compute | Nsight Compute (NCU)
System profiling   | ROCprof Systems | Nsight Systems (Nsys)
Assembly analysis  | ISA Analysis    | Nsight Compute SASS view
Counter collection | ROCprof SDK     | CUPTI
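
For teams porting NVIDIA-based profiling scripts, the mapping above can be kept as a small lookup. This is a sketch only: `amd_equivalent` is a hypothetical helper, and the subcommand prefixes are taken from the commands shown earlier on this page.

```python
# NVIDIA tool (lowercase) -> (AMD tool name, wafer subcommand prefix)
NVIDIA_TO_AMD = {
    "ncu": ("ROCprof Compute", "wafer amd rocprof-compute"),
    "nsys": ("ROCprof Systems", "wafer amd rocprof-systems"),
    "nsight compute sass view": ("ISA Analysis", "wafer amd isa"),
    "cupti": ("ROCprof SDK", "wafer amd rocprof-sdk"),
}


def amd_equivalent(nvidia_tool):
    """Return (tool name, subcommand prefix) for an NVIDIA tool, or None if unmapped."""
    return NVIDIA_TO_AMD.get(nvidia_tool.strip().lower())


if __name__ == "__main__":
    for tool in ["NCU", "Nsys", "CUPTI", "nvprof"]:
        print(tool, "->", amd_equivalent(tool))
```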