AMD Profiling

Wafer integrates AMD’s profiling tools to help you analyze and optimize GPU performance on AMD hardware. Together they cover ROCm applications at every level, from individual instructions in the generated ISA to system-wide timelines.

Available Tools

ISA Analysis

Analyze GPU assembly, extract metrics, and understand instruction-level behavior.

ROCprofiler Compute

Kernel-level profiling with hardware counters, similar to NVIDIA Nsight Compute (ncu).

ROCprofiler SDK

Low-level profiling API for custom instrumentation.

ROCprofiler Systems

System-wide profiling for application analysis, similar to NVIDIA Nsight Systems (nsys).

Choosing a Tool

| Tool | Best For | Granularity |
| --- | --- | --- |
| ISA Analysis | Assembly optimization, register analysis | Instruction-level |
| ROCprofiler Compute | Kernel metrics, roofline analysis | Per-kernel |
| ROCprofiler SDK | Custom profiling, counter collection | Flexible |
| ROCprofiler Systems | Timeline analysis, API tracing | System-wide |

Quick Commands

Analyze ISA:
wafer amd isa analyze ./kernel.co
Profile with ROCprofiler Compute:
wafer amd rocprof-compute profile "python train.py"
System-wide profiling:
wafer amd rocprof-systems run "python train.py"
List available counters:
wafer amd rocprof-sdk list-counters
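
When scripting these commands, note that the application to profile is passed as one quoted argument ("python train.py"). The following is a minimal Python sketch of a wrapper that preserves that shape. It assumes the `wafer` CLI accepts exactly the argument forms shown above; the helper names `build_wafer_command` and `run_if_available` are hypothetical and not part of Wafer.

```python
import shlex
import shutil
import subprocess


def build_wafer_command(tool_args, target_argv=None):
    """Return an argv list for a `wafer amd ...` invocation.

    tool_args:   subcommand words, e.g. ["rocprof-compute", "profile"].
    target_argv: the application to profile, as a list. It is joined into a
                 single string because the commands above pass the
                 application as one quoted argument.
    """
    argv = ["wafer", "amd", *tool_args]
    if target_argv is not None:
        argv.append(shlex.join(target_argv))
    return argv


def run_if_available(argv):
    """Run argv only if its executable is on PATH.

    Returns the exit code, or None when the executable is missing.
    """
    if shutil.which(argv[0]) is None:
        print("not on PATH, would run:", shlex.join(argv))
        return None
    return subprocess.run(argv).returncode


if __name__ == "__main__":
    cmd = build_wafer_command(["rocprof-compute", "profile"], ["python", "train.py"])
    # Show the shell-quoted form without executing anything.
    print(shlex.join(cmd))
```

Passing an argv list to `subprocess.run` avoids a second round of shell parsing, so script paths containing spaces stay intact inside the single application argument.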

Requirements

| Tool | Requirement |
| --- | --- |
| ISA Analysis | ROCm installed, or Wafer server analysis |
| ROCprofiler Compute | ROCm with rocprofiler-compute |
| ROCprofiler SDK | ROCm with rocprofiler-sdk |
| ROCprofiler Systems | ROCm with rocprofiler-systems |

Note: ISA analysis can run server-side without local AMD hardware. Upload your .co, .s, .ll, or .ttgir files.
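
A pre-flight check can catch missing requirements before a long profiling run. The sketch below is illustrative only. The file suffixes come from the note above, but the executable names in `ASSUMED_BINARIES` are assumptions about what each ROCm package installs and should be verified against your ROCm version. The function names are hypothetical.

```python
import shutil
from pathlib import Path

# File types listed above as accepted for server-side ISA analysis.
ISA_UPLOAD_SUFFIXES = {".co", ".s", ".ll", ".ttgir"}

# ASSUMPTION: executable names provided by each ROCm profiling package.
# These may differ between ROCm releases; adjust for your installation.
ASSUMED_BINARIES = {
    "ROCprofiler Compute": "rocprof-compute",
    "ROCprofiler SDK": "rocprofv3",
    "ROCprofiler Systems": "rocprof-sys-run",
}


def is_isa_uploadable(path):
    """True if the file suffix is one of the listed ISA input types."""
    return Path(path).suffix in ISA_UPLOAD_SUFFIXES


def missing_tools(which=shutil.which):
    """Return the tools whose assumed executable is not found on PATH.

    `which` is injectable so the check can be exercised without ROCm.
    """
    return [tool for tool, binary in ASSUMED_BINARIES.items() if which(binary) is None]


if __name__ == "__main__":
    print("uploadable:", is_isa_uploadable("./kernel.co"))
    print("missing locally:", missing_tools())
```

If a local tool is reported missing, ISA analysis may still be usable through the server-side path described in the note.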

Supported Hardware

Wafer supports profiling on:
  • MI300X: AMD Instinct data center GPU (CDNA 3)
  • MI250X: AMD Instinct data center GPU (CDNA 2)
  • MI210: AMD Instinct for HPC (CDNA 2)
  • MI100: AMD Instinct, first-generation CDNA
  • RX 7900: Radeon consumer GPUs (RDNA 3)
Check supported targets:
wafer amd isa targets
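
Compilers and ISA tools usually identify these GPUs by their LLVM AMDGPU processor name rather than the product name. As a hedged reference, the sketch below maps the GPUs listed above to those names. The mapping reflects the public LLVM AMDGPU target names, not Wafer's own output, so treat `wafer amd isa targets` as the authority on what Wafer accepts. The `gfx_target` helper is hypothetical.

```python
# Product name -> LLVM AMDGPU processor name.
# Not taken from Wafer; confirm against `wafer amd isa targets`.
GFX_TARGETS = {
    "MI300X": "gfx942",    # CDNA 3
    "MI250X": "gfx90a",    # CDNA 2
    "MI210": "gfx90a",     # CDNA 2
    "MI100": "gfx908",     # first-generation CDNA
    "RX 7900": "gfx1100",  # RDNA 3
}


def gfx_target(gpu_name):
    """Look up the gfx target for a marketing name such as 'AMD Instinct MI300X'."""
    # Normalize case and collapse repeated whitespace before matching.
    key = " ".join(gpu_name.upper().split())
    for name, target in GFX_TARGETS.items():
        if name in key:
            return target
    raise KeyError(f"no known gfx target for {gpu_name!r}")


if __name__ == "__main__":
    for gpu in ("AMD Instinct MI300X", "Radeon RX 7900 XTX"):
        print(gpu, "->", gfx_target(gpu))
```

These are the same names you would pass to hipcc's `--offload-arch` flag when building a code object for a specific GPU.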

Typical Workflow

1. System-Level Profile

Start with ROCprofiler Systems to understand overall behavior:
wafer amd rocprof-systems run "python train.py"

2. Identify Hot Kernels

Analyze the trace to find slow kernels:
wafer amd rocprof-systems analyze ./output

3. Kernel Deep-Dive

Profile specific kernels with ROCprofiler Compute:
wafer amd rocprof-compute profile --kernel matmul "python train.py"

4. ISA Analysis

Examine generated assembly:
wafer amd isa analyze ./kernel.co --metrics
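
The four steps above can be chained in a small driver script. This is a sketch under stated assumptions: it reuses the example values from the steps (`./output`, `matmul`, `./kernel.co`) as placeholders, assumes each command returns a nonzero exit code on failure, and defaults to a dry run that only prints the commands. `run_workflow` is a hypothetical helper, not a Wafer feature.

```python
import shlex
import subprocess

APP = "python train.py"  # application command, passed as a single argument

# The four workflow steps in order, using the example values from the text.
WORKFLOW = [
    ("System-level profile", ["wafer", "amd", "rocprof-systems", "run", APP]),
    ("Identify hot kernels", ["wafer", "amd", "rocprof-systems", "analyze", "./output"]),
    ("Kernel deep-dive", ["wafer", "amd", "rocprof-compute", "profile", "--kernel", "matmul", APP]),
    ("ISA analysis", ["wafer", "amd", "isa", "analyze", "./kernel.co", "--metrics"]),
]


def run_workflow(steps=WORKFLOW, dry_run=True, runner=subprocess.run):
    """Run the steps in order and stop at the first failure.

    Returns the names of the steps that completed. With dry_run=True the
    commands are only printed, so every step counts as completed.
    """
    done = []
    for name, argv in steps:
        print(f"[{name}] {shlex.join(argv)}")
        if not dry_run:
            result = runner(argv)
            if result.returncode != 0:
                print(f"stopping: exit code {result.returncode}")
                break
        done.append(name)
    return done


if __name__ == "__main__":
    run_workflow()  # dry run: prints the four commands without executing them
```

In practice the kernel name and code object path for steps 3 and 4 come from inspecting the output of step 2, so a real script would pause there rather than hard-code them.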

AMD vs NVIDIA Tool Mapping

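| Purpose | AMD Tool | NVIDIA Tool |
| --- | --- | --- |
| Kernel metrics | ROCprofiler Compute | Nsight Compute (ncu) |
| System profiling | ROCprofiler Systems | Nsight Systems (nsys) |
| Assembly analysis | ISA Analysis | Nsight Compute SASS view |
| Counter collection | ROCprofiler SDK | CUPTI |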

Next Steps

ISA Analysis

Analyze GPU assembly.

ROCprofiler Compute

Kernel-level profiling.

ROCprofiler SDK

Custom profiling.

Cross-Platform Compare

Compare AMD and NVIDIA.