ISA Analysis

Analyze AMD GPU assembly code to understand instruction-level behavior, register usage, and performance characteristics. Supports compiled objects (.co), assembly (.s), LLVM IR (.ll), and Triton TTGIR (.ttgir).

Quick Start

# Analyze a compiled object
wafer amd isa analyze ./kernel.co

# Extract metrics
wafer amd isa metrics ./kernel.co

# List supported targets
wafer amd isa targets

Commands

wafer amd isa analyze

Analyze ISA and show a detailed breakdown:
wafer amd isa analyze [OPTIONS] <file>
Options:
  --target     GPU target (e.g., gfx942, gfx90a)
  --kernel     Specific kernel name to analyze
  --format     Output format: text, json
  --verbose    Show detailed instruction breakdown
Example:
wafer amd isa analyze ./matmul.co --verbose
Output:
Kernel: matmul_kernel
Target: gfx942 (MI300X)

Instruction Summary:
  Total instructions: 1,234
  VALU: 456 (37%)
  VMEM: 234 (19%)
  SALU: 123 (10%)
  SMEM: 89 (7%)
  Branch: 45 (4%)
  Other: 287 (23%)

Register Usage:
  VGPRs: 64
  SGPRs: 32
  LDS: 16384 bytes

Occupancy:
  Waves per SIMD: 4
  Theoretical: 8
  Limiter: VGPRs

Potential Issues:
  - High VMEM pressure (19% of instructions)
  - Consider register spilling to increase occupancy
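The percentages in the instruction summary are each category's share of the total instruction count. A quick sketch that recomputes them from the sample counts above:

```python
# Recompute the instruction-mix percentages from the sample
# `wafer amd isa analyze` output above. Counts are taken from
# that output; rounding matches the report's whole percentages.
counts = {
    "VALU": 456, "VMEM": 234, "SALU": 123,
    "SMEM": 89, "Branch": 45, "Other": 287,
}
total = sum(counts.values())  # 1234

for category, n in counts.items():
    # e.g. "VALU: 456 (37%)"
    print(f"{category}: {n} ({round(100 * n / total)}%)")
```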

wafer amd isa metrics

Extract numerical metrics for comparison:
wafer amd isa metrics [OPTIONS] <file>
Options:
  --target   GPU target
  --json     Output as JSON
Example:
wafer amd isa metrics ./kernel.co --json
Output:
{
  "kernel": "matmul_kernel",
  "target": "gfx942",
  "instructions": 1234,
  "vgprs": 64,
  "sgprs": 32,
  "lds_bytes": 16384,
  "occupancy": 0.5,
  "valu_count": 456,
  "vmem_count": 234
}
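The JSON form is convenient for diffing two builds of the same kernel. A minimal sketch, assuming the field names shown in the sample output above (the values here are invented for illustration):

```python
import json

def diff_metrics(before: dict, after: dict) -> dict:
    """Return the numeric fields that changed between two
    `wafer amd isa metrics --json` outputs."""
    changed = {}
    for key, old in before.items():
        new = after.get(key)
        if isinstance(old, (int, float)) and new != old:
            changed[key] = (old, new)
    return changed

# Inline data shaped like the sample output (values invented):
before = json.loads('{"kernel": "matmul_kernel", "vgprs": 96, "occupancy": 0.375}')
after = json.loads('{"kernel": "matmul_kernel", "vgprs": 64, "occupancy": 0.5}')
print(diff_metrics(before, after))
# {'vgprs': (96, 64), 'occupancy': (0.375, 0.5)}
```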

wafer amd isa targets

List supported GPU targets:
wafer amd isa targets
Output:
Supported Targets:
  gfx942  - MI300X, MI300A (CDNA 3)
  gfx90a  - MI250X, MI250 (CDNA 2)
  gfx908  - MI100 (CDNA 1)
  gfx1100 - RX 7900 (RDNA 3)
  gfx1030 - RX 6900 (RDNA 2)

Supported File Types

Extension   Description
.co         Compiled code object (hsaco)
.s          GCN/CDNA assembly source
.ll         LLVM IR
.ttgir      Triton GPU IR
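Scripts that feed files to the analyzer can dispatch on these extensions. A small sketch that mirrors the table (this is not wafer's internal detection logic):

```python
from pathlib import Path

# Extension-to-format map, copied from the table above.
FORMATS = {
    ".co": "compiled code object (hsaco)",
    ".s": "GCN/CDNA assembly source",
    ".ll": "LLVM IR",
    ".ttgir": "Triton GPU IR",
}

def detect_format(path: str) -> str:
    """Classify an input file by extension, as in the table above."""
    ext = Path(path).suffix
    try:
        return FORMATS[ext]
    except KeyError:
        raise ValueError(f"unsupported file type: {ext!r}")

print(detect_format("kernel.ttgir"))  # Triton GPU IR
```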

Understanding the Output

Instruction Categories

  • VALU: Vector ALU operations (compute)
  • VMEM: Vector memory operations (global memory)
  • SALU: Scalar ALU operations (control flow, addresses)
  • SMEM: Scalar memory operations (constants, metadata)
  • LDS: Local data share operations (shared memory)
  • Branch: Control flow instructions
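A simplified classifier over GCN/CDNA mnemonic prefixes illustrates how instructions map onto these buckets. This is an approximation for illustration, not the tool's actual classification:

```python
def classify(mnemonic: str) -> str:
    """Bucket a GCN/CDNA mnemonic into the categories above.
    Prefix rules are a simplification: real ISAs have corner
    cases (e.g. s_waitcnt, v_readlane) this sketch ignores."""
    m = mnemonic.lower()
    if m.startswith(("s_branch", "s_cbranch")):
        return "Branch"
    if m.startswith(("global_", "buffer_", "flat_")):
        return "VMEM"
    if m.startswith("ds_"):
        return "LDS"
    if m.startswith(("s_load", "s_buffer_load")):
        return "SMEM"
    if m.startswith("v_"):
        return "VALU"
    if m.startswith("s_"):
        return "SALU"
    return "Other"

print(classify("v_fmac_f32"))           # VALU
print(classify("global_load_dwordx4"))  # VMEM
print(classify("s_cbranch_scc0"))       # Branch
```

Order matters: the branch and scalar-load checks must run before the generic `s_` fallback, or every scalar instruction would land in SALU.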

Register Usage

  • VGPRs: Vector general-purpose registers (per-thread data)
  • SGPRs: Scalar general-purpose registers (shared data)
  • LDS: Local data share memory (shared memory in CUDA terms)

Occupancy

Occupancy (the number of waves resident per SIMD) is limited by:
  1. VGPR usage: more registers per wave = fewer concurrent waves
  2. LDS usage: more shared memory per workgroup = fewer concurrent workgroups
  3. Hardware limits: a cap on waves per SIMD and waves per workgroup
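As a worked example, the VGPR limit can be estimated by dividing the SIMD's register file by each wave's (granule-rounded) allocation. The defaults below, a 512-VGPR file, a granule of 8, and a hardware cap of 8 waves, are illustrative assumptions; consult the ISA guide for your target's real values:

```python
def waves_limited_by_vgprs(vgprs_per_wave: int,
                           vgpr_file: int = 512,
                           granule: int = 8,
                           hw_max_waves: int = 8) -> int:
    """Waves per SIMD allowed by VGPR usage alone.
    vgpr_file, granule, and hw_max_waves are illustrative
    defaults, not authoritative per-target values."""
    # VGPR allocation is rounded up to the granule.
    alloc = -(-vgprs_per_wave // granule) * granule
    return min(hw_max_waves, vgpr_file // alloc)

print(waves_limited_by_vgprs(64))   # 8
print(waves_limited_by_vgprs(96))   # 5
print(waves_limited_by_vgprs(128))  # 4
```

Under these assumptions, a kernel using 64 VGPRs could reach 8 waves per SIMD, matching the "Theoretical: 8" line in the sample analyze output; the reported 4 waves would then come from another limiter such as LDS.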

Optimization Tips

Reduce register pressure:
  • Recompute values instead of storing them
  • Use smaller data types (fp16, bf16)
  • Restructure code to reduce live variables

Improve memory efficiency:
  • Coalesce global memory accesses
  • Use LDS for reused data
  • Prefetch data and tune s_waitcnt placement to overlap latency

Hide latency:
  • Interleave independent operations
  • Reduce dependency chains
  • Use vector operations where possible

Server-Side Analysis

Analyze ISA without local AMD hardware:
# Upload and analyze server-side
wafer amd isa analyze ./kernel.co --server
This is useful when:
  • Developing on non-AMD machines
  • Cross-compiling for different targets
  • Comparing across GPU generations

Next Steps