ISA Analysis

Analyze AMD GPU assembly code to understand instruction-level behavior, register usage, and performance characteristics. Supports compiled objects (.co), assembly (.s), LLVM IR (.ll), and Triton TTGIR (.ttgir).

Quick Start

# Analyze a compiled object
wafer amd isa analyze ./kernel.co

# Extract metrics
wafer amd isa metrics ./kernel.co

# List supported targets
wafer amd isa targets

Commands

wafer amd isa analyze

Analyze ISA and show a detailed breakdown:
wafer amd isa analyze [OPTIONS] <file>
Options:
  --target     GPU target (e.g., gfx942, gfx90a)
  --kernel     Specific kernel name to analyze
  --format     Output format: text, json
  --verbose    Show detailed instruction breakdown
Example:
wafer amd isa analyze ./matmul.co --verbose
Output:
Kernel: matmul_kernel
Target: gfx942 (MI300X)

Instruction Summary:
  Total instructions: 1,234
  VALU: 456 (37%)
  VMEM: 234 (19%)
  SALU: 123 (10%)
  SMEM: 89 (7%)
  Branch: 45 (4%)
  Other: 287 (23%)

Register Usage:
  VGPRs: 64
  SGPRs: 32
  LDS: 16384 bytes

Occupancy:
  Waves per SIMD: 4
  Theoretical: 8
  Limiter: VGPRs

Potential Issues:
  - High VMEM pressure (19% of instructions)
  - Consider register spilling to increase occupancy
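The percentages in the instruction summary are each category's share of the total instruction count. A quick sketch that recomputes them from the sample counts above:

```python
# Recompute the instruction-mix percentages from the sample
# `wafer amd isa analyze` output above. Counts are taken from
# that output; rounding matches the report's whole percentages.
counts = {
    "VALU": 456, "VMEM": 234, "SALU": 123,
    "SMEM": 89, "Branch": 45, "Other": 287,
}
total = sum(counts.values())  # 1234

for category, n in counts.items():
    # e.g. "VALU: 456 (37%)"
    print(f"{category}: {n} ({round(100 * n / total)}%)")
```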

wafer amd isa metrics

Extract numerical metrics for comparison:
wafer amd isa metrics [OPTIONS] <file>
Options:
  --target   GPU target
  --json     Output as JSON
Example:
wafer amd isa metrics ./kernel.co --json
Output:
{
  "kernel": "matmul_kernel",
  "target": "gfx942",
  "instructions": 1234,
  "vgprs": 64,
  "sgprs": 32,
  "lds_bytes": 16384,
  "occupancy": 0.5,
  "valu_count": 456,
  "vmem_count": 234
}
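The JSON form is convenient for diffing two builds of the same kernel. A minimal sketch, assuming the field names shown in the sample output above (the values here are invented for illustration):

```python
import json

def diff_metrics(before: dict, after: dict) -> dict:
    """Return the numeric fields that changed between two
    `wafer amd isa metrics --json` outputs."""
    changed = {}
    for key, old in before.items():
        new = after.get(key)
        if isinstance(old, (int, float)) and new != old:
            changed[key] = (old, new)
    return changed

# Inline data shaped like the sample output (values invented):
before = json.loads('{"kernel": "matmul_kernel", "vgprs": 96, "occupancy": 0.375}')
after = json.loads('{"kernel": "matmul_kernel", "vgprs": 64, "occupancy": 0.5}')
print(diff_metrics(before, after))
# {'vgprs': (96, 64), 'occupancy': (0.375, 0.5)}
```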

wafer amd isa targets

List supported GPU targets:
wafer amd isa targets
Output:
Supported Targets:
  gfx942  - MI300X, MI300A (CDNA 3)
  gfx90a  - MI250X, MI250 (CDNA 2)
  gfx908  - MI100 (CDNA 1)
  gfx1100 - RX 7900 (RDNA 3)
  gfx1030 - RX 6900 (RDNA 2)

Supported File Types

Extension   Description
.co         Compiled code object (hsaco)
.s          GCN/CDNA assembly source
.ll         LLVM IR
.ttgir      Triton GPU IR
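Scripts that feed files to the analyzer can dispatch on these extensions. A small sketch that mirrors the table (this is not wafer's internal detection logic):

```python
from pathlib import Path

# Extension-to-format map, copied from the table above.
FORMATS = {
    ".co": "compiled code object (hsaco)",
    ".s": "GCN/CDNA assembly source",
    ".ll": "LLVM IR",
    ".ttgir": "Triton GPU IR",
}

def detect_format(path: str) -> str:
    """Classify an input file by extension, as in the table above."""
    ext = Path(path).suffix
    try:
        return FORMATS[ext]
    except KeyError:
        raise ValueError(f"unsupported file type: {ext!r}")

print(detect_format("kernel.ttgir"))  # Triton GPU IR
```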

Understanding the Output

Instruction Categories

  • VALU: Vector ALU operations (compute)
  • VMEM: Vector memory operations (global memory)
  • SALU: Scalar ALU operations (control flow, addresses)
  • SMEM: Scalar memory operations (constants, metadata)
  • LDS: Local data share operations (shared memory)
  • Branch: Control flow instructions
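A simplified classifier over GCN/CDNA mnemonic prefixes illustrates how instructions map onto these buckets. This is an approximation for illustration, not the tool's actual classification:

```python
def classify(mnemonic: str) -> str:
    """Bucket a GCN/CDNA mnemonic into the categories above.
    Prefix rules are a simplification: real ISAs have corner
    cases (e.g. s_waitcnt, v_readlane) this sketch ignores."""
    m = mnemonic.lower()
    if m.startswith(("s_branch", "s_cbranch")):
        return "Branch"
    if m.startswith(("global_", "buffer_", "flat_")):
        return "VMEM"
    if m.startswith("ds_"):
        return "LDS"
    if m.startswith(("s_load", "s_buffer_load")):
        return "SMEM"
    if m.startswith("v_"):
        return "VALU"
    if m.startswith("s_"):
        return "SALU"
    return "Other"

print(classify("v_fmac_f32"))           # VALU
print(classify("global_load_dwordx4"))  # VMEM
print(classify("s_cbranch_scc0"))       # Branch
```

Order matters: the branch and scalar-load checks must run before the generic `s_` fallback, or every scalar instruction would land in SALU.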

Register Usage

  • VGPRs: Vector general-purpose registers (per-thread data)
  • SGPRs: Scalar general-purpose registers (shared data)
  • LDS: Local data share memory (shared memory in CUDA terms)

Occupancy

Occupancy (the number of waves resident per SIMD) is limited by:
  1. VGPR usage: more registers per wave = fewer concurrent waves
  2. LDS usage: more shared memory per workgroup = fewer concurrent workgroups
  3. Hardware limits: a cap on waves per SIMD and waves per workgroup
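As a worked example, the VGPR limit can be estimated by dividing the SIMD's register file by each wave's (granule-rounded) allocation. The defaults below, a 512-VGPR file, a granule of 8, and a hardware cap of 8 waves, are illustrative assumptions; consult the ISA guide for your target's real values:

```python
def waves_limited_by_vgprs(vgprs_per_wave: int,
                           vgpr_file: int = 512,
                           granule: int = 8,
                           hw_max_waves: int = 8) -> int:
    """Waves per SIMD allowed by VGPR usage alone.
    vgpr_file, granule, and hw_max_waves are illustrative
    defaults, not authoritative per-target values."""
    # VGPR allocation is rounded up to the granule.
    alloc = -(-vgprs_per_wave // granule) * granule
    return min(hw_max_waves, vgpr_file // alloc)

print(waves_limited_by_vgprs(64))   # 8
print(waves_limited_by_vgprs(96))   # 5
print(waves_limited_by_vgprs(128))  # 4
```

Under these assumptions, a kernel using 64 VGPRs could reach 8 waves per SIMD, matching the "Theoretical: 8" line in the sample analyze output; the reported 4 waves would then come from another limiter such as LDS.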

Optimization Tips

Reduce register pressure:
  • Recompute values instead of storing them
  • Use smaller data types (fp16, bf16)
  • Restructure code to reduce live variables

Improve memory efficiency:
  • Coalesce global memory accesses
  • Use LDS for reused data
  • Prefetch data and tune s_waitcnt placement to overlap latency

Hide latency:
  • Interleave independent operations
  • Reduce dependency chains
  • Use vector operations where possible

Server-Side Analysis

Analyze ISA without local AMD hardware:
# Upload and analyze server-side
wafer amd isa analyze ./kernel.co --server
This is useful when:
  • Developing on non-AMD machines
  • Cross-compiling for different targets
  • Comparing across GPU generations

Next Steps