ISA Analysis
Analyze AMD GPU assembly code to understand instruction-level behavior, register usage, and performance characteristics. Supports compiled objects (.co), assembly (.s), LLVM IR (.ll), and Triton TTGIR (.ttgir).
Quick Start
Commands
wafer amd isa analyze
Analyze ISA and show detailed breakdown:
| Option | Description |
|---|---|
| --target | GPU target (e.g., gfx942, gfx90a) |
| --kernel | Specific kernel name to analyze |
| --format | Output format: text, json |
| --verbose | Show detailed instruction breakdown |
wafer amd isa metrics
Extract numerical metrics for comparison:
| Option | Description |
|---|---|
| --target | GPU target |
| --json | Output as JSON |
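Once metrics for two builds have been exported as JSON, the comparison can be scripted. The following is a minimal sketch, not part of the tool: the metric names (`vgpr_count`, `sgpr_count`) are hypothetical placeholders, and the real keys in the JSON output may differ.

```python
import json


def compare_metrics(baseline: dict, candidate: dict) -> dict:
    """Return {metric: (baseline, candidate, delta)} for numeric keys found in both."""
    report = {}
    for key in sorted(baseline.keys() & candidate.keys()):
        old, new = baseline[key], candidate[key]
        # bool is a subclass of int in Python; skip it so flags are not diffed.
        if isinstance(old, bool) or isinstance(new, bool):
            continue
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            report[key] = (old, new, new - old)
    return report


# Hypothetical metric names; substitute the keys the tool actually emits.
baseline = json.loads('{"vgpr_count": 128, "sgpr_count": 48, "kernel": "matmul"}')
candidate = json.loads('{"vgpr_count": 96, "sgpr_count": 52, "kernel": "matmul"}')

for name, (old, new, delta) in compare_metrics(baseline, candidate).items():
    print(f"{name}: {old} -> {new} ({delta:+})")
```

Non-numeric fields such as the kernel name are ignored, so the same helper works on any flat JSON object of metrics.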
wafer amd isa targets
List supported GPU targets.
Supported File Types
| Extension | Description |
|---|---|
| .co | Compiled code object (hsaco) |
| .s | GCN/CDNA assembly source |
| .ll | LLVM IR |
| .ttgir | Triton GPU IR |
Understanding the Output
Instruction Categories
- VALU: Vector ALU operations (compute)
- VMEM: Vector memory operations (global memory)
- SALU: Scalar ALU operations (control flow, addresses)
- SMEM: Scalar memory operations (constants, metadata)
- LDS: Local data share operations (shared memory)
- Branch: Control flow instructions
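To make the categories concrete, here is a rough sketch that buckets AMD GPU mnemonics by prefix. It is a heuristic written for illustration, not the tool's actual classifier: the prefix lists are assumptions based on common GCN/CDNA naming, and instructions such as `s_waitcnt` or `s_endpgm` simply fall into SALU here.

```python
from collections import Counter

# Assumed prefix lists; real classification depends on the instruction encoding.
BRANCH_PREFIXES = ("s_branch", "s_cbranch")
SMEM_PREFIXES = ("s_load_", "s_buffer_load_", "s_store_", "s_buffer_store_")
VMEM_PREFIXES = ("global_", "flat_", "buffer_", "tbuffer_", "scratch_")


def classify(mnemonic: str) -> str:
    """Map one mnemonic to a category name using prefix matching only."""
    m = mnemonic.lower()
    # Check the more specific scalar prefixes before the generic "s_".
    if m.startswith(BRANCH_PREFIXES):
        return "Branch"
    if m.startswith(SMEM_PREFIXES):
        return "SMEM"
    if m.startswith("s_"):
        return "SALU"
    if m.startswith("ds_"):
        return "LDS"
    if m.startswith(VMEM_PREFIXES):
        return "VMEM"
    if m.startswith("v_"):
        return "VALU"
    return "Other"


def count_categories(asm_text: str) -> Counter:
    """Count instructions per category, skipping comments, labels, and directives."""
    counts = Counter()
    for raw in asm_text.splitlines():
        line = raw.split(";")[0].split("//")[0].strip()
        if not line or line.endswith(":") or line.startswith("."):
            continue
        counts[classify(line.split()[0])] += 1
    return counts


sample = """
s_load_dwordx2 s[0:1], s[4:5], 0x0
v_add_f32 v0, v1, v2          ; vector add
global_load_dword v3, v4, s[0:1]
ds_read_b32 v5, v6
s_cbranch_scc1 .LBB0_2
.LBB0_2:
s_endpgm
"""
print(dict(count_categories(sample)))
```

A high VMEM-to-VALU ratio in such a count usually points to a memory-bound kernel, while a VALU-heavy mix suggests it is compute-bound.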
Register Usage
- VGPRs: Vector general-purpose registers (per-thread data)
- SGPRs: Scalar general-purpose registers (shared data)
- LDS: Local data share memory (shared memory in CUDA terms)
Occupancy
Occupancy is limited by:
- VGPR usage: More registers = fewer waves
- LDS usage: More shared memory = fewer waves
- Wave count: Maximum waves per workgroup
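The register and LDS limits above can be estimated with simple integer arithmetic. The sketch below is an approximation under stated assumptions: the defaults (512 VGPRs per SIMD lane, allocation granule of 8, at most 8 waves per SIMD, 64 KiB of LDS per compute unit) are illustrative values for a gfx90a-class part and should be checked against the ISA documentation for your target.

```python
from typing import Optional


def waves_by_vgprs(vgprs_per_thread: int,
                   vgprs_per_simd: int = 512,  # assumed register file size
                   granule: int = 8,           # assumed allocation granularity
                   max_waves: int = 8) -> int: # assumed hardware wave limit
    """Estimate how many waves fit on one SIMD given per-thread VGPR usage."""
    # Registers are allocated in whole granules, so round up.
    allocated = max(granule, -(-vgprs_per_thread // granule) * granule)
    return min(max_waves, vgprs_per_simd // allocated)


def workgroups_by_lds(lds_bytes_per_workgroup: int,
                      lds_bytes_per_cu: int = 64 * 1024) -> Optional[int]:
    """Estimate how many workgroups fit on one compute unit given LDS usage.

    Returns None when the kernel uses no LDS, meaning LDS imposes no limit.
    """
    if lds_bytes_per_workgroup == 0:
        return None
    return lds_bytes_per_cu // lds_bytes_per_workgroup


for vgprs in (64, 65, 128, 256):
    print(f"{vgprs} VGPRs -> {waves_by_vgprs(vgprs)} waves per SIMD")
print(f"16 KiB LDS -> {workgroups_by_lds(16 * 1024)} workgroups per CU")
```

Note the step between 64 and 65 VGPRs: crossing an allocation boundary can cost a whole wave, which is why trimming even a few registers sometimes raises occupancy.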
Optimization Tips
Reduce VGPR usage
- Recompute values instead of storing
- Use smaller data types (fp16, bf16)
- Restructure code to reduce live variables
Improve memory access
- Coalesce global memory accesses
- Use LDS for reused data
- Prefetch with `s_waitcnt` optimization
Increase instruction-level parallelism
- Interleave independent operations
- Reduce dependency chains
- Use vector operations where possible
Server-Side Analysis
Analyze ISA without local AMD hardware. This is useful for:
- Developing on non-AMD machines
- Cross-compiling for different targets
- Comparing across GPU generations