GPU Specs
GPU specs define hardware characteristics used for roofline analysis, performance modeling, and comparison. Wafer includes built-in specs for common GPUs and supports custom configurations.
Quick Start
# List available specs
wafer config specs list
# Show spec details
wafer config specs show H100
# Set default spec
wafer config specs default H100
Commands
wafer config specs list
List all available GPU specifications:
Output:
GPU Specifications:
NVIDIA:
  Name     Memory  BW (TB/s)  FP16 (TFLOPS)  TDP
  H100     80GB    3.35       989.4          700W
  H200     141GB   4.80       989.4          700W
  A100     80GB    2.04       312.0          400W
  B200     192GB   8.00       2250.0         1000W
  RTX4090  24GB    1.01       165.2          450W
AMD:
  Name     Memory  BW (TB/s)  FP16 (TFLOPS)  TDP
  MI300X   192GB   5.30       1307.4         750W
  MI250X   128GB   3.28       383.0          560W
  MI210    64GB    1.64       181.0          300W
* = default
wafer config specs show
Show detailed specifications for a GPU:
wafer config specs show <gpu-name>
Example:
wafer config specs show H100
Output:
NVIDIA H100 SXM5 (80GB)
Memory:
  Capacity: 80 GB HBM3
  Bandwidth: 3.35 TB/s
  Bus Width: 5120-bit
Compute:
  FP64: 33.5 TFLOPS
  FP32: 66.9 TFLOPS
  FP16: 989.4 TFLOPS (Tensor Core)
  BF16: 989.4 TFLOPS (Tensor Core)
  INT8: 1978.9 TOPS (Tensor Core)
  FP8: 1978.9 TFLOPS (Tensor Core)
Architecture:
  SMs: 132
  CUDA Cores: 16896
  Tensor Cores: 528 (4th gen)
  L2 Cache: 50 MB
  Registers/SM: 65536
Roofline:
  Ridge Point (FP16): 295.3 FLOP/byte
  Ridge Point (FP32): 20.0 FLOP/byte
Power:
  TDP: 700W
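The ridge points in the Roofline section are consistent with dividing peak throughput by memory bandwidth. A minimal Python sketch of that arithmetic, using the H100 figures from the output above (the function name is illustrative and not part of wafer):

```python
def ridge_point(peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) where the compute and memory roofs meet."""
    # TFLOP/s divided by TB/s: the 1e12 factors cancel, leaving FLOP/byte.
    return peak_tflops / bandwidth_tb_s

# H100 values taken from the `wafer config specs show H100` output.
h100_fp16 = ridge_point(989.4, 3.35)
h100_fp32 = ridge_point(66.9, 3.35)

print(f"Ridge Point (FP16): {h100_fp16:.1f} FLOP/byte")  # 295.3
print(f"Ridge Point (FP32): {h100_fp32:.1f} FLOP/byte")  # 20.0
```

A kernel whose arithmetic intensity falls below the ridge point is limited by memory bandwidth; above it, by peak compute.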
wafer config specs add
Add a custom GPU specification:
wafer config specs add <name> [OPTIONS]
Options:
| Option | Description |
|---|---|
| --memory | Memory capacity (e.g., "80GB") |
| --bandwidth | Memory bandwidth (e.g., "3.35TB/s") |
| --fp16 | FP16 peak TFLOPS |
| --fp32 | FP32 peak TFLOPS |
| --tdp | Thermal design power (watts) |
Example:
wafer config specs add my-custom-gpu \
  --memory 48GB \
  --bandwidth 2.0TB/s \
  --fp16 400 \
  --fp32 200 \
  --tdp 350
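Before registering a custom spec, it can help to sanity-check the values and see what ridge points they imply. The sketch below is only an illustration: the helper names are hypothetical, and the accepted string formats are assumed from the "80GB" and "3.35TB/s" examples in the options table, not taken from wafer's actual parser.

```python
import re

# Hypothetical helpers: wafer's own parsing rules are not documented here.
_NUMBER = r"(\d+(?:\.\d+)?)"

def parse_memory_gb(text: str) -> float:
    """Parse a capacity string such as '48GB' into gigabytes."""
    match = re.fullmatch(rf"\s*{_NUMBER}\s*GB\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"expected a capacity like '80GB', got {text!r}")
    return float(match.group(1))

def parse_bandwidth_tb_s(text: str) -> float:
    """Parse a bandwidth string such as '2.0TB/s' into TB/s."""
    match = re.fullmatch(rf"\s*{_NUMBER}\s*TB/s\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"expected a bandwidth like '3.35TB/s', got {text!r}")
    return float(match.group(1))

# Values from the my-custom-gpu example.
memory_gb = parse_memory_gb("48GB")
bandwidth = parse_bandwidth_tb_s("2.0TB/s")
fp16_ridge = 400 / bandwidth  # FLOP/byte implied by --fp16 400
fp32_ridge = 200 / bandwidth  # FLOP/byte implied by --fp32 200

print(memory_gb, bandwidth, fp16_ridge, fp32_ridge)
```

For the example spec this gives ridge points of 200 FLOP/byte (FP16) and 100 FLOP/byte (FP32).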
wafer config specs remove
Remove a custom specification:
wafer config specs remove <name>
Only custom specs can be removed; built-in GPU specs cannot.
wafer config specs default
Set the default GPU for analysis:
wafer config specs default <name>
The default is used when no --gpu flag is specified:
# Set default
wafer config specs default H100
# Now roofline uses H100 automatically
wafer roofline --bytes 1e9 --flops 1e12 --time-ms 0.5
Using Specs
Roofline Analysis
# Use specific GPU
wafer roofline --gpu H100 --bytes 1e9 --flops 1e12 --time-ms 0.5
# Use default GPU
wafer roofline --bytes 1e9 --flops 1e12 --time-ms 0.5
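To make the role of the spec concrete, here is a rough Python sketch of a standard roofline classification for the `--bytes 1e9 --flops 1e12` workload. It assumes the H100 FP16 peak and bandwidth from the built-in spec; which precision wafer applies and how it reports results are not stated here, so treat the function and its return values as illustrative.

```python
def roofline_bound(flops: float, bytes_moved: float,
                   peak_flops: float, bandwidth: float):
    """Classify a workload against a simple two-roof roofline model."""
    intensity = flops / bytes_moved                 # FLOP/byte
    ridge = peak_flops / bandwidth                  # FLOP/byte
    # Attainable throughput is capped by whichever roof is lower.
    attainable = min(peak_flops, intensity * bandwidth)
    bound = "compute" if intensity >= ridge else "memory"
    return intensity, bound, attainable

# Assumed spec values: H100 FP16 peak (989.4 TFLOPS) and bandwidth (3.35 TB/s).
intensity, bound, attainable = roofline_bound(
    flops=1e12, bytes_moved=1e9, peak_flops=989.4e12, bandwidth=3.35e12
)

print(f"Arithmetic intensity: {intensity:.0f} FLOP/byte")
print(f"Bound: {bound}")
print(f"Attainable: {attainable / 1e12:.1f} TFLOPS")
```

With an intensity of 1000 FLOP/byte, well above the H100 FP16 ridge point of 295.3 FLOP/byte, this workload sits under the compute roof.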
Baseline Discovery
wafer baseline run "torch.matmul(A, B)" \
  --shape A=1024,1024 \
  --shape B=1024,1024 \
  --hardware H100
Compare the same workload across GPUs:
# Analyze for H100
wafer roofline --gpu H100 --bytes 2e9 --flops 4e12 --time-ms 1.0
# Analyze for A100
wafer roofline --gpu A100 --bytes 2e9 --flops 4e12 --time-ms 1.5
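One way to read such a comparison is to set the measured speedup against the ratios of the two specs. The Python sketch below takes the illustrative timings from the commands above at face value and uses the FP16 and bandwidth figures from the built-in spec table. The dictionary layout is an assumption for this example, not a wafer data structure.

```python
# Spec values from the built-in table; timings from the example commands.
h100 = {"bw_tb_s": 3.35, "fp16_tflops": 989.4, "time_ms": 1.0}
a100 = {"bw_tb_s": 2.04, "fp16_tflops": 312.0, "time_ms": 1.5}

# How much faster the workload ran on H100 than on A100.
measured_speedup = a100["time_ms"] / h100["time_ms"]

# How much faster it could run if it scaled with peak compute or with bandwidth.
compute_ratio = h100["fp16_tflops"] / a100["fp16_tflops"]
bandwidth_ratio = h100["bw_tb_s"] / a100["bw_tb_s"]

print(f"Measured speedup:    {measured_speedup:.2f}x")
print(f"Peak FP16 ratio:     {compute_ratio:.2f}x")
print(f"Bandwidth ratio:     {bandwidth_ratio:.2f}x")
```

A measured speedup well below the peak FP16 ratio suggests the workload is not scaling with peak compute alone, which is the kind of gap the per-GPU roofline runs help explain.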
Built-in Specifications
NVIDIA GPUs
| GPU | Generation | Memory | Peak FP16 |
|---|---|---|---|
| B200 | Blackwell | 192GB HBM3e | 2250 TFLOPS |
| H200 | Hopper | 141GB HBM3e | 989 TFLOPS |
| H100 | Hopper | 80GB HBM3 | 989 TFLOPS |
| A100 | Ampere | 80GB HBM2e | 312 TFLOPS |
| RTX 4090 | Ada | 24GB GDDR6X | 165 TFLOPS |
AMD GPUs
| GPU | Generation | Memory | Peak FP16 |
|---|---|---|---|
| MI300X | CDNA 3 | 192GB HBM3 | 1307 TFLOPS |
| MI250X | CDNA 2 | 128GB HBM2e | 383 TFLOPS |
| MI210 | CDNA 2 | 64GB HBM2e | 181 TFLOPS |
Custom Spec File
Create specs from a YAML file:
# my-gpu.yaml
name: Custom GPU
vendor: NVIDIA
memory:
  capacity_gb: 80
  bandwidth_tb_s: 3.0
compute:
  fp16_tflops: 500
  fp32_tflops: 250
  fp64_tflops: 125
power:
  tdp_watts: 400
wafer config specs add --from-file my-gpu.yaml
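A spec file can be checked before it is handed to wafer. The Python sketch below works on a plain dict shaped like the YAML above (roughly what PyYAML's `yaml.safe_load` would return for that file). The set of required fields is an assumption made for this example; wafer's actual schema and validation rules may differ.

```python
# Assumed required fields, grouped by section; not wafer's documented schema.
REQUIRED = {
    "memory": ("capacity_gb", "bandwidth_tb_s"),
    "compute": ("fp16_tflops", "fp32_tflops"),
    "power": ("tdp_watts",),
}

def validate_spec(spec: dict) -> list[str]:
    """Return a list of problems found in a parsed spec mapping."""
    problems = []
    for section, fields in REQUIRED.items():
        block = spec.get(section)
        if not isinstance(block, dict):
            # Typically caused by lost indentation under the section key.
            problems.append(f"missing section: {section}")
            continue
        for field in fields:
            value = block.get(field)
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or value <= 0:
                problems.append(f"{section}.{field} must be a positive number")
    return problems

# Dict mirroring my-gpu.yaml.
spec = {
    "name": "Custom GPU",
    "vendor": "NVIDIA",
    "memory": {"capacity_gb": 80, "bandwidth_tb_s": 3.0},
    "compute": {"fp16_tflops": 500, "fp32_tflops": 250, "fp64_tflops": 125},
    "power": {"tdp_watts": 400},
}

problems = validate_spec(spec)
print(problems or "spec looks consistent")
```

The nested sections matter: if `capacity_gb` and `bandwidth_tb_s` are not indented under `memory:`, a YAML parser reads `memory` as empty and the fields as top-level keys, which this check reports as a missing section.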