GPU Specs
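GPU specs describe a GPU's hardware characteristics: memory capacity and bandwidth, peak compute throughput per precision, and power. Wafer uses them for roofline analysis, performance modeling, and cross-GPU comparison. It includes built-in specs for common NVIDIA and AMD GPUs, and you can add custom specs.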
Quick Start
# List available specs
wafer config specs list
# Show spec details
wafer config specs show H100
# Set default spec
wafer config specs default H100
Commands
wafer config specs list
List all available GPU specifications:
Output:
GPU Specifications:

NVIDIA:
  Name      Memory   BW (TB/s)   FP16 (TFLOPS)   TDP
  H100      80GB     3.35        989.4           700W
  H200      141GB    4.80        989.4           700W
  A100      80GB     2.04        312.0           400W
  B200      192GB    8.00        2250.0          1000W
  RTX4090   24GB     1.01        165.2           450W

AMD:
  Name      Memory   BW (TB/s)   FP16 (TFLOPS)   TDP
  MI300X    192GB    5.30        1307.4          750W
  MI250X    128GB    3.28        383.0           560W
  MI210     64GB     1.64        181.0           300W

* = default
wafer config specs show
Show detailed specifications for a GPU:
wafer config specs show <gpu-name>
Example:
wafer config specs show H100
Output:
NVIDIA H100 SXM5 (80GB)

Memory:
  Capacity:      80 GB HBM3
  Bandwidth:     3.35 TB/s
  Bus Width:     5120-bit

Compute:
  FP64:          33.5 TFLOPS
  FP32:          66.9 TFLOPS
  FP16:          989.4 TFLOPS (Tensor Core)
  BF16:          989.4 TFLOPS (Tensor Core)
  INT8:          1978.9 TOPS (Tensor Core)
  FP8:           1978.9 TFLOPS (Tensor Core)

Architecture:
  SMs:           132
  CUDA Cores:    16896
  Tensor Cores:  528 (4th gen)
  L2 Cache:      50 MB
  Registers/SM:  65536

Roofline:
  Ridge Point (FP16):  295.3 FLOP/byte
  Ridge Point (FP32):  20.0 FLOP/byte

Power:
  TDP:           700W
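The ridge point is the arithmetic intensity at which the memory roof and the compute roof of the roofline meet, so it is peak throughput divided by memory bandwidth. A short derivation with the H100 numbers shown above reproduces both listed values ($P_{\text{peak}}$ and $B_{\text{mem}}$ are notation introduced here). Kernels whose intensity falls below the ridge point are memory-bound; above it, they are compute-bound.

```latex
\begin{aligned}
I_{\text{ridge}} &= \frac{P_{\text{peak}}}{B_{\text{mem}}} \\
I_{\text{ridge}}^{\text{FP16}} &= \frac{989.4 \times 10^{12}\,\text{FLOP/s}}{3.35 \times 10^{12}\,\text{B/s}} \approx 295.3\ \text{FLOP/byte} \\
I_{\text{ridge}}^{\text{FP32}} &= \frac{66.9 \times 10^{12}\,\text{FLOP/s}}{3.35 \times 10^{12}\,\text{B/s}} \approx 20.0\ \text{FLOP/byte}
\end{aligned}
```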
wafer config specs add
Add a custom GPU specification:
wafer config specs add <name> [OPTIONS]
Options:
Option         Description
--memory       Memory capacity (e.g., "80GB")
--bandwidth    Memory bandwidth (e.g., "3.35TB/s")
--fp16         Peak FP16 throughput, in TFLOPS
--fp32         Peak FP32 throughput, in TFLOPS
--tdp          Thermal design power, in watts
Example:
wafer config specs add my-custom-gpu \
--memory 48GB \
--bandwidth 2.0TB/s \
--fp16 400 \
--fp32 200 \
--tdp 350
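The --memory and --bandwidth options take human-readable strings. The sketch below shows one way such strings could be converted to base units, and how the custom spec's ridge points follow from them. `parse_quantity` is a hypothetical helper, not Wafer's parser, and the use of decimal (SI) prefixes is an assumption; Wafer may interpret units differently.

```python
import re

# Hypothetical helper, not Wafer's parser: converts strings such as "48GB"
# or "2.0TB/s" into base units. Decimal (SI) prefixes are assumed.
SI_PREFIX = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}


def parse_quantity(text: str, unit: str) -> float:
    """Parse e.g. parse_quantity("48GB", "B") or parse_quantity("2.0TB/s", "B/s")."""
    pattern = r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)" + re.escape(unit) + r"\s*"
    match = re.fullmatch(pattern, text)
    if match is None:
        raise ValueError(f"cannot parse {text!r} as a quantity in {unit}")
    return float(match.group(1)) * SI_PREFIX[match.group(2)]


# Values from the `wafer config specs add my-custom-gpu` example.
memory_bytes = parse_quantity("48GB", "B")
bandwidth_bytes_s = parse_quantity("2.0TB/s", "B/s")
fp16_flops, fp32_flops = 400e12, 200e12

# Ridge point = peak FLOP/s divided by memory bytes/s, in FLOP/byte.
print(f"memory: {memory_bytes:.3g} B, bandwidth: {bandwidth_bytes_s:.3g} B/s")
print(f"FP16 ridge: {fp16_flops / bandwidth_bytes_s:.1f} FLOP/byte")
print(f"FP32 ridge: {fp32_flops / bandwidth_bytes_s:.1f} FLOP/byte")
```

With 400 TFLOPS of FP16 over 2.0 TB/s, a kernel on this GPU needs about 200 FLOP per byte before it stops being memory-bound.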
wafer config specs remove
Remove a custom specification:
wafer config specs remove <name>
Only custom specs can be removed; built-in GPU specs cannot.
wafer config specs default
Set the default GPU for analysis:
wafer config specs default <name>
The default is used when no --gpu flag is specified:
# Set default
wafer config specs default H100
# Now roofline uses H100 automatically
wafer roofline --bytes 1e9 --flops 1e12 --time-ms 0.5
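The fallback rule above can be modeled in a few lines. `resolve_gpu` is a hypothetical function for illustration only. This page does not say what Wafer does when neither --gpu nor a default is set, so the error branch is an assumption.

```python
from typing import Optional


def resolve_gpu(gpu_flag: Optional[str], default_spec: Optional[str]) -> str:
    """Hypothetical model of the documented rule: an explicit --gpu wins;
    otherwise the spec set via `wafer config specs default` is used."""
    if gpu_flag:
        return gpu_flag
    if default_spec:
        return default_spec
    # Behavior with neither set is not documented; raising is an assumption.
    raise LookupError("no --gpu flag given and no default spec configured")


print(resolve_gpu(None, "H100"))    # H100: falls back to the default
print(resolve_gpu("A100", "H100"))  # A100: the explicit flag wins
```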
Using Specs
Roofline Analysis
# Use specific GPU
wafer roofline --gpu H100 --bytes 1e9 --flops 1e12 --time-ms 0.5
# Use default GPU
wafer roofline --bytes 1e9 --flops 1e12 --time-ms 0.5
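The roofline model caps attainable throughput at min(peak compute, arithmetic intensity x memory bandwidth). Below is a minimal sketch of that calculation, assuming the spec supplies peak FP16 TFLOPS and bandwidth in TB/s as in the spec table. `roofline` and `RooflineResult` are illustrative names, not Wafer's API. The workload numbers are chosen for this sketch, and Wafer's own report may be computed differently.

```python
from dataclasses import dataclass


@dataclass
class RooflineResult:
    intensity: float          # arithmetic intensity, FLOP/byte
    achieved_tflops: float    # measured throughput
    attainable_tflops: float  # roofline ceiling at this intensity
    bound: str                # "memory" or "compute"


def roofline(bytes_moved: float, flops: float, time_ms: float,
             peak_tflops: float, bandwidth_tb_s: float) -> RooflineResult:
    intensity = flops / bytes_moved
    # FLOP/byte * TB/s = TFLOP/s, so the memory roof needs no unit conversion.
    attainable = min(peak_tflops, intensity * bandwidth_tb_s)
    achieved = flops / (time_ms * 1e-3) / 1e12
    ridge = peak_tflops / bandwidth_tb_s
    bound = "memory" if intensity < ridge else "compute"
    return RooflineResult(intensity, achieved, attainable, bound)


# Illustrative workload on the H100 FP16 roofs (989.4 TFLOPS, 3.35 TB/s).
result = roofline(bytes_moved=1e9, flops=1e11, time_ms=0.5,
                  peak_tflops=989.4, bandwidth_tb_s=3.35)
print(result)
print(f"{result.achieved_tflops / result.attainable_tflops:.0%} of attainable")
```

At 100 FLOP/byte the workload sits left of the H100 FP16 ridge point, so the memory roof (100 x 3.35 = 335 TFLOPS) is the relevant ceiling, not the 989.4 TFLOPS compute peak.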
Baseline Discovery
wafer baseline run "torch.matmul(A, B)" \
--shape A=1024,1024 \
--shape B=1024,1024 \
--hardware H100
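Roofline analysis needs FLOP and byte counts for the workload, i.e. the values passed as --flops and --bytes to `wafer roofline`. For the `torch.matmul` baseline above, a back-of-the-envelope count looks like the sketch below. It assumes FP16 elements (2 bytes), since the example does not state a dtype, and counts ideal traffic in which A, B, and C each cross DRAM exactly once. `matmul_cost` is a hypothetical helper.

```python
from typing import Tuple


def matmul_cost(m: int, k: int, n: int, bytes_per_elem: int = 2) -> Tuple[int, int]:
    """FLOPs and ideal DRAM traffic for C = A @ B, with A: (m, k), B: (k, n).

    Counts one multiply and one add per term (2*m*k*n FLOPs) and assumes
    A and B are read once and C is written once (perfect on-chip reuse).
    """
    flops = 2 * m * k * n
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops, bytes_moved


flops, bytes_moved = matmul_cost(1024, 1024, 1024)  # FP16 assumed: 2 bytes/element
print(f"FLOPs: {flops:.3e}, bytes: {bytes_moved:.3e}")
print(f"intensity: {flops / bytes_moved:.1f} FLOP/byte")
```

Real kernels move more than this ideal minimum when tiles are re-read, so the measured intensity is usually lower.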
Compare the same workload across GPUs:
# Analyze for H100
wafer roofline --gpu H100 --bytes 2e9 --flops 4e12 --time-ms 1.0
# Analyze for A100
wafer roofline --gpu A100 --bytes 2e9 --flops 4e12 --time-ms 1.5
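Comparing GPUs comes down to how each spec's two roofs bound the same workload. This sketch computes the roofline lower bound on run time (the slower of the compute limit and the memory limit) for a workload chosen here for illustration: a 1024x1024x1024 matmul in FP16, with A, B, and C each crossing DRAM once. Peak values come from the built-in H100 and A100 specs; `time_floor_us` is a hypothetical helper.

```python
# Peak FP16 TFLOPS and memory bandwidth (TB/s) from the built-in specs.
SPECS = {"H100": (989.4, 3.35), "A100": (312.0, 2.04)}


def time_floor_us(flops: float, bytes_moved: float,
                  peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Roofline lower bound on run time: the slower of the two roofs."""
    compute_us = flops / (peak_tflops * 1e12) * 1e6
    memory_us = bytes_moved / (bandwidth_tb_s * 1e12) * 1e6
    return max(compute_us, memory_us)


# Illustrative workload: 1024x1024x1024 FP16 matmul (2-byte elements).
flops = 2 * 1024**3
bytes_moved = 2 * 3 * 1024**2

floors = {name: time_floor_us(flops, bytes_moved, peak, bw)
          for name, (peak, bw) in SPECS.items()}
for name, (peak, bw) in SPECS.items():
    print(f"{name}: ridge {peak / bw:.1f} FLOP/byte, floor {floors[name]:.2f} us")
```

The workload's intensity (about 341 FLOP/byte) exceeds both ridge points, so both GPUs are compute-bound here and the floor ratio matches the FP16 peak ratio, roughly 3.2x.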
Built-in Specifications
NVIDIA GPUs
GPU        Generation   Memory         Peak FP16
B200       Blackwell    192GB HBM3e    2250 TFLOPS
H200       Hopper       141GB HBM3e    989 TFLOPS
H100       Hopper       80GB HBM3      989 TFLOPS
A100       Ampere       80GB HBM2e     312 TFLOPS
RTX 4090   Ada          24GB GDDR6X    165 TFLOPS
AMD GPUs
GPU        Generation   Memory         Peak FP16
MI300X     CDNA 3       192GB HBM3     1307 TFLOPS
MI250X     CDNA 2       128GB HBM2e    383 TFLOPS
MI210      CDNA 2       64GB HBM2e     181 TFLOPS
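Because the ridge point is just peak throughput over bandwidth, it can be computed for every built-in spec from the values `wafer config specs list` prints. The sketch below ranks the built-ins by FP16 ridge point, highest first; those GPUs need the most FLOPs per byte before a kernel becomes compute-bound. `fp16_ridge_points` is a hypothetical helper, and the numbers are copied from the listing above.

```python
# Peak FP16 TFLOPS and memory bandwidth (TB/s) as listed by `wafer config specs list`.
BUILT_IN = {
    "B200": (2250.0, 8.00),
    "H200": (989.4, 4.80),
    "H100": (989.4, 3.35),
    "A100": (312.0, 2.04),
    "RTX4090": (165.2, 1.01),
    "MI300X": (1307.4, 5.30),
    "MI250X": (383.0, 3.28),
    "MI210": (181.0, 1.64),
}


def fp16_ridge_points(specs):
    """Return (name, ridge FLOP/byte) pairs, highest ridge point first."""
    ridges = {name: tflops / bw for name, (tflops, bw) in specs.items()}
    return sorted(ridges.items(), key=lambda item: item[1], reverse=True)


for name, ridge in fp16_ridge_points(BUILT_IN):
    print(f"{name:8s} {ridge:6.1f} FLOP/byte")
```

A kernel at a fixed intensity can therefore be compute-bound on one GPU and memory-bound on another; H200 shares H100's FP16 peak but, with more bandwidth, has a much lower ridge point.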
Custom Spec File
Define a custom spec in a YAML file and add it with --from-file:
# my-gpu.yaml
name: Custom GPU
vendor: NVIDIA
memory:
  capacity_gb: 80
  bandwidth_tb_s: 3.0
compute:
  fp16_tflops: 500
  fp32_tflops: 250
  fp64_tflops: 125
power:
  tdp_watts: 400
wafer config specs add --from-file my-gpu.yaml
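Before adding a spec file, it can help to sanity-check the values and see the ridge points they imply. This is a hypothetical pre-flight check, not something Wafer provides. It works on the spec as a Python dict, with memory, compute, and power as nested mappings, which is what a YAML parser such as PyYAML's `yaml.safe_load` would return for my-gpu.yaml. Which fields Wafer actually requires is not documented here; the `REQUIRED` set is an assumption.

```python
# my-gpu.yaml as a dict (memory, compute and power are nested mappings).
spec = {
    "name": "Custom GPU",
    "vendor": "NVIDIA",
    "memory": {"capacity_gb": 80, "bandwidth_tb_s": 3.0},
    "compute": {"fp16_tflops": 500, "fp32_tflops": 250, "fp64_tflops": 125},
    "power": {"tdp_watts": 400},
}

# Assumed required fields; Wafer's real schema may differ.
REQUIRED = {
    "memory": ("capacity_gb", "bandwidth_tb_s"),
    "compute": ("fp16_tflops", "fp32_tflops"),
}


def check_spec(spec: dict) -> dict:
    """Verify required fields are positive numbers; return ridge points in FLOP/byte."""
    problems = []
    for section, keys in REQUIRED.items():
        for key in keys:
            value = spec.get(section, {}).get(key)
            if not isinstance(value, (int, float)) or value <= 0:
                problems.append(f"{section}.{key} must be a positive number")
    if problems:
        raise ValueError("; ".join(problems))
    bandwidth = spec["memory"]["bandwidth_tb_s"]
    # TFLOPS / (TB/s) = FLOP/byte.
    return {f"fp{bits}": spec["compute"][f"fp{bits}_tflops"] / bandwidth
            for bits in (16, 32)}


print(check_spec(spec))
```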
Next Steps
Roofline Analysis: Use specs for roofline analysis.
Baseline Discovery: Use specs with baseline discovery.
Targets: Configure GPU targets.
Workspaces: Access cloud GPUs.