GPU Specs

GPU specs define hardware characteristics used for roofline analysis, performance modeling, and comparison. Wafer includes built-in specs for common GPUs and supports custom configurations.

Quick Start

# List available specs
wafer config specs list

# Show spec details
wafer config specs show H100

# Set default spec
wafer config specs default H100

Commands

wafer config specs list

List all available GPU specifications:
wafer config specs list
Output:
GPU Specifications:

NVIDIA:
  Name       Memory    BW (TB/s)   FP16 (TFLOPS)   TDP
  H100       80GB      3.35        989.4           700W
  H200       141GB     4.80        989.4           700W
  A100       80GB      2.04        312.0           400W
  B200       192GB     8.00        2250.0          1000W
  RTX4090    24GB      1.01        165.2           450W

AMD:
  Name       Memory    BW (TB/s)   FP16 (TFLOPS)   TDP
  MI300X     192GB     5.30        1307.4          750W
  MI250X     128GB     3.28        383.0           560W
  MI210      64GB      1.64        181.0           300W

* = default

wafer config specs show

Show detailed specifications for a GPU:
wafer config specs show <gpu-name>
Example:
wafer config specs show H100
Output:
NVIDIA H100 SXM5 (80GB)

Memory:
  Capacity: 80 GB HBM3
  Bandwidth: 3.35 TB/s
  Bus Width: 5120-bit

Compute:
  FP64: 33.5 TFLOPS
  FP32: 66.9 TFLOPS
  FP16: 989.4 TFLOPS (Tensor Core)
  BF16: 989.4 TFLOPS (Tensor Core)
  INT8: 1978.9 TOPS (Tensor Core)
  FP8: 1978.9 TFLOPS (Tensor Core)

Architecture:
  SMs: 132
  CUDA Cores: 16896
  Tensor Cores: 528 (4th gen)
  L2 Cache: 50 MB
  Registers/SM: 65536

Roofline:
  Ridge Point (FP16): 295.3 FLOP/byte
  Ridge Point (FP32): 20.0 FLOP/byte

Power:
  TDP: 700W
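
The Roofline values in this output can be reproduced from the Memory and Compute figures. The ridge point is the arithmetic intensity at which the compute roof and the memory roof meet, which is peak throughput divided by memory bandwidth. The sketch below is only an illustration of that arithmetic. The function name `ridge_point` is hypothetical and is not part of Wafer.

```python
# Peak values copied from the `wafer config specs show H100` output above.
BANDWIDTH_TB_S = 3.35
PEAK_TFLOPS = {"FP16": 989.4, "FP32": 66.9}


def ridge_point(peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) where the compute and memory roofs meet.

    Hypothetical helper for illustration; not a Wafer API.
    """
    # TFLOP/s divided by TB/s: the tera prefixes cancel, leaving FLOP/byte.
    return peak_tflops / bandwidth_tb_s


for precision, peak in PEAK_TFLOPS.items():
    value = ridge_point(peak, BANDWIDTH_TB_S)
    print(f"Ridge Point ({precision}): {value:.1f} FLOP/byte")
# Ridge Point (FP16): 295.3 FLOP/byte
# Ridge Point (FP32): 20.0 FLOP/byte
```

A kernel whose arithmetic intensity is below the ridge point is limited by memory bandwidth; above it, by peak compute.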

wafer config specs add

Add a custom GPU specification:
wafer config specs add <name> [OPTIONS]
Options:
Option         Description
--memory       Memory capacity (e.g., "80GB")
--bandwidth    Memory bandwidth (e.g., "3.35TB/s")
--fp16         FP16 peak TFLOPS
--fp32         FP32 peak TFLOPS
--tdp          Thermal design power (watts)
Example:
wafer config specs add my-custom-gpu \
  --memory 48GB \
  --bandwidth 2.0TB/s \
  --fp16 400 \
  --fp32 200 \
  --tdp 350

wafer config specs remove

Remove a custom specification:
wafer config specs remove <name>
Only custom specs can be removed; built-in GPU specs cannot.

wafer config specs default

Set the default GPU for analysis:
wafer config specs default <name>
The default is used when no --gpu flag is specified:
# Set default
wafer config specs default H100

# Now roofline uses H100 automatically
wafer roofline --bytes 1e9 --flops 1e12 --time-ms 0.5

Using Specs

Roofline Analysis

# Use specific GPU
wafer roofline --gpu H100 --bytes 1e9 --flops 1e12 --time-ms 0.5

# Use default GPU
wafer roofline --bytes 1e9 --flops 1e12 --time-ms 0.5
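
To make the three inputs concrete, here is a minimal sketch of the standard roofline arithmetic that a tool could derive from `--bytes`, `--flops`, and `--time-ms` together with a GPU spec. It is an assumption about the model, not a description of Wafer's implementation. The names `roofline` and `RooflineResult`, the choice of the FP16 peak, and the sample workload are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RooflineResult:
    intensity: float          # FLOP/byte
    achieved_tflops: float    # measured throughput
    attainable_tflops: float  # roofline ceiling at this intensity
    bound: str                # "memory" or "compute"
    efficiency: float         # achieved / attainable


def roofline(bytes_moved: float, flops: float, time_ms: float,
             peak_tflops: float, bandwidth_tb_s: float) -> RooflineResult:
    """Classic roofline model (hypothetical helper, not a Wafer API)."""
    intensity = flops / bytes_moved
    achieved = flops / (time_ms * 1e-3) / 1e12
    # TB/s times FLOP/byte gives TFLOP/s: the memory-bandwidth ceiling.
    memory_roof = bandwidth_tb_s * intensity
    attainable = min(peak_tflops, memory_roof)
    bound = "memory" if memory_roof < peak_tflops else "compute"
    return RooflineResult(intensity, achieved, attainable, bound,
                          achieved / attainable)


# Hypothetical kernel on an H100 (3.35 TB/s, 989.4 FP16 TFLOPS):
# 1 GB moved, 100 GFLOP executed, 0.5 ms measured.
result = roofline(bytes_moved=1e9, flops=1e11, time_ms=0.5,
                  peak_tflops=989.4, bandwidth_tb_s=3.35)
print(f"{result.intensity:.0f} FLOP/byte, {result.bound}-bound, "
      f"{result.achieved_tflops:.0f} of {result.attainable_tflops:.0f} TFLOPS "
      f"attainable ({result.efficiency:.0%})")
```

In this sketch the kernel sits at 100 FLOP/byte, below the H100 FP16 ridge point of 295.3 FLOP/byte, so its ceiling is set by memory bandwidth rather than by peak compute.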

Baseline Discovery

wafer baseline run "torch.matmul(A, B)" \
  --shape A=1024,1024 \
  --shape B=1024,1024 \
  --hardware H100
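
For a sense of where this workload lands on the roofline, the FLOP and byte counts of a dense matrix multiply can be estimated by hand. The sketch below assumes FP16 operands (2 bytes per element) and that each matrix crosses the memory bus exactly once, which ignores caching and tiling effects. The helper `matmul_cost` is hypothetical and is not part of Wafer.

```python
def matmul_cost(m: int, k: int, n: int, bytes_per_element: int = 2):
    """Estimate FLOPs and bytes moved for C = A @ B, A is (m, k), B is (k, n).

    Assumes FP16 by default and one pass over each matrix (hypothetical model).
    """
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)  # read A, B; write C
    return flops, bytes_moved


flops, bytes_moved = matmul_cost(1024, 1024, 1024)
print(flops)                                # 2147483648
print(bytes_moved)                          # 6291456
print(f"{flops / bytes_moved:.1f} FLOP/byte")  # 341.3 FLOP/byte
```

Under these assumptions the 1024 x 1024 matmul has an arithmetic intensity of about 341 FLOP/byte, slightly above the H100 FP16 ridge point of 295.3 FLOP/byte.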

Performance Comparison

Compare the same workload across GPUs:
# Analyze for H100
wafer roofline --gpu H100 --bytes 2e9 --flops 4e12 --time-ms 1.0

# Analyze for A100
wafer roofline --gpu A100 --bytes 2e9 --flops 4e12 --time-ms 1.5
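
When comparing GPUs, it can help to know the best time each one could reach for a given workload. A minimal sketch, assuming the simple roofline model in which run time is bounded below by the slower of memory traffic and compute: the bandwidth and FP16 figures are copied from the `wafer config specs list` output, while the function `min_time_ms` and the sample workload (2 GB moved, 400 GFLOP) are hypothetical.

```python
# (bandwidth TB/s, FP16 TFLOPS) copied from `wafer config specs list`.
SPECS = {
    "H100": (3.35, 989.4),
    "A100": (2.04, 312.0),
    "MI300X": (5.30, 1307.4),
}


def min_time_ms(bytes_moved: float, flops: float,
                bandwidth_tb_s: float, peak_tflops: float):
    """Roofline lower bound on run time (hypothetical helper, not a Wafer API)."""
    memory_ms = bytes_moved / (bandwidth_tb_s * 1e12) * 1e3
    compute_ms = flops / (peak_tflops * 1e12) * 1e3
    bound = "memory" if memory_ms > compute_ms else "compute"
    return max(memory_ms, compute_ms), bound


# Hypothetical workload: 2 GB moved, 400 GFLOP (200 FLOP/byte).
for gpu, (bandwidth, peak) in SPECS.items():
    best_ms, bound = min_time_ms(2e9, 4e11, bandwidth, peak)
    print(f"{gpu:8s} >= {best_ms:.3f} ms ({bound}-bound)")
```

With these numbers the same workload is memory-bound on the H100 and MI300X but compute-bound on the A100, because 200 FLOP/byte falls on different sides of each GPU's ridge point.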

Built-in Specifications

NVIDIA GPUs

GPU         Generation   Memory          Peak FP16
B200        Blackwell    192GB HBM3e     2250 TFLOPS
H200        Hopper       141GB HBM3e     989 TFLOPS
H100        Hopper       80GB HBM3       989 TFLOPS
A100        Ampere       80GB HBM2e      312 TFLOPS
RTX 4090    Ada          24GB GDDR6X     165 TFLOPS

AMD GPUs

GPU         Generation   Memory          Peak FP16
MI300X      CDNA 3       192GB HBM3      1307 TFLOPS
MI250X      CDNA 2       128GB HBM2e     383 TFLOPS
MI210       CDNA 2       64GB HBM2e      181 TFLOPS

Custom Spec File

Create specs from a YAML file:
# my-gpu.yaml
name: Custom GPU
vendor: NVIDIA
memory:
  capacity_gb: 80
  bandwidth_tb_s: 3.0
compute:
  fp16_tflops: 500
  fp32_tflops: 250
  fp64_tflops: 125
power:
  tdp_watts: 400

Then load the file:
wafer config specs add --from-file my-gpu.yaml
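
If several custom specs are needed, the YAML can be generated instead of written by hand. The sketch below renders the same fields shown in my-gpu.yaml using only the Python standard library. The function `render_spec_yaml` is hypothetical and not part of Wafer, and the sketch assumes the field names above are the complete schema and that the name contains no characters that require YAML quoting.

```python
def render_spec_yaml(name: str, vendor: str, capacity_gb: float,
                     bandwidth_tb_s: float, fp16_tflops: float,
                     fp32_tflops: float, fp64_tflops: float,
                     tdp_watts: float) -> str:
    """Render a spec using the field layout of my-gpu.yaml.

    Hypothetical helper; values are written unquoted, so `name` and `vendor`
    must not contain YAML-special characters such as ':' or '#'.
    """
    lines = [
        f"name: {name}",
        f"vendor: {vendor}",
        "memory:",
        f"  capacity_gb: {capacity_gb}",
        f"  bandwidth_tb_s: {bandwidth_tb_s}",
        "compute:",
        f"  fp16_tflops: {fp16_tflops}",
        f"  fp32_tflops: {fp32_tflops}",
        f"  fp64_tflops: {fp64_tflops}",
        "power:",
        f"  tdp_watts: {tdp_watts}",
    ]
    return "\n".join(lines) + "\n"


# Reproduces the example file; redirect the output to my-gpu.yaml to use it.
print(render_spec_yaml("Custom GPU", "NVIDIA", 80, 3.0, 500, 250, 125, 400),
      end="")
```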
