
GPU Targets

Targets let you run Wafer commands on remote GPUs—your own machines, cloud instances, or managed services. Configure once, then use for evaluation, profiling, and development.

Quick Start

# Initialize local GPU target
wafer config targets init local

# Add an SSH target
wafer config targets init ssh

# List configured targets
wafer config targets list

# Test connectivity
wafer config targets probe my-target

Target Types

Wafer supports several target types: local, ssh, runpod, and digitalocean.

Local

Use GPUs on your local machine:
wafer config targets init local
This detects NVIDIA or AMD GPUs automatically.

Commands

wafer config targets list

List all configured targets:
wafer config targets list
Output:
NAME          TYPE    GPU       STATUS
local         local   RTX 4090  ready
h100-box      ssh     H100      ready
runpod-a100   runpod  A100      stopped

wafer config targets init

Initialize a new target interactively:
wafer config targets init <type>
Where <type> is one of: local, ssh, runpod, digitalocean

wafer config targets add

Add a target with explicit configuration:
wafer config targets add <name> \
  --type ssh \
  --host 192.168.1.100 \
  --user ubuntu \
  --gpu H100

wafer config targets show

Show target configuration:
wafer config targets show <name>

wafer config targets probe

Test connectivity and GPU detection:
wafer config targets probe <name>
This verifies:
  • SSH connection (for remote targets)
  • GPU availability
  • Required tools installed
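A rough local approximation of the tooling check is just a PATH scan. This is a hedged sketch, not Wafer's actual probe logic, and the tool list here is an assumption for illustration:

```shell
# Hypothetical pre-probe check: confirm required tools are on PATH.
# The tool list is an assumption, not Wafer's documented requirements.
for tool in ssh python3 nvidia-smi; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

For remote targets, the same checks would run over the SSH connection rather than locally.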

wafer config targets remove

Remove a target:
wafer config targets remove <name>

wafer config targets default

Set the default target:
wafer config targets default <name>
The default target is used when no --target flag is specified.
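The resolution order can be pictured as a simple fallback: an explicit --target wins, otherwise the configured default is used. A minimal sketch, where the function name and default value are assumptions for illustration:

```shell
# Hypothetical sketch of default-target resolution: an explicit
# --target value takes precedence over the configured default.
resolve_target() {
  explicit="$1"
  default="local"   # assumed configured default
  echo "${explicit:-$default}"
}

resolve_target ""          # prints the default: local
resolve_target "h100-box"  # prints the explicit target: h100-box
```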

wafer config targets install

Install Wafer dependencies on a target:
wafer config targets install <name>
This installs Python packages and tools needed for remote execution.

Using Targets

Once configured, use targets with other commands:
# Evaluate on a specific target
wafer evaluate gpumode --impl kernel.py --target h100-box

# Run baseline on remote GPU
wafer baseline run "torch.matmul(A, B)" --target my-target

# Analyze an NCU profile on a chosen target
wafer nvidia ncu analyze ./profile.ncu-rep --target local

SSH Target Configuration

For SSH targets, ensure:
  1. SSH key authentication is set up (password auth not recommended)
  2. NVIDIA/AMD drivers are installed on the remote machine
  3. Python 3.8+ is available
  4. CUDA/ROCm toolkit is installed (for profiling)
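The Python requirement above can be checked mechanically. A hedged sketch; in practice you would run this over ssh on the remote host rather than locally:

```shell
# Check that the machine meets the Python 3.8+ requirement.
ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
major=${ver%%.*}
minor=${ver#*.}
if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]; }; then
  echo "python ok ($ver)"
else
  echo "python too old ($ver)"
fi
```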
Example manual target config:
wafer config targets add gpu-server \
  --type ssh \
  --host gpu.example.com \
  --user ubuntu \
  --key ~/.ssh/gpu_key \
  --gpu "NVIDIA H100" \
  --cuda-path /usr/local/cuda

Target Pools

Group multiple targets for load balancing:
# List pools
wafer config targets pool-list

# Create a pool
wafer config targets pool-create training-pool --targets h100-1,h100-2,h100-3

# Check pool status
wafer config targets pool-status training-pool
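The docs do not specify the pool's scheduling policy; one simple possibility is round-robin dispatch across members. A hedged sketch using the pool members from the example above (the policy itself is an assumption):

```shell
# Illustrative round-robin dispatch over pool members.
targets="h100-1 h100-2 h100-3"
i=0
for job in eval-a eval-b eval-c eval-d; do
  set -- $targets            # load pool members into $1..$3
  shift $(( i % $# ))        # rotate to the next member
  echo "$job -> $1"
  i=$((i + 1))
done
# eval-a -> h100-1, eval-b -> h100-2, eval-c -> h100-3, eval-d -> h100-1
```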

Troubleshooting

Probe fails

Check:
  • SSH connectivity: ssh user@host
  • GPU drivers: nvidia-smi or rocm-smi
  • Python availability: python --version

Probe times out

Increase the timeout or check network connectivity:
wafer config targets probe my-target --timeout 60

GPU not detected

Ensure drivers are installed and the GPU is visible:
# For NVIDIA
ssh user@host nvidia-smi

# For AMD
ssh user@host rocm-smi

Next Steps