Infrastructure

Wafer provides multiple ways to access GPU resources for development, profiling, and kernel evaluation. Choose the option that best fits your workflow.

Choosing an Option

Need                            Best Option
Quick GPU access, no setup      Workspaces
Use existing machines           Targets (SSH)
Local development               Targets (local)
Cloud provider credits          Targets (RunPod/DigitalOcean)
Performance modeling            GPU Specs

Quick Comparison

Workspaces

# Create and use immediately
wafer workspaces create --gpu H100 --name dev
wafer workspaces ssh dev
Pros:
  • No infrastructure management
  • Pre-configured environments
  • Automatic cleanup
Cons:
  • Billed by usage
  • Less customization
  • Requires internet

Targets

# Connect existing machines
wafer config targets init ssh
wafer evaluate gpumode --impl kernel.py --target my-server
Pros:
  • Use any GPU you have access to
  • Full control
  • Works with existing setups
Cons:
  • Requires setup
  • You manage the infrastructure

Getting Started

1. Choose Your Approach

For quick starts, use workspaces. For existing infrastructure, use targets.

2. Authenticate

Sign in to access cloud resources:
wafer auth login

3. Configure

Set up workspaces or targets:
# Workspaces
wafer workspaces create --gpu H100

# Or targets
wafer config targets init local

4. Use

Run commands on your GPU:
wafer evaluate gpumode --impl kernel.py --target my-target

Remote Execution

Both workspaces and targets support remote command execution:
# Workspace
wafer workspaces exec my-workspace "python train.py"

# Target
wafer run "python train.py" --target my-target

File Synchronization

Sync code to remote environments:
# Workspace
wafer workspaces sync my-workspace

# Target (via exec)
wafer config targets sync my-target ./src /remote/src

Common Workflows

Kernel Development

  1. Develop locally
  2. Sync to remote GPU
  3. Evaluate and profile
  4. Pull results back
wafer workspaces sync dev
wafer workspaces exec dev "wafer evaluate gpumode --impl kernel.py"
wafer workspaces pull dev --remote /results --local ./results
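
What kernel.py must contain depends on the gpumode evaluation harness, which this page doesn't specify. As a purely hypothetical sketch (the custom_kernel name, signature, and workload are assumptions, not a documented interface), a submission might look like:

# kernel.py -- hypothetical submission; the custom_kernel entry point
# and its signature are assumptions, not the gpumode harness's documented API.
import torch

def custom_kernel(data: torch.Tensor) -> torch.Tensor:
    # Example workload: a fused scale-and-add, run on whatever device
    # the input tensor lives on.
    return data * 2.0 + 1.0

if __name__ == "__main__":
    # Local smoke test before syncing to a remote GPU.
    x = torch.randn(1024, device="cuda" if torch.cuda.is_available() else "cpu")
    out = custom_kernel(x)
    assert out.shape == x.shape
    print("ok:", out.shape)

Running the file directly gives a quick local check before the sync/evaluate/pull loop above.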

Profiling on Different GPUs

# Profile on H100
wafer nvidia nsys profile --target h100-server "python train.py"

# Profile on MI300X
wafer amd rocprof-systems run --target mi300x-server "python train.py"

# Compare
wafer compare analyze ./h100.nsys-rep ./mi300x.json

Distributed Training

# Create multi-GPU workspace
wafer workspaces create --gpu A100 --count 4 --name training

# Run distributed
wafer workspaces exec training "torchrun --nproc_per_node=4 train.py"
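
For reference, a minimal train.py compatible with the torchrun invocation above could look like the following sketch. The model, data, and step count are placeholders; only the torch.distributed setup is essential:

# train.py -- minimal DDP sketch for `torchrun --nproc_per_node=4 train.py`.
# torchrun supplies LOCAL_RANK and the rendezvous environment variables.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and optimizer; swap in your real training code.
    model = torch.nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)

    for step in range(100):
        # Synthetic batch; each rank generates its own data here.
        x = torch.randn(32, 128, device=local_rank)
        y = torch.randint(0, 10, (32,), device=local_rank)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    if dist.get_rank() == 0:
        print("final loss:", loss.item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()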

Billing

Workspace usage is billed by compute time:
# Check credits
wafer config billing portal

# Add credits
wafer config billing topup
Target usage depends on your own infrastructure costs.
