Cross-Platform Comparison
Thewafer compare command analyzes and compares GPU traces across different platforms. Compare NVIDIA and AMD implementations, align kernel executions, and understand performance differences.
Quick Start
Commands
wafer compare analyze
Compare GPU traces from different sources:| Option | Description |
|---|---|
--output, -o | Output file path |
--format | Output format: text, json, html |
--metric | Comparison metric: time, throughput, efficiency |
wafer compare fusion
Compare fusion patterns between implementations:| Option | Description |
|---|---|
--output, -o | Output file path |
--threshold | Time threshold for fusion detection |
wafer compare align
Align kernel executions between traces:| Option | Description |
|---|---|
--output, -o | Output file path |
--method | Alignment method: name, sequence, hybrid |
Supported Trace Formats
| Format | Extension | Platform |
|---|---|---|
| Nsight Systems | .nsys-rep | NVIDIA |
| NCU Report | .ncu-rep | NVIDIA |
| ROCprof Systems | .json | AMD |
| ROCprof Compute | .csv | AMD |
| Perfetto | .perfetto, .pftrace | Both |
| PyTorch Trace | .json | Both |
Use Cases
NVIDIA vs AMD Comparison
Compare the same workload on different GPUs:Optimization Validation
Verify optimizations work across platforms:Kernel Matching
Understand how similar operations map between platforms:Interpretation
Performance Ratio
- < 1.0: Trace B is faster
- = 1.0: Equal performance
- > 1.0: Trace A is faster
Common Differences
| Observation | Likely Cause |
|---|---|
| Different kernel counts | Library differences (cuBLAS vs rocBLAS) |
| Large time variance | Different fusion strategies |
| Unmatched kernels | Platform-specific optimizations |
| Memory time difference | Different memory subsystems |