Creating Traces for Perfetto
Perfetto can analyze traces from many sources. This page shows how to create traces that work with Wafer's Perfetto viewer.
Supported Formats
Wafer's Perfetto viewer supports the following formats:
| Format | Extension | Description |
|---|---|---|
| Chrome JSON | .json | Standard Chrome tracing format |
| Gzip Compressed | .json.gz, .gz | Compressed Chrome JSON traces |
| Perfetto Native | .perfetto-trace, .pftrace | Native Perfetto format |
Chrome JSON is the most common and most widely supported of these formats. Most profiling tools can export to it.
Creating Chrome JSON Traces
From PyTorch
Use PyTorch's built-in profiler (torch.profiler) to generate traces. Its export_chrome_trace() method writes the result as Chrome JSON.
From TensorFlow
TensorFlow can export traces directly.
From Chrome DevTools
For web-based GPU work (WebGL, WebGPU):
- Open Chrome DevTools (F12)
- Go to the Performance tab
- Click Record (⚫)
- Perform your operations
- Click Stop
- Click Save profile to export as JSON
From NVIDIA Tools
Using Nsight Systems
Nsight Systems can export to Chrome JSON.
Using PyTorch + CUDA
PyTorch's profiler captures CUDA events automatically when CUDA is available.
From Custom Instrumentation
You can also write Chrome JSON traces by hand. The format is simple: a JSON object whose traceEvents array contains one object per event, with these fields:
| Field | Description |
|---|---|
| name | Event name |
| cat | Category (for coloring/filtering) |
| ph | Phase: X for complete events, B/E for begin/end pairs |
| ts | Timestamp in microseconds |
| dur | Duration in microseconds (for X events) |
| pid | Process ID |
| tid | Thread ID |
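A minimal sketch of writing such a trace by hand with only the Python standard library. The event names, categories, and the custom_trace.json filename are illustrative. Timestamps come from time.perf_counter_ns, so they are relative rather than wall-clock values, which should be fine as long as all events in one file use the same clock.

```python
import json
import os
import time


def now_us() -> float:
    """Current time in microseconds, the unit Chrome JSON expects for ts and dur."""
    return time.perf_counter_ns() / 1000.0


def build_trace() -> dict:
    """Build a small Chrome JSON trace with one X event and one B/E pair."""
    pid, tid = os.getpid(), 1  # tid is an arbitrary label in this sketch
    events = []

    # Complete ("X") event: a single record with a start timestamp and a duration.
    start = now_us()
    sum(i * i for i in range(100_000))  # some work to time
    events.append({
        "name": "square_sum", "cat": "compute", "ph": "X",
        "ts": start, "dur": now_us() - start, "pid": pid, "tid": tid,
    })

    # Begin/end ("B"/"E") pair: two records sharing the same name, pid, and tid.
    events.append({"name": "sort", "cat": "compute", "ph": "B",
                   "ts": now_us(), "pid": pid, "tid": tid})
    sorted(range(100_000, 0, -1))
    events.append({"name": "sort", "cat": "compute", "ph": "E",
                   "ts": now_us(), "pid": pid, "tid": tid})

    return {"traceEvents": events}


if __name__ == "__main__":
    with open("custom_trace.json", "w") as f:
        json.dump(build_trace(), f)
```

X events are usually the simpler choice because each span is one record; B/E pairs are handy when the start and end of a span happen in different places in your code.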
Compressing Traces
Large traces can be compressed with gzip to save space; the viewer opens .json.gz files directly.
Best Practices
Keep traces focused
Profile only the code you care about. A trace of your entire application startup will be harder to analyze than a trace of just the hot loop.
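With PyTorch, one way to keep a trace focused is torch.profiler's schedule, which records only a few iterations of a loop instead of the whole run. A minimal sketch, assuming a toy Linear model stands in for your workload; save_trace and focused_trace.json are illustrative names.

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function, schedule

model = torch.nn.Linear(512, 512)  # stand-in for the real workload
inputs = torch.randn(64, 512)


def save_trace(prof: torch.profiler.profile) -> None:
    # Called once the "active" window of the schedule has been recorded.
    prof.export_chrome_trace("focused_trace.json")


with profile(
    activities=[ProfilerActivity.CPU],
    # Skip 1 step, warm up for 1, then record 2 steps, and do this only once.
    schedule=schedule(wait=1, warmup=1, active=2, repeat=1),
    on_trace_ready=save_trace,
) as prof:
    for step in range(6):
        with record_function("train_step"):  # label that appears in the trace
            model(inputs).sum().backward()
        prof.step()  # advance the profiler's schedule by one iteration
```

The wait and warmup steps keep one-time startup costs out of the recorded window, so the trace reflects steady-state iterations.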
Include metadata
Add git commit hashes, configuration, and environment info to trace filenames or metadata. This makes it easier to reproduce and compare results.
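A sketch of one way to do this in Python: put the commit hash and a UTC timestamp in the filename, and store configuration and environment details under the otherData key, which the Trace Event Format defines for free-form metadata. The helper names, the "matmul" tag, and the config keys are hypothetical. Whether the viewer displays otherData may depend on the Perfetto version, so the filename is the more reliable place for anything you need to see at a glance.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone


def git_commit() -> str:
    """Short commit hash of the current repo, or 'nogit' if git or the repo is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "nogit"


def trace_filename(tag: str) -> str:
    """Build a name of the form <tag>_<commit>_<UTC timestamp>.json."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{tag}_{git_commit()}_{stamp}.json"


def add_metadata(trace: dict, config: dict) -> dict:
    """Attach run metadata under the 'otherData' key of a Chrome JSON object."""
    trace["otherData"] = {
        "git_commit": git_commit(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # Values are stored as strings to keep the metadata flat and simple.
        **{f"config.{k}": str(v) for k, v in config.items()},
    }
    return trace


if __name__ == "__main__":
    trace = add_metadata({"traceEvents": []}, {"batch_size": 64, "dtype": "fp16"})
    with open(trace_filename("matmul"), "w") as f:
        json.dump(trace, f, indent=2)
```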
Compress before uploading
Large traces take longer to upload and process. Compress them with gzip first; Perfetto handles decompression automatically.
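A small standard-library sketch of the compression step; gzip_trace is a hypothetical helper name. It streams the file through gzip instead of reading it all into memory, which matters for very large traces, and it leaves the original file in place.

```python
import gzip
import shutil
from pathlib import Path


def gzip_trace(path: str) -> Path:
    """Write <path>.gz next to the original trace and return the new path."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")  # trace.json -> trace.json.gz
    with src.open("rb") as f_in, gzip.open(dst, "wb") as f_out:
        # Copy in chunks so the whole trace never has to fit in memory.
        shutil.copyfileobj(f_in, f_out)
    return dst
```

Calling gzip_trace("trace.json") writes trace.json.gz alongside the original, and that file can be uploaded as-is.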
Profile realistic workloads
Use representative input sizes and data patterns. Profiling toy inputs may not reveal real-world bottlenecks.
Troubleshooting
Trace file is too large
Try reducing the profiling duration or scope. You can also compress the file with gzip; Perfetto reads .json.gz files directly.
No CUDA events in trace
Make sure ProfilerActivity.CUDA is included in your profiler's activities. Also call torch.cuda.synchronize() before the profiler context exits, so that GPU work queued during profiling finishes while the profiler is still recording.
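A sketch of a profiler setup that follows both points, assuming a plain matrix multiply as the workload and cuda_trace.json as an arbitrary output name. It falls back to CPU-only profiling when no GPU is present, so it also runs on machines without CUDA; in that case the trace simply contains no GPU events.

```python
import torch
from torch.profiler import ProfilerActivity, profile

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

# Request CUDA activity explicitly; fall back to CPU-only without a GPU.
activities = [ProfilerActivity.CPU]
if use_cuda:
    activities.append(ProfilerActivity.CUDA)

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)

with profile(activities=activities) as prof:
    for _ in range(5):
        c = a @ b  # on GPU this enqueues a kernel and returns before it finishes
    if use_cuda:
        # Wait for queued kernels while the profiler is still recording.
        torch.cuda.synchronize()

prof.export_chrome_trace("cuda_trace.json")
```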
Trace loads but shows no data
Check that your trace has a valid traceEvents array. Some tools export wrapper formats that need to be converted first.
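A quick way to check a trace before uploading it is to parse it and look for the event array. The sketch below accepts both layouts that the Trace Event Format allows (an object with a traceEvents key, or a bare array of events) and flags a few common problems. The specific checks are a hypothetical minimal checklist, not a full schema validation.

```python
import gzip
import json


def load_trace(path: str):
    """Parse a .json or .json.gz/.gz trace file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def check_trace(data, max_problems: int = 20) -> list[str]:
    """Return a list of problems; an empty list means the basic shape looks right."""
    if isinstance(data, list):  # JSON Array Format: a bare list of events
        events = data
    elif isinstance(data, dict) and isinstance(data.get("traceEvents"), list):
        events = data["traceEvents"]  # JSON Object Format
    else:
        top = list(data)[:5] if isinstance(data, dict) else type(data).__name__
        return [f"no traceEvents array found (top level: {top})"]

    problems = [] if events else ["traceEvents is empty"]
    for i, ev in enumerate(events):
        if len(problems) >= max_problems:
            break
        if not isinstance(ev, dict):
            problems.append(f"event {i} is not an object")
            continue
        missing = [k for k in ("name", "ph") if k not in ev]
        if ev.get("ph") != "M" and "ts" not in ev:
            missing.append("ts")  # metadata ("M") events carry no timestamp
        if ev.get("ph") == "X" and "dur" not in ev:
            missing.append("dur")  # complete events need a duration
        if missing:
            problems.append(f"event {i} is missing {missing}")
    return problems
```

For example, print("\n".join(check_trace(load_trace("trace.json.gz"))) or "looks OK") reports the first few problems, or "looks OK" if none were found. A "no traceEvents array found" message usually points to a wrapper format.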