Creating Traces for Perfetto

Perfetto can analyze traces from many sources. Here’s how to create traces that work with Wafer’s Perfetto viewer.

Supported Formats

Wafer’s Perfetto viewer supports:
| Format | Extension | Description |
| --- | --- | --- |
| Chrome JSON | .json | Standard Chrome tracing format |
| Gzip Compressed | .json.gz, .gz | Compressed Chrome JSON traces |
| Perfetto Native | .perfetto-trace, .pftrace | Native Perfetto format |

Chrome JSON format is the most common and widely supported. Most profiling tools can export to this format.
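As a rough illustration of the table above, the sketch below maps a filename to one of the three formats by its extension. The function name `detect_trace_format` and the returned labels are hypothetical and are not part of Wafer or Perfetto; a real viewer may inspect file contents rather than the name.

```python
from pathlib import Path

# Extensions come from the supported-formats table; the labels are illustrative.
_SUFFIX_TO_FORMAT = {
    ".json.gz": "gzip",
    ".gz": "gzip",
    ".json": "chrome-json",
    ".perfetto-trace": "perfetto-native",
    ".pftrace": "perfetto-native",
}


def detect_trace_format(path):
    """Return a format label for `path`, or None if the extension is not listed."""
    name = Path(path).name.lower()
    for suffix, label in _SUFFIX_TO_FORMAT.items():
        if name.endswith(suffix):
            return label
    return None
```

A check like this can run before an upload so that unsupported files are rejected early.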

Creating Chrome JSON Traces

From PyTorch

Use PyTorch’s built-in profiler to generate traces:
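import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder model and input; substitute your own.
model = torch.nn.Linear(128, 64)
inputs = torch.randn(32, 128)

# Only request CUDA activity when a GPU is available.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)
    model, inputs = model.cuda(), inputs.cuda()

with profile(
    activities=activities,
    record_shapes=True,
    with_stack=True,
) as prof:
    # Your model code here
    model(inputs)

# Export to Chrome JSON format
prof.export_chrome_trace("trace.json")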
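After exporting, one quick way to check that the trace contains what you expect is to total the durations of complete events by name. The sketch below uses only the standard library. `top_events` is a hypothetical helper, and it assumes the file is uncompressed Chrome JSON such as the one written by export_chrome_trace.

```python
import json
from collections import defaultdict


def top_events(trace_path, limit=5):
    """Return up to `limit` (name, total duration in microseconds) pairs, largest first."""
    with open(trace_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Chrome JSON is either an object with "traceEvents" or a bare list of events.
    events = data["traceEvents"] if isinstance(data, dict) else data
    totals = defaultdict(float)
    for event in events:
        # Only complete events ("X") carry a duration.
        if event.get("ph") == "X" and "dur" in event:
            totals[event.get("name", "<unnamed>")] += event["dur"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]
```

Calling top_events("trace.json") on the file exported above should list the operators or kernels that dominate the run. If the list is empty, the profiled region probably did not execute any work.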

From TensorFlow

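TensorFlow’s profiler writes its data to a log directory:
import tensorflow as tf

# Start profiling; data is written under 'logdir'
tf.profiler.experimental.start('logdir')

# Your model code here
model(inputs)

# Stop profiling and flush the data to disk
tf.profiler.experimental.stop()
The result is a set of TensorBoard logs rather than a Chrome JSON file. Convert the logs to Chrome JSON using TensorBoard.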

From Chrome DevTools

For web-based GPU work (WebGL, WebGPU):
  1. Open Chrome DevTools (F12)
  2. Go to the Performance tab
  3. Click Record (⚫)
  4. Perform your operations
  5. Click Stop
  6. Click Save profile to export as JSON

From NVIDIA Tools

Using Nsight Systems

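Nsight Systems can export a report to JSON. Check that the exported file loads in the viewer; depending on the Nsight Systems version, the JSON may use Nsight’s own schema and need conversion to Chrome JSON first: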
# Profile and generate a .nsys-rep file
nsys profile -o output ./my_app

# Export to JSON (requires nsys-exporter)
nsys export --type=json --output=trace.json output.nsys-rep

Using PyTorch + CUDA

PyTorch’s profiler captures CUDA events when ProfilerActivity.CUDA is enabled:
import torch
from torch.profiler import profile, ProfilerActivity

with profile(
    activities=[
        ProfilerActivity.CPU,
        ProfilerActivity.CUDA,
    ],
    with_stack=True,
    profile_memory=True,
) as prof:
    model(inputs)
    # Wait for queued GPU work to finish before the profiler context ends
    torch.cuda.synchronize()

prof.export_chrome_trace("cuda_trace.json")

From Custom Instrumentation

You can create Chrome JSON traces manually. The format is simple:
{
  "traceEvents": [
    {
      "name": "MyFunction",
      "cat": "kernel",
      "ph": "X",
      "ts": 1000,
      "dur": 500,
      "pid": 1,
      "tid": 1
    }
  ]
}
| Field | Description |
| --- | --- |
| name | Event name |
| cat | Category (for coloring/filtering) |
| ph | Phase: X for complete events, B/E for begin/end pairs |
| ts | Timestamp in microseconds |
| dur | Duration in microseconds (for X events) |
| pid | Process ID |
| tid | Thread ID |
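To show how these fields fit together, here is a minimal sketch of a tracer that records complete ("X") events around blocks of Python code. The class `ChromeTraceWriter` and its `span` and `save` methods are hypothetical names chosen for this example, not an existing library API. It assumes Python 3.8 or later for `threading.get_native_id`.

```python
import json
import os
import threading
import time
from contextlib import contextmanager


class ChromeTraceWriter:
    """Collects complete ("X") events and writes them as Chrome JSON."""

    def __init__(self):
        self.events = []
        self._origin = time.perf_counter()

    @contextmanager
    def span(self, name, cat="function"):
        start = time.perf_counter()
        try:
            yield
        finally:
            end = time.perf_counter()
            self.events.append({
                "name": name,
                "cat": cat,
                "ph": "X",
                # ts and dur are in microseconds, as in the field table above.
                "ts": (start - self._origin) * 1e6,
                "dur": (end - start) * 1e6,
                "pid": os.getpid(),
                # The OS-level thread id is used here to keep the value small.
                "tid": threading.get_native_id(),
            })

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"traceEvents": self.events}, f)


# Usage: nested spans show up as nested slices on the same thread track.
writer = ChromeTraceWriter()
with writer.span("outer", cat="demo"):
    with writer.span("inner", cat="demo"):
        time.sleep(0.01)
```

Calling writer.save("custom_trace.json") afterwards writes a file in the same shape as the JSON example above. Complete events were chosen over B/E pairs because each span then needs only one record.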

Compressing Traces

Large traces can be compressed to save space:
# Compress with gzip
gzip trace.json

# This creates trace.json.gz
# Perfetto can read it directly
Traces can get very large (100MB+). Always compress traces before storing or sharing them. Wafer’s Perfetto viewer handles compressed files natively.
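The same compression can be done from Python, for example at the end of a profiling script. This is a sketch using the standard library; `compress_trace` and `load_trace` are hypothetical helper names.

```python
import gzip
import json
import shutil


def compress_trace(src_path, dst_path=None):
    """Write a gzip copy of `src_path` and return the new path. The original is kept."""
    dst_path = dst_path or src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so large traces are fine
    return dst_path


def load_trace(path):
    """Load a Chrome JSON trace from a .json or .json.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Unlike the gzip command, which replaces trace.json with trace.json.gz by default, this helper leaves the original file in place.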

Best Practices

Profile only the code you care about. A trace of your entire application startup will be harder to analyze than a trace of just the hot loop.
Add git commit hashes, configuration, and environment info to trace filenames or metadata. This makes it easier to reproduce and compare results.
Large traces take longer to upload and process. Compress with gzip first—Perfetto handles decompression automatically.
Use representative input sizes and data patterns. Profiling toy inputs may not reveal real-world bottlenecks.
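One way to follow the advice about commit hashes and configuration is to generate trace filenames programmatically. The sketch below is an illustration only: `current_commit` and `trace_filename` are hypothetical helpers, the naming scheme is an assumption, and the git lookup falls back to a placeholder when git is unavailable.

```python
import re
import subprocess
from datetime import datetime, timezone


def current_commit(default="nogit"):
    """Return the short git commit hash, or `default` outside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return default


def trace_filename(label, commit=None, config=None, when=None):
    """Build a name like 'matmul_a1b2c3d_bs32-dtypefp16_20240101T120000Z.json'."""
    commit = commit or current_commit()
    when = when or datetime.now(timezone.utc)
    parts = [label, commit]
    if config:
        # Sort keys so the same configuration always yields the same name.
        parts.append("-".join(f"{k}{v}" for k, v in sorted(config.items())))
    parts.append(when.strftime("%Y%m%dT%H%M%SZ"))
    # Replace characters that are awkward in filenames.
    safe = [re.sub(r"[^A-Za-z0-9.\-]+", "-", str(p)) for p in parts]
    return "_".join(safe) + ".json"
```

The resulting name can be passed straight to an export call, so two traces from different commits or batch sizes never overwrite each other.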

Troubleshooting

If a trace is too large to upload or process, try reducing the profiling duration or scope. You can also use gzip to compress the file. Perfetto reads .json.gz files directly.
If GPU activity is missing from a PyTorch trace, make sure you’re including ProfilerActivity.CUDA in your profiler configuration. Also ensure CUDA synchronization happens before the profiler context ends.
If the viewer fails to load a trace, check that the file has a valid traceEvents array. Some tools export wrapper formats that need conversion.
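A quick structural check can narrow down why a trace does not load. The sketch below is a hypothetical helper named `check_trace`. It assumes that every non-metadata event should carry the fields listed under Custom Instrumentation, which may be stricter than what a given viewer actually requires.

```python
import gzip
import json

# Assumed minimum set of fields for non-metadata events.
REQUIRED_FIELDS = ("name", "ph", "ts", "pid", "tid")


def check_trace(path):
    """Return a list of problems found in a Chrome JSON trace (empty if none)."""
    opener = gzip.open if path.endswith(".gz") else open
    try:
        with opener(path, "rt", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, EOFError, ValueError) as exc:
        return [f"cannot parse file: {exc}"]

    # Accept both the object form {"traceEvents": [...]} and a bare event list.
    events = data.get("traceEvents") if isinstance(data, dict) else data
    if not isinstance(events, list):
        return ["no 'traceEvents' list found; the file may be a wrapper format"]
    if not events:
        return ["'traceEvents' is empty"]

    problems = []
    for i, event in enumerate(events):
        if not isinstance(event, dict):
            problems.append(f"event {i}: not a JSON object")
            continue
        if event.get("ph") == "M":
            continue  # metadata events do not need a timestamp
        missing = [k for k in REQUIRED_FIELDS if k not in event]
        if missing:
            problems.append(f"event {i}: missing {', '.join(missing)}")
        if event.get("ph") == "X" and "dur" not in event:
            problems.append(f"event {i}: complete event without 'dur'")
    return problems
```

An empty result does not guarantee the trace will render, but a non-empty one usually points at the export step that needs attention.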
