Using GPU Docs

Learn how to ask effective questions and get the most out of the GPU Docs assistant.

Asking Questions

Be Specific

Specific questions get better answers than vague ones. Good questions name the instruction, memory space, or architecture involved:

“How do I use tcgen05 for matrix multiplication?”
“What is the difference between __shared__ and __constant__ memory?”
“How does warp-level matrix multiply work on Hopper?”

Include Context

When asking about specific code or scenarios, provide context:

Mention the GPU architecture you’re targeting
Describe what you’re trying to achieve
Reference specific functions or APIs you’re using
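
As a rough illustration of the checklist above, the following Python sketch assembles a question from the three pieces of context. The helper name `build_question` and its parameters are hypothetical; GPU Docs does not require any particular format, and you would simply type the resulting text into the chat.

```python
def build_question(question: str, arch: str, goal: str, apis: list[str]) -> str:
    """Combine a question with the context items listed above (hypothetical helper)."""
    lines = [
        question.strip(),
        f"Target architecture: {arch}",  # the GPU architecture you are targeting
        f"Goal: {goal}",                 # what you are trying to achieve
    ]
    if apis:
        # the specific functions or APIs you are using
        lines.append("APIs in use: " + ", ".join(apis))
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_question(
        "Why is my kernel memory-bound?",
        arch="sm_90 (Hopper)",
        goal="speed up a GEMM kernel",
        apis=["mma.sync", "__shared__"],
    ))
```

The point of the sketch is only that each context item gets its own short, explicit line; writing the same information as a sentence works just as well.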

Example Questions

Here are some questions GPU Docs handles well:

Memory and Architecture

“What is TMEM and when should I use it?”
“Explain the memory hierarchy in Blackwell”
“How does shared memory banking work?”
“What’s the difference between L1 and L2 cache?”

PTX and Assembly

“What does the mma.sync instruction do?”
“How do I read SASS output for memory coalescing?”
“Explain the tcgen05 instructions”
“What are the predicate registers in PTX?”

CUTLASS and CuTe

“How do I implement a GEMM kernel with CUTLASS?”
“What is a CuTe Layout?”
“Explain the tile iterator pattern”
“How do epilogue operations work?”

Optimization

“Why is my kernel memory-bound?”
“How can I increase occupancy?”
“What causes register spilling?”
“How do I optimize shared memory access?”

Understanding Citations

Answers include citations to source documentation:

The tensor core operates on matrix fragments using the mma.sync 
instruction [1]. Each warp collaborates to compute a matrix 
multiply-accumulate operation [2].

[1] PTX ISA Reference - Matrix Operations
[2] CUDA C++ Programming Guide - Warp Matrix Functions
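
To make the marker-to-source relationship concrete, here is a small Python sketch that pairs each `[n]` marker in an answer with its source line. It assumes the plain-text layout shown in the example above; the function name `match_citations` is hypothetical and is not part of GPU Docs.

```python
import re


def match_citations(answer: str, sources: str) -> dict[int, str]:
    """Map each [n] marker used in the answer to its listed source title."""
    # Source lines look like "[1] PTX ISA Reference - Matrix Operations".
    listed = {
        int(n): title.strip()
        for n, title in re.findall(r"^\[(\d+)\]\s*(.+)$", sources, flags=re.MULTILINE)
    }
    # Markers inside the answer text look like "[1]".
    used = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    # Flag any marker that has no matching source line.
    return {n: listed.get(n, "<missing source>") for n in used}


if __name__ == "__main__":
    answer = (
        "The tensor core operates on matrix fragments using the mma.sync "
        "instruction [1]. Each warp collaborates to compute a matrix "
        "multiply-accumulate operation [2]."
    )
    sources = (
        "[1] PTX ISA Reference - Matrix Operations\n"
        "[2] CUDA C++ Programming Guide - Warp Matrix Functions"
    )
    for number, title in match_citations(answer, sources).items():
        print(number, title)
```

A marker with no matching source line is reported as `<missing source>`, which is a useful cue to double-check that claim.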

Using Citations

Click citations to open the source documentation
Verify important claims by checking the original source
Learn more by reading the full context around cited sections

Citations help you build deeper understanding. Rather than taking answers at face value, follow the links to learn the underlying concepts.

Attaching Files

You can attach code files to get help with specific implementations:

How to Attach Files

Click the @ button in the message input
Select a file from your workspace
The file content is included with your question

When to Attach Files

Debugging compilation errors
Understanding why code produces certain assembly
Getting optimization suggestions for existing code
Asking “why does this code do X?”

Example with Attachment

I attached my GEMM kernel. Why am I getting low occupancy?

The assistant can analyze your code and provide specific recommendations.

Conversation History

GPU Docs maintains conversation context, so you can:

Ask Follow-Up Questions

User: What is TMEM?
Assistant: TMEM is the Tensor Memory... [explanation]

User: How do I allocate it?
Assistant: [continues with allocation details, knowing you're asking about TMEM]

Build on Previous Answers

User: Explain the Hopper memory hierarchy
Assistant: [detailed explanation]

User: How does that differ from Ampere?
Assistant: [comparison, referencing the previous explanation]

Start Fresh

To start a new conversation without previous context:

Click the New Chat button
Or refresh the Wafer panel

Tips for Best Results

One Topic at a Time

Focus each question on a single concept. Multiple questions in one message may get incomplete answers.

Use Technical Terms

Use proper terminology (sm_90, mma.sync, __shared__) for more precise answers.

Follow Up

If an answer isn’t quite what you need, ask follow-up questions to clarify.

Check Citations

Always verify important information by following the cited sources.

Limitations

GPU Docs is a documentation assistant, not a code generator. For code generation and agentic tasks, use the Wevin agent mode (available to users with the agent feature enabled).

Answers are based on indexed documentation, so very recent changes may not be reflected
Complex multi-step implementations may need to be broken into separate questions
Always verify answers against official documentation for production use
