
Using GPU Docs

Learn how to ask effective questions and get the most out of the GPU Docs assistant.

Asking Questions

Be Specific

Specific questions get better answers:
  • “How do I use tcgen05 for matrix multiplication?”
  • “What is the difference between __shared__ and __constant__ memory?”
  • “How does warp-level matrix multiply work on Hopper?”

Include Context

When asking about specific code or scenarios, provide context:
  • Mention the GPU architecture you’re targeting
  • Describe what you’re trying to achieve
  • Reference specific functions or APIs you’re using
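
The checklist above can be sketched as a small helper that bundles a question with its context before you paste it into the chat. This is only an illustration: `build_question` and its parameter names are hypothetical, and GPU Docs does not require any particular message format.

```python
def build_question(question, architecture=None, goal=None, apis=()):
    """Assemble a question followed by whichever context items are known."""
    lines = [question.strip()]
    if architecture:
        lines.append(f"Target architecture: {architecture}")
    if goal:
        lines.append(f"Goal: {goal}")
    if apis:
        lines.append("APIs in use: " + ", ".join(apis))
    return "\n".join(lines)


# Hypothetical example values; replace them with your own details.
message = build_question(
    "Why does my kernel show shared memory bank conflicts?",
    architecture="sm_90 (Hopper)",
    goal="Tile a matrix transpose through shared memory",
    apis=["__shared__", "__syncthreads()"],
)
print(message)
```

The point is the habit rather than the helper: a question that names the architecture, the goal, and the APIs involved gives the assistant less to guess.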

Example Questions

Here are some questions GPU Docs handles well:
  • “What is TMEM and when should I use it?”
  • “Explain the memory hierarchy in Blackwell”
  • “How does shared memory banking work?”
  • “What’s the difference between L1 and L2 cache?”
  • “What does the mma.sync instruction do?”
  • “How do I read SASS output for memory coalescing?”
  • “Explain the tcgen05 instructions”
  • “What are the predicate registers in PTX?”
  • “How do I implement a GEMM kernel with CUTLASS?”
  • “What is a CuTe Layout?”
  • “Explain the tile iterator pattern”
  • “How do epilogue operations work?”
  • “Why is my kernel memory-bound?”
  • “How can I increase occupancy?”
  • “What causes register spilling?”
  • “How do I optimize shared memory access?”

Understanding Citations

Answers include citations to source documentation:
The tensor core operates on matrix fragments using the mma.sync 
instruction [1]. Each warp collaborates to compute a matrix 
multiply-accumulate operation [2].

[1] PTX ISA Reference - Matrix Operations
[2] CUDA C++ Programming Guide - Warp Matrix Functions

Using Citations

  • Click citations to open the source documentation
  • Verify important claims by checking the original source
  • Learn more by reading the full context around cited sections
Citations help you build deeper understanding. Don’t take answers at face value; follow the links to learn the underlying concepts.
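
If you copy an answer out of the chat as plain text, the numbered markers can be paired with their sources to build a list of what to check. The sketch below assumes the `[n]` marker format shown in the example answer; `cited_sources` is a hypothetical helper, not part of GPU Docs.

```python
import re

# Example answer and source list, copied from the sample above.
answer = (
    "The tensor core operates on matrix fragments using the mma.sync "
    "instruction [1]. Each warp collaborates to compute a matrix "
    "multiply-accumulate operation [2]."
)
sources = {
    1: "PTX ISA Reference - Matrix Operations",
    2: "CUDA C++ Programming Guide - Warp Matrix Functions",
}


def cited_sources(text, sources):
    """Return (marker, source title) pairs in order of first appearance."""
    seen = []
    for match in re.finditer(r"\[(\d+)\]", text):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    # Markers with no listed source are flagged rather than dropped.
    return [(n, sources.get(n, "<no matching source>")) for n in seen]


for number, title in cited_sources(answer, sources):
    print(f"[{number}] {title}")
```

A marker that maps to no source is worth a follow-up question before you rely on the claim it is attached to.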

Attaching Files

You can attach code files to get help with specific implementations.

How to Attach Files

  1. Click the @ button in the message input
  2. Select a file from your workspace
  3. The file content is included with your question

When to Attach Files

  • Debugging compilation errors
  • Understanding why code produces certain assembly
  • Getting optimization suggestions for existing code
  • Asking “why does this code do X?”

Example with Attachment

I attached my GEMM kernel. Why am I getting low occupancy?
The assistant can analyze your code and provide specific recommendations.

Conversation History

GPU Docs maintains conversation context, so you can ask follow-up questions and build on previous answers without repeating yourself.

Ask Follow-Up Questions

User: What is TMEM?
Assistant: TMEM is the Tensor Memory... [explanation]

User: How do I allocate it?
Assistant: [continues with allocation details, knowing you're asking about TMEM]

Build on Previous Answers

User: Explain the Hopper memory hierarchy
Assistant: [detailed explanation]

User: How does that differ from Ampere?
Assistant: [comparison, referencing the previous explanation]

Start Fresh

To start a new conversation without previous context:
  • Click the New Chat button
  • Or refresh the Wafer panel

Tips for Best Results

One Topic at a Time

Focus each question on a single concept. Multiple questions in one message may get incomplete answers.

Use Technical Terms

Use precise terminology (sm_90, mma.sync, __shared__) for more precise answers.

Follow Up

If an answer isn’t quite what you need, ask follow-up questions to clarify.

Check Citations

Always verify important information by following the cited sources.

Suggested Starting Questions

If you’re not sure what to ask, try these:
  • “What is TMEM?”
  • “How do I use tcgen05 for matrix multiplication?”
  • “Explain the memory hierarchy in Blackwell”
  • “How do I implement a GEMM kernel?”
  • “What does the mma.sync instruction do?”

Limitations

GPU Docs is a documentation assistant, not a code generator. For code generation and agentic tasks, use the Wevin agent mode (available to users with the agent feature enabled).
  • Answers are based on indexed documentation, so very recent changes may not be reflected
  • Complex multi-step implementations may need to be broken into separate questions
  • Always verify answers against official documentation for production use
