Using GPU Docs
Learn how to ask effective questions and get the most out of the GPU Docs assistant.
Asking Questions
Be Specific
Specific questions get better answers than vague ones. Good examples:
- “How do I use tcgen05 for matrix multiplication?”
- “What is the difference between __shared__ and __constant__ memory?”
- “How does warp-level matrix multiply work on Hopper?”
Include Context
When asking about specific code or scenarios, provide context:
- Mention the GPU architecture you’re targeting
- Describe what you’re trying to achieve
- Reference specific functions or APIs you’re using
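The checklist above can be sketched as a small template for composing a question. This is only an illustration: `QuestionContext` and `build_question` are hypothetical names invented here, not part of GPU Docs, and the assistant accepts free-form text however you write it.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionContext:
    """Hypothetical container for the three context items listed above."""
    architecture: str  # GPU architecture you are targeting
    goal: str          # what you are trying to achieve
    apis: list[str] = field(default_factory=list)  # functions or APIs in use


def build_question(question: str, ctx: QuestionContext) -> str:
    """Prefix a question with its context as plain text lines."""
    lines = [
        f"Target architecture: {ctx.architecture}",
        f"Goal: {ctx.goal}",
    ]
    if ctx.apis:
        lines.append("APIs in use: " + ", ".join(ctx.apis))
    lines.append(f"Question: {question}")
    return "\n".join(lines)


ctx = QuestionContext(
    architecture="Hopper (sm_90)",
    goal="speed up a half-precision GEMM",
    apis=["mma.sync"],
)
print(build_question("How does warp-level matrix multiply work on Hopper?", ctx))
```

The exact layout is not important. The point is that the architecture, the goal, and the APIs all appear alongside the question.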
Example Questions
Here are some questions GPU Docs handles well:
Memory and Architecture
- “What is TMEM and when should I use it?”
- “Explain the memory hierarchy in Blackwell”
- “How does shared memory banking work?”
- “What’s the difference between L1 and L2 cache?”
PTX and Assembly
- “What does the mma.sync instruction do?”
- “How do I read SASS output for memory coalescing?”
- “Explain the tcgen05 instructions”
- “What are the predicate registers in PTX?”
CUTLASS and CuTe
- “How do I implement a GEMM kernel with CUTLASS?”
- “What is a CuTe Layout?”
- “Explain the tile iterator pattern”
- “How do epilogue operations work?”
Optimization
- “Why is my kernel memory-bound?”
- “How can I increase occupancy?”
- “What causes register spilling?”
- “How do I optimize shared memory access?”
Understanding Citations
Answers include citations to source documentation.
Using Citations
- Click citations to open the source documentation
- Verify important claims by checking the original source
- Learn more by reading the full context around cited sections
Attaching Files
You can attach code files to get help with specific implementations.
How to Attach Files
- Click the @ button in the message input
- Select a file from your workspace
- The file content is included with your question
When to Attach Files
- Debugging compilation errors
- Understanding why code produces certain assembly
- Getting optimization suggestions for existing code
- Asking “why does this code do X?”
Example with Attachment
Conversation History
GPU Docs maintains conversation context, so you can:
- Ask follow-up questions
- Build on previous answers
Start Fresh
To start a new conversation without previous context:
- Click the New Chat button
- Or refresh the Wafer panel
Tips for Best Results
One Topic at a Time
Focus each question on a single concept. Multiple questions in one message may get incomplete answers.
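As a rough illustration of this tip, the sketch below breaks a multi-part message into separate questions that can be sent one at a time. `split_questions` is a hypothetical helper written for this example, not a GPU Docs feature, and the split on question marks is deliberately naive.

```python
import re


def split_questions(message: str) -> list[str]:
    """Split a message into separate questions, one per '?'.

    Naive by design: any trailing text without a '?' is dropped.
    """
    parts = re.findall(r"[^?]+\?", message)
    return [p.strip() for p in parts]


message = "What causes register spilling? How can I increase occupancy?"
for question in split_questions(message):
    print(question)
```

Sending each resulting question as its own message keeps every answer focused on a single concept.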
Use Technical Terms
Use proper terminology (sm_90, mma.sync, __shared__) for more precise answers.
Follow Up
If an answer isn’t quite what you need, ask follow-up questions to clarify.
Check Citations
Always verify important information by following the cited sources.
Suggested Starting Questions
If you’re not sure what to ask, try these:
- “What is TMEM?”
- “How do I use tcgen05 for matrix multiplication?”
- “Explain the memory hierarchy in Blackwell”
- “How do I implement a GEMM kernel?”
- “What does the mma.sync instruction do?”
Limitations
- Answers are based on indexed documentation, so very recent changes may not be reflected
- Complex multi-step implementations may need to be broken into separate questions
- Always verify answers against official documentation for production use