Daily Kernel

The Daily Kernel is a daily CUDA programming challenge that helps you practice GPU programming skills. Each day features a new problem at one of several difficulty levels.

What is it?

Similar to daily coding challenges you might find on other platforms, the Daily Kernel presents GPU-specific problems:
  • Kernel optimization — Make a kernel faster
  • Algorithm implementation — Implement GPU-friendly algorithms
  • Memory patterns — Work with different memory types
  • Parallel primitives — Reductions, scans, and more
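To give a feel for the "parallel primitives" category, here is a minimal sketch of a block-level sum reduction in CUDA. It is not taken from any actual Daily Kernel challenge: the names `block_sum` and `sum_on_gpu` are made up for illustration, the kernel assumes a power-of-two block size, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Illustrative kernel (hypothetical name): each block reduces its slice of
// the input in shared memory, then one thread adds the partial sum to *out.
__global__ void block_sum(const float* in, float* out, int n) {
    extern __shared__ float tile[];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    // One element per thread; out-of-range threads contribute zero.
    tile[tid] = (idx < n) ? in[idx] : 0.0f;
    __syncthreads();

    // Tree reduction in shared memory. Assumes blockDim.x is a power of two.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }

    // Combine per-block partial sums with a single atomic per block.
    if (tid == 0) atomicAdd(out, tile[0]);
}

// Host helper (hypothetical name): copies data over, launches, returns the sum.
float sum_on_gpu(const std::vector<float>& host_in) {
    const int n = static_cast<int>(host_in.size());
    if (n == 0) return 0.0f;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, host_in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));

    const int threads = 256;  // power of two, as the kernel requires
    const int blocks = (n + threads - 1) / threads;
    block_sum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n);

    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

A kernel like this is a typical starting point for an optimization challenge: warp shuffles, multiple elements per thread, or a two-pass scheme without atomics are common next steps.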

Accessing Daily Kernel

  1. Click the puzzle icon (⚡) in the Wafer top bar, or
  2. Select Daily Kernel from the tool dropdown

Challenge Structure

Each challenge includes:

Problem Statement

A description of what you need to implement or optimize, including:
  • Input/output specifications
  • Performance requirements
  • Constraints

Examples

Concrete examples showing:
  • Sample inputs
  • Expected outputs
  • Explanations of the expected behavior

Framework Selection

Choose your preferred implementation framework:
  • CuTe DSL — Python-based DSL from NVIDIA CUTLASS for tensor operations
  • CUDA — Standard CUDA C++

Each framework may come with its own starter code and hints tailored to that approach.

Starter Code

Template code to get you started:
  • Function signatures
  • Memory setup
  • Basic structure

Kernel Signature

For kernels, you’ll see:
  • Input tensors (names, types, shapes)
  • Output tensors
  • Any scalar parameters
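As an illustration of how those three pieces might map onto a CUDA signature, here is a hedged sketch. The kernel `scale_add`, its parameter names, and the host wrapper `run_scale_add` are hypothetical; the real signature always comes from the challenge's own starter code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical signature showing inputs, an output, and scalar parameters.
__global__ void scale_add(const float* __restrict__ x,  // input tensor, shape [n]
                          const float* __restrict__ y,  // input tensor, shape [n]
                          float* __restrict__ out,      // output tensor, shape [n]
                          float alpha,                  // scalar parameter
                          int n)                        // scalar: element count
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = alpha * x[i] + y[i];
}

// Host wrapper (hypothetical). Assumes x and y have the same length.
// Error checking is omitted for brevity.
std::vector<float> run_scale_add(const std::vector<float>& x,
                                 const std::vector<float>& y,
                                 float alpha) {
    const int n = static_cast<int>(x.size());
    std::vector<float> out(n);
    if (n == 0) return out;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float *d_x = nullptr, *d_y = nullptr, *d_out = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale_add<<<blocks, threads>>>(d_x, d_y, d_out, alpha, n);

    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_out);
    return out;
}
```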

Constraints

Problem constraints to keep in mind:
  • Input sizes
  • Performance targets
  • Memory limits

Hints

Collapsible hints if you get stuck:
  • Algorithmic approaches
  • Framework-specific tips
  • Common pitfalls to avoid

Starting a Challenge

  1. Read the Problem
     Understand what you need to implement. Pay attention to input/output specs and constraints.
  2. Choose a Framework
     Select CuTe DSL or CUDA based on your preference and the problem type.
  3. Review Starter Code
     Look at the provided template to understand the expected structure.
  4. Click Start Challenge
     This creates a new file in your workspace with the starter code.
  5. Implement Your Solution
     Write your kernel implementation in the created file.

Difficulty Levels

Level    Description
Easy     Straightforward implementations, good for learning basics
Medium   Requires optimization or non-trivial algorithms
Hard     Complex problems requiring advanced techniques

Tips for Success

  • Get a correct solution first, then optimize. Don’t try to write the fastest solution immediately.
  • Check the PTX/SASS output of your kernel to understand what’s happening at the instruction level.
  • Use the NCU Profiler to identify bottlenecks in your solution.
  • If you’re stuck on a concept, ask the GPU Docs assistant for help.
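One way to follow the "correct first, then optimize" tip is to keep a simple CPU reference next to a naive kernel and compare the two before tuning anything. The sketch below does this for a matrix transpose. The problem choice and all names (`transpose_naive`, `transpose_cpu`, `transpose_gpu`, `all_close`) are assumptions made for illustration, not part of any specific challenge, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Naive baseline: correct, but its writes to `out` are strided in memory.
// A natural optimization target once correctness is confirmed.
__global__ void transpose_naive(const float* in, float* out, int rows, int cols) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (r < rows && c < cols) out[c * rows + r] = in[r * cols + c];
}

// CPU reference used as ground truth (row-major layout).
std::vector<float> transpose_cpu(const std::vector<float>& in, int rows, int cols) {
    std::vector<float> out(in.size());
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            out[c * rows + r] = in[r * cols + c];
    return out;
}

// Host wrapper: assumes in.size() == rows * cols and both are positive.
std::vector<float> transpose_gpu(const std::vector<float>& in, int rows, int cols) {
    const size_t bytes = in.size() * sizeof(float);
    std::vector<float> out(in.size());

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    transpose_naive<<<grid, block>>>(d_in, d_out, rows, cols);

    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}

// Element-wise comparison with an absolute tolerance.
bool all_close(const std::vector<float>& a, const std::vector<float>& b, float tol) {
    if (a.size() != b.size()) return false;
    for (size_t i = 0; i < a.size(); ++i)
        if (std::fabs(a[i] - b[i]) > tol) return false;
    return true;
}
```

Re-running the same comparison after each optimization step (for example, after switching to a shared-memory tiled transpose) helps catch regressions early. Non-square, non-multiple-of-16 sizes are useful test inputs because they exercise the bounds checks.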

Challenge History

Previous challenges remain accessible through the challenge archive. Practice old challenges to build your skills before tackling new ones.