CUDA assumes a heterogeneous computing system: a CPU (the host) and one or more GPUs (the devices), each with its own memory.

# Example diagram of CUDA system
 
     PCIe/NVLink
CPU ------------- GPU
 |                 |
 |                 |
DRAM              HBM/GDDR

Computations on the GPU are expressed as kernels, and in the basic CUDA model a kernel is launched by the CPU.
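A minimal sketch of a CPU-initiated kernel launch follows. The kernel name `set_value` and the host helper `launch_set_value` are hypothetical, chosen only for illustration. Error checking is omitted for brevity, and the sketch assumes a CUDA-capable GPU and compilation with `nvcc`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: runs on the GPU and writes a single value.
__global__ void set_value(int *out, int value) {
    *out = value;
}

// Hypothetical host helper: the CPU launches the kernel and reads the result.
int launch_set_value(int value) {
    int *out = nullptr;
    cudaMallocManaged(&out, sizeof(int));  // memory accessible from CPU and GPU
    *out = 0;

    set_value<<<1, 1>>>(out, value);       // the CPU initiates the GPU computation
    cudaDeviceSynchronize();               // the CPU waits for the kernel to finish

    int result = *out;
    cudaFree(out);
    return result;
}

int main() {
    printf("%d\n", launch_set_value(42));
    return 0;
}
```

The launch itself is asynchronous with respect to the CPU, which is why the sketch calls `cudaDeviceSynchronize()` before reading the result.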

The CPU also orchestrates the computation: it allocates GPU memory, copies data between CPU and GPU memory, launches kernels, and waits for them to complete.
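The CPU-side orchestration might look roughly like the sketch below: allocate GPU memory, copy the input across the interconnect, launch the kernel, wait, and copy the result back. The names `scale` and `run_scale` are hypothetical, and 256 threads per block is an arbitrary illustrative choice. It assumes a CUDA-capable GPU and compilation with `nvcc`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each GPU thread scales at most one element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;  // guard: the last block may have spare threads
}

// Hypothetical host helper: returns true if every element was scaled correctly.
bool run_scale(int n, float factor) {
    std::vector<float> host(n, 1.0f);          // input lives in CPU memory
    size_t bytes = n * sizeof(float);

    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;     // 1. allocate GPU memory
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);  // 2. copy CPU -> GPU

    int threads = 256;                          // illustrative block size
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n elements
    scale<<<blocks, threads>>>(dev, factor, n); // 3. launch the kernel

    cudaError_t err = cudaDeviceSynchronize();  // 4. wait for the GPU to finish
    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);  // 5. copy GPU -> CPU
    cudaFree(dev);                              // 6. release GPU memory

    if (err != cudaSuccess) return false;
    for (float v : host) {
        if (v != factor) return false;          // 1.0f * factor equals factor exactly
    }
    return true;
}

int main() {
    printf("%s\n", run_scale(1000, 3.0f) ? "ok" : "failed");
    return 0;
}
```

Every step in `run_scale` is issued by the CPU; the GPU only executes the body of `scale`.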

GPU threads are organized hierarchically: threads are grouped into blocks, and the blocks of a kernel launch form a grid.
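To make the hierarchy concrete, the sketch below launches a one-dimensional grid and has each thread record which block it belongs to and its index within that block. The names `record_ids`, `ThreadId`, and `run_record_ids` are hypothetical. Error checking is omitted, and a CUDA-capable GPU and `nvcc` are assumed.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores its block index and its index within the block.
__global__ void record_ids(int *block_of, int *thread_of) {
    // Global index = block index * threads per block + index within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    block_of[i] = blockIdx.x;
    thread_of[i] = threadIdx.x;
}

struct ThreadId {
    int block;   // which block of the grid the thread belongs to
    int thread;  // position of the thread inside its block
};

// Hypothetical host helper: launches a 1D grid and returns one entry per thread.
std::vector<ThreadId> run_record_ids(int blocks, int threads_per_block) {
    int n = blocks * threads_per_block;
    int *block_of = nullptr;
    int *thread_of = nullptr;
    cudaMallocManaged(&block_of, n * sizeof(int));
    cudaMallocManaged(&thread_of, n * sizeof(int));

    record_ids<<<blocks, threads_per_block>>>(block_of, thread_of);
    cudaDeviceSynchronize();

    std::vector<ThreadId> ids(n);
    for (int i = 0; i < n; ++i) {
        ids[i] = {block_of[i], thread_of[i]};
    }
    cudaFree(block_of);
    cudaFree(thread_of);
    return ids;
}

int main() {
    // A grid of 2 blocks with 4 threads each: 8 threads in total.
    std::vector<ThreadId> ids = run_record_ids(2, 4);
    for (size_t i = 0; i < ids.size(); ++i) {
        printf("global %zu -> block %d, thread %d\n", i, ids[i].block, ids[i].thread);
    }
    return 0;
}
```

With 2 blocks of 4 threads, global index 5 corresponds to block 1, thread 1, since 1 * 4 + 1 = 5. Grids and blocks can also be two- or three-dimensional through the `.y` and `.z` components of the same built-in variables.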