- A history of Nvidia Stream Multiprocessor
- Host overhead is killing your inference efficiency
- How to Think About GPUs
- How to Think About TPUs
- Making Deep Learning Go Brrrr From First Principles
- The Ultra-Scale Playbook: Training LLMs on GPU Clusters
- TPU Deep Dive
- The Illustrated Transformer
- ML Engineering Open Book
- CUDA for ML - Intuitively and Exhaustively Explained
- Demystifying GPU Compute Architectures
- Inside NVIDIA GPUs: Anatomy of high performance matmul kernels
- Arm at Amazon
- The Computer Architecture of AI (in 2024)
- What Happened with FPGA Acceleration?
- How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
- Getting Started with Intermittent Computing
- Intermittent Computing: Challenges and Opportunities
- GPU-Puzzles
- Making GPUs Actually Fast: A Deep Dive into Training Performance
- 16 charts that explain the AI boom
- Open Sustainable Technology
- Is Parallel Programming Hard, And, If So, What Can You Do About It?
- When a barrier does not block: The pitfalls of partial order
- High Bandwidth Flash: NAND’s Bid for AI Memory
- No Graphics API
- 2025 AI Concepts Recap
- How to lower ping on playit