Matto

Unsorted links

Dec 23, 2025 · 1 min read

A history of Nvidia Stream Multiprocessor
Host overhead is killing your inference efficiency
How to think about GPUs
How to Think About TPUs
Making Deep Learning Go Brrrr From First Principles
The Ultra-Scale Playbook: Training LLMs on GPU Clusters
TPU Deep Dive
The Illustrated Transformer
ML Engineering Open Book
CUDA for ML - Intuitively and Exhaustively Explained
Demystifying GPU Compute Architectures
Inside NVIDIA GPUs: Anatomy of high performance matmul kernels
Arm at Amazon
The Computer Architecture of AI (in 2024)
What Happened with FPGA Acceleration?
How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
Getting Started with Intermittent Computing
Intermittent Computing: Challenges and Opportunities
GPU-Puzzles
Making GPUs Actually Fast: A Deep Dive into Training Performance
16 charts that explain the AI boom
Open Sustainable Technology
Is Parallel Programming Hard, And, If So, What Can You Do About It?
When a barrier does not block: The pitfalls of partial order
High Bandwidth Flash: NAND’s Bid for AI Memory
No Graphics API
2025 AI Concepts Recap
How to lower ping on playit
