Matto

      Unsorted links

      Dec 23, 2025, 1 min read

      • A history of Nvidia Stream Multiprocessor
      • Host overhead is killing your inference efficiency
      • How to think about GPUs
      • How to Think About TPUs
      • Making Deep Learning Go Brrrr From First Principles
      • The Ultra-Scale Playbook: Training LLMs on GPU Clusters
      • TPU Deep Dive
      • The Illustrated Transformer
      • ML Engineering Open Book
      • CUDA for ML - Intuitively and Exhaustively Explained
      • Demystifying GPU Compute Architectures
      • Inside NVIDIA GPUs: Anatomy of high performance matmul kernels
      • Arm at Amazon
      • The Computer Architecture of AI (in 2024)
      • What Happened with FPGA Acceleration?
      • How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
      • Getting Started with Intermittent Computing
      • Intermittent Computing: Challenges and Opportunities
      • GPU-Puzzles
      • Making GPUs Actually Fast: A Deep Dive into Training Performance
      • 16 charts that explain the AI boom
      • Open Sustainable Technology
      • Is Parallel Programming Hard, And, If So, What Can You Do About It?
      • When a barrier does not block: The pitfalls of partial order
      • High Bandwidth Flash: NAND’s Bid for AI Memory
      • No Graphics API
      • 2025 AI Concepts Recap
      • How to lower ping on playit
