Matto

        Unsorted links

        Feb 03, 2026 (1 min read)

        • A history of Nvidia Stream Multiprocessor
        • Host overhead is killing your inference efficiency
        • How to think about GPUs
        • How to Think About TPUs
        • Making Deep Learning Go Brrrr From First Principles
        • The Ultra-Scale Playbook: Training LLMs on GPU Clusters
        • TPU Deep Dive
        • The Illustrated Transformer
        • ML Engineering Open Book
        • CUDA for ML - Intuitively and Exhaustively Explained
        • Demystifying GPU Compute Architectures
        • Inside NVIDIA GPUs: Anatomy of high performance matmul kernels
        • Arm at Amazon
        • The Computer Architecture of AI (in 2024)
        • What Happened with FPGA Acceleration?
        • How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
        • Getting Started with Intermittent Computing
        • Intermittent Computing: Challenges and Opportunities
        • GPU-Puzzles
        • Making GPUs Actually Fast: A Deep Dive into Training Performance
        • 16 charts that explain the AI boom
        • Open Sustainable Technology
        • Is Parallel Programming Hard, And, If So, What Can You Do About It?
        • When a barrier does not block: The pitfalls of partial order
        • High Bandwidth Flash: NAND’s Bid for AI Memory
        • No Graphics API
        • 2025 AI Concepts Recap
        • How to lower ping on playit
        • How Peter Wildeford uses different LLMs
        • Use multiple models
