Matto

Unsorted links

Feb 03, 2026 · 2 min read

  • A history of Nvidia Stream Multiprocessor
  • Host overhead is killing your inference efficiency
  • How to think about GPUs
  • How to Think About TPUs
  • Making Deep Learning Go Brrrr From First Principles
  • The Ultra-Scale Playbook: Training LLMs on GPU Clusters
  • TPU Deep Dive
  • The Illustrated Transformer
  • ML Engineering Open Book
  • CUDA for ML - Intuitively and Exhaustively Explained
  • Demystifying GPU Compute Architectures
  • Inside NVIDIA GPUs: Anatomy of high performance matmul kernels
  • Arm at Amazon
  • The Computer Architecture of AI (in 2024)
  • What Happened with FPGA Acceleration?
  • How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
  • Getting Started with Intermittent Computing
  • Intermittent Computing: Challenges and Opportunities
  • GPU-Puzzles
  • Making GPUs Actually Fast: A Deep Dive into Training Performance
  • 16 charts that explain the AI boom
  • Open Sustainable Technology
  • Is Parallel Programming Hard, And, If So, What Can You Do About It?
  • When a barrier does not block: The pitfalls of partial order
  • High Bandwidth Flash: NAND’s Bid for AI Memory
  • No Graphics API
  • What I have been reading: What is a ml compiler
  • Why are ML Compilers so Hard?
  • How LLMs are trained for function calling
  • Understanding Reasoning LLMs
  • A Decade of Residuals: History & Effects on modern ML
  • A Gentle Introduction to Distributed Training
  • FPGA Internal Arch
  • https://frankdenneman.ai/ai-infrastructure/
  • https://tinyblog-phi.vercel.app/tinygrad

Created with Quartz v4.5.2 © 2026