GGML Library

Purpose

My notes on the Georgi Gerganov Machine Learning (GGML) library, which is the bedrock of llama.cpp. The focus of this study is on the components necessary for inference.

AI Use Note

While Opus 4.7 (through Claude Code) assisted my learning, I wrote these notes entirely by myself, as I do with all my notes.

My study of llama.cpp was frozen at commit hash 0dedb9ef7.

Brief summary

In GGML, a computation is described as a graph of tensors. The graph is handed to a scheduler, which executes each node's operator on a backend. The edges between tensors are determined by the operators that produce and consume them.

As a concrete example, here is the graph for $y = \mathrm{silu}(Wx + b)$:

flowchart TD
    x["`**x**
    op=NONE`"]
    W["`**W**
    op=NONE`"]
    b["`**b**
    op=NONE`"]
    Wx(("`**Wx**
    op=MUL_MAT`"))
    Wxb(("`**Wxb**
    op=ADD`"))
    y(("`**y**
    op=SILU`"))

    W --> Wx
    x --> Wx
    Wx --> Wxb
    b --> Wxb
    Wxb --> y

Every node is a tensor. Rectangles are leaves (op=NONE); circles are computed tensors, annotated with the operator that produced them. An edge from T1 to T2 means T1 appears in T2->src[].
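
To make the src[] relationship concrete, here is a minimal, self-contained sketch in C of the same graph. It is a toy model, not GGML code: only the field names `op` and `src` mirror `ggml_tensor`, and every `toy_`/`TOY_` name is hypothetical. For brevity it keeps leaves and computed tensors in one list, and it stops at producing an execution order rather than running any operator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the GGML API. */
enum toy_op { TOY_OP_NONE, TOY_OP_MUL_MAT, TOY_OP_ADD, TOY_OP_SILU };

#define TOY_MAX_SRC   2
#define TOY_MAX_NODES 16

struct toy_tensor {
    enum toy_op        op;               /* operator that produced this tensor */
    struct toy_tensor *src[TOY_MAX_SRC]; /* operands; all NULL for a leaf      */
    const char        *name;
    int                visited;          /* set once the tensor is in a graph  */
};

struct toy_graph {
    struct toy_tensor *nodes[TOY_MAX_NODES]; /* sources before consumers */
    int                n_nodes;
};

/* A leaf has op=NONE and no sources (the rectangles in the diagram). */
struct toy_tensor toy_leaf(const char *name) {
    struct toy_tensor t = { TOY_OP_NONE, { NULL, NULL }, name, 0 };
    return t;
}

/* A computed tensor records its operator and operands (the circles). */
struct toy_tensor toy_apply(const char *name, enum toy_op op,
                            struct toy_tensor *a, struct toy_tensor *b) {
    struct toy_tensor t = { op, { a, b }, name, 0 };
    return t;
}

/* Post-order walk from the result tensor: every source is appended
 * before the tensor that consumes it, so the list is a valid order
 * in which to execute the operators. */
void toy_build(struct toy_graph *g, struct toy_tensor *t) {
    if (t == NULL || t->visited) {
        return;
    }
    t->visited = 1;
    for (int i = 0; i < TOY_MAX_SRC; i++) {
        toy_build(g, t->src[i]);
    }
    assert(g->n_nodes < TOY_MAX_NODES);
    g->nodes[g->n_nodes++] = t;
}

/* Builds the graph for y = silu(W x + b), prints the order, and
 * returns the number of tensors in the graph. */
int toy_demo(void) {
    struct toy_tensor W   = toy_leaf("W");
    struct toy_tensor x   = toy_leaf("x");
    struct toy_tensor b   = toy_leaf("b");
    struct toy_tensor Wx  = toy_apply("Wx",  TOY_OP_MUL_MAT, &W,   &x);
    struct toy_tensor Wxb = toy_apply("Wxb", TOY_OP_ADD,     &Wx,  &b);
    struct toy_tensor y   = toy_apply("y",   TOY_OP_SILU,    &Wxb, NULL);

    struct toy_graph g = { { NULL }, 0 };
    toy_build(&g, &y);

    for (int i = 0; i < g.n_nodes; i++) {
        printf("%s%s", g.nodes[i]->name, i + 1 < g.n_nodes ? " " : "\n");
    }
    return g.n_nodes;
}
```

Calling `toy_demo()` prints `W x Wx b Wxb y`: each tensor appears only after everything in its src[] has appeared, which is the property a scheduler needs in order to execute the nodes one by one.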

Matto
