Purpose
My notes on the Georgi Gerganov Machine Learning (GGML) library, which is the bedrock of
llama.cpp. The focus of this study is on the components necessary for inference.
AI Use Note
While my learning was assisted by Opus 4.7 (through Claude Code), my writing was done entirely by myself, as with all my notes.
My study of llama.cpp was frozen at commit hash 0dedb9ef7.
Brief summary
In GGML, a computation is described as a graph of tensors. The graph is handed to a scheduler, which executes each node's operator on the backends. The edges between tensors are determined by the operators that produce and consume them.
As a concrete example, here is the graph for y = SiLU(Wx + b):
flowchart TD
    x["`**x** op=NONE`"]
    W["`**W** op=NONE`"]
    b["`**b** op=NONE`"]
    Wx(("`**Wx** op=MUL_MAT`"))
    Wxb(("`**Wxb** op=ADD`"))
    y(("`**y** op=SILU`"))
    W --> Wx
    x --> Wx
    Wx --> Wxb
    b --> Wxb
    Wxb --> y
Every node is a tensor. Rectangles are leaves (op=NONE); circles are computed tensors, annotated with the operator that produced them. An edge from T1 to T2 means T1 appears in T2->src[].