A (usually very large) neural network trained to predict or generate sequences of tokens.
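As a rough intuition for "modeling sequences of tokens", here is a minimal sketch of autoregressive generation using a toy bigram table instead of a real trained model (the corpus and function names are illustrative assumptions, not part of any actual LLM):

```python
import random

# Toy "language model": bigram successor lists built from a tiny corpus.
# A real LLM learns a far richer conditional distribution, but the
# generation loop (sample next token given the context, append, repeat)
# has the same shape.
corpus = "the cat sat on the mat the cat ran".split()
bigrams = {}
for a, b in zip(corpus, corpus[1:]):
    bigrams.setdefault(a, []).append(b)

def generate(start, n_tokens, seed=0):
    """Autoregressively sample one token at a time, conditioned on the last."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n_tokens):
        successors = bigrams.get(out[-1])
        if not successors:  # dead end: no observed continuation
            break
        out.append(rng.choice(successors))
    return " ".join(out)

print(generate("the", 5))
```

An actual LLM replaces the bigram lookup with a learned network that conditions on the entire preceding context, but the outer sampling loop is essentially this.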

Most modern LLMs are transformer-based and rely heavily on (self-)attention, but “LLM” does not strictly imply any particular architecture: RNNs, state space models, and diffusion-style text models also qualify.

See transformer-dataflow-types for a common end-to-end flow.