A neural network architecture built from (self-)attention layers and position-wise feed-forward layers, typically combined with residual connections and layer normalization. It is the backbone of most modern LLMs.
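
A minimal sketch of one such block, assuming PyTorch; the layer sizes (`d_model=512`, `d_ff=2048`, 8 heads) are illustrative defaults drawn from the original Transformer paper, and the post-norm residual arrangement shown here is one common variant among several.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        # Multi-head self-attention mixes information across positions.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Position-wise feed-forward network, applied to each token independently.
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention sublayer with residual connection and layer norm.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Feed-forward sublayer with residual connection and layer norm.
        x = self.norm2(x + self.ff(x))
        return x

# Usage: a batch of 2 sequences, 10 tokens each, embedding size 512.
block = TransformerBlock()
out = block(torch.randn(2, 10, 512))  # output shape: (2, 10, 512)
```

A full model stacks many of these blocks over token embeddings (plus positional information); decoder-style LLMs additionally mask the attention so each position can only attend to earlier ones.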