An operation that computes a weighted combination of values, where the weights are derived from query-key similarity.

Used in many sequence models; transformer popularized large-scale self-attention.