Transformer-based LLMs are limited in the number of tokens they can ingest.
The sequence length defines the maximum number of tokens in the model’s input context.
The sequence length is also known as the context length or context window, since it acts as a finite-sized window that limits how much context the model can see at once. These terms are used interchangeably.
In transformers, sequence length is limited by the way attention is implemented: standard attention computes a score for every pair of tokens, so the memory overhead of the attention score matrix grows quadratically with context length.
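To make the quadratic growth concrete, here is a minimal back-of-the-envelope sketch. The head count, float width, and the idea of counting only one layer's score matrix are illustrative assumptions, not figures from any particular model.

```python
def attention_score_bytes(seq_len, num_heads=12, bytes_per_float=4):
    """Bytes needed to store one layer's attention scores:
    one (num_heads x seq_len x seq_len) matrix of floats.
    Head count and float width are illustrative defaults."""
    return num_heads * seq_len * seq_len * bytes_per_float

# Doubling the sequence length quadruples the score-matrix memory.
for n in (1024, 2048, 4096):
    mib = attention_score_bytes(n) / 2**20
    print(f"seq_len={n}: {mib:.0f} MiB")  # → 48, 192, 768 MiB
```

Under these assumptions, each doubling of the context window multiplies the attention-score memory by four, which is why naive attention becomes impractical at long sequence lengths.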