Transformer Architecture
A deep learning model architecture using self-attention mechanisms, foundational to modern NLP and LLMs.
Also: transformer model
Definition
The Transformer is a deep learning model architecture introduced in the 2017 paper 'Attention Is All You Need.' Unlike recurrent neural networks, Transformers process entire sequences in parallel using self-attention mechanisms that capture relationships between all tokens regardless of distance. The Transformer architecture is the foundation of virtually all modern large language models (BERT, GPT, T5, LLaMA). It is also widely used in computer vision (Vision Transformer) and multimodal models.
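The parallel, all-pairs token interaction described above can be sketched as scaled dot-product self-attention. This is a minimal pure-Python illustration with made-up 2-dimensional embeddings; for brevity it uses identity projections (Q = K = V = X), whereas a real Transformer learns separate query, key, and value projection matrices and stacks many such layers with multiple heads.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product self-attention.

    Every query token attends to every key token at once,
    regardless of how far apart they are in the sequence.
    """
    d_k = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this token to all tokens, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is a weighted average of all value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Tiny illustrative sequence: 3 tokens with 2-dim embeddings (made-up numbers).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(X, X, X)  # identity projections for simplicity
print(out)
```

Each output row is a convex combination of all value vectors, which is why attention can mix information from distant positions in a single step, unlike an RNN that must propagate it token by token.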
Example
“ChatGPT, Claude, and Gemini are all based on the Transformer architecture, which enables them to understand long-range context and relationships between distant parts of text.”
Synonyms
- attention model
- self-attention network
- transformer model
Antonyms / Opposites
- recurrent neural network (RNN)
- LSTM
Related Terms
- deep-learning
- nlp
- attention-mechanism
- llm
