Transformer Architecture
A deep learning model architecture using self-attention mechanisms, foundational to modern NLP and LLMs.
Also: transformer model
Definition
The Transformer is a deep learning model architecture introduced in the 2017 paper 'Attention Is All You Need.' Unlike recurrent neural networks, Transformers process entire sequences in parallel using self-attention mechanisms that capture relationships between all tokens regardless of distance. The Transformer architecture is the foundation of virtually all modern large language models (BERT, GPT, T5, LLaMA). It is also widely used in computer vision (Vision Transformer) and multimodal models.
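The parallel, all-pairs token interaction described above can be sketched as scaled dot-product self-attention. This is a minimal pure-Python illustration with made-up 2-dimensional embeddings; for brevity it uses identity projections (Q = K = V = X), whereas a real Transformer learns separate query, key, and value projection matrices and stacks many such layers with multiple heads.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product self-attention.

    Every query token attends to every key token at once,
    regardless of how far apart they are in the sequence.
    """
    d_k = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this token to all tokens, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is a weighted average of all value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Tiny illustrative sequence: 3 tokens with 2-dim embeddings (made-up numbers).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(X, X, X)  # identity projections for simplicity
print(out)
```

Each output row is a convex combination of all value vectors, which is why attention can mix information from distant positions in a single step, unlike an RNN that must propagate it token by token.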
Example
“ChatGPT, Claude, and Gemini are all based on the Transformer architecture, which enables them to understand long-range context and relationships between distant parts of text.”
Synonyms
- attention model
- self-attention network
- transformer model
Antonyms / Opposites
- recurrent neural network (RNN)
- LSTM
Related Terms
- deep-learning
- nlp
- attention-mechanism
- llm
