Section: IT & Technology · AI/ML
Difficulty: Advanced

Transformer Architecture

A deep learning model architecture using self-attention mechanisms, foundational to modern NLP and LLMs.

Also: transformer model

Definition

The Transformer is a deep learning model architecture introduced in the 2017 paper 'Attention Is All You Need.' Unlike recurrent neural networks, which process a sequence one token at a time, Transformers process entire sequences in parallel using self-attention, a mechanism that relates every token to every other token regardless of how far apart they are in the sequence. The Transformer architecture is the foundation of virtually all modern large language models (BERT, GPT, T5, LLaMA). It is also widely used in computer vision (Vision Transformer) and multimodal models.
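
The self-attention step described above can be illustrated with a minimal sketch of scaled dot-product attention in plain Python. It rests on simplifying assumptions: the learned query, key, and value projections of a real Transformer are omitted (each token vector serves as all three), there is a single attention head, and the function names and toy embeddings are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Assumption: learned projection matrices are left out to keep
    the sketch short; a real model would apply them first.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for query in tokens:
        # Score this token against every token in the sequence, near or far.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all token vectors.
        outputs.append([sum(w * v[i] for w, v in zip(row, tokens)) for i in range(d)])
    return outputs, weights

# Hypothetical 2-dimensional embeddings for a four-token sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
```

In this toy input the first and last tokens have identical vectors, so the first token attends to the last as strongly as it attends to itself, even though they sit at opposite ends of the sequence. No loop iteration depends on the result of another, which is why the computation can run in parallel across tokens.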

Example

ChatGPT, Claude, and Gemini are all based on the Transformer architecture, which enables them to understand long-range context and relationships between distant parts of text.

Synonyms

  • attention model
  • self-attention network
  • transformer model

Antonyms / Opposites

  • recurrent neural network
  • LSTM
  • RNN

Tags

  • deep-learning
  • nlp
  • attention-mechanism
  • llm
