Section: IT & Technology · AI/ML · Difficulty: Easy

Training Data

The dataset, usually labeled, that a machine learning model learns from in order to recognize patterns and make predictions.

Also: training set

Definition

Training data is the dataset a machine learning model learns from: the model is exposed to examples and adjusts itself to capture the patterns and relationships they contain. In supervised learning, training data consists of input-output pairs (features and labels). The quality, quantity, and diversity of training data strongly influence model performance. Common challenges include insufficient data, class imbalance, label noise, and data bias, the last of which can lead models to perpetuate or amplify existing societal biases.
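
As a minimal sketch of the input-output pairs described above, the Python snippet below represents a tiny supervised training set as (features, label) tuples and counts the labels to expose class imbalance. The feature values, the label names, and the helper `label_counts` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy training set: each example is (features, label).
# The feature values are made-up numbers, not real measurements.
training_data = [
    ((0.9, 0.1), "cat"),
    ((0.8, 0.3), "cat"),
    ((0.2, 0.7), "not cat"),
    ((0.1, 0.9), "not cat"),
    ((0.3, 0.8), "not cat"),
    ((0.2, 0.6), "not cat"),
]


def label_counts(examples):
    """Count how many examples carry each label."""
    return Counter(label for _, label in examples)


counts = label_counts(training_data)

# Ratio of the most common class to the least common class;
# a value far above 1 suggests class imbalance.
imbalance_ratio = max(counts.values()) / min(counts.values())
print(counts, imbalance_ratio)
```

A check like this is usually run before training, because a heavily skewed label distribution can make a model look accurate while it mostly predicts the majority class.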

Example

To train an image classifier to identify cats, engineers provide 100,000 images labeled 'cat' or 'not cat,' from which the model learns distinguishing visual features.
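
To make the idea of learning from labeled examples concrete, here is a rough sketch that swaps the image classifier for a much simpler nearest-centroid classifier. It assumes each image has already been reduced to a short list of numeric features; the feature values and the functions `train_centroids` and `predict` are hypothetical, not part of any library.

```python
def train_centroids(examples):
    """Average the feature vectors of each label (the 'training' step)."""
    sums = {}
    counts = {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [total / counts[label] for total in acc]
        for label, acc in sums.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical labeled training data: (features, label) pairs.
labeled = [
    ((0.9, 0.1), "cat"),
    ((0.8, 0.2), "cat"),
    ((0.1, 0.9), "not cat"),
    ((0.2, 0.8), "not cat"),
]

model = train_centroids(labeled)
print(predict(model, (0.7, 0.3)))
```

A real image classifier learns its features from pixels rather than receiving them ready-made, but the role of the training data is the same: the labels tell the model which examples belong together.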

Synonyms

  • labeled dataset
  • training dataset
  • training corpus
  • ground truth data

Antonyms / Opposites

  • test data
  • unlabeled data
  • validation data
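
Training, validation, and test data are typically carved out of one pool of examples. The sketch below shows one common way to do this, a shuffled 80/10/10 split; the fractions, the seed, and the function name `split_dataset` are illustrative assumptions rather than a prescribed method.

```python
import random


def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle examples and split them into train, validation, and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


# Hypothetical dataset of 100 (feature, label) pairs.
examples = [(i, i % 2) for i in range(100)]
train, val, test = split_dataset(examples)
print(len(train), len(val), len(test))
```

Keeping the three subsets disjoint matters: a model evaluated on examples it was trained on will appear to perform better than it actually does on new data.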

Related terms

  • machine-learning
  • overfitting
  • data-augmentation
  • validation-data
