Training Data
The dataset used to train a machine learning model to recognize patterns and make predictions; in supervised learning, its examples are labeled.
Also: training set
Definition
Training data is the dataset used to fit a machine learning model by exposing it to examples from which it can learn patterns and relationships. In supervised learning, training data consists of input-output pairs (features and labels). The quality, quantity, and diversity of training data directly affect model performance. Common challenges include insufficient data, class imbalance, label noise, and data bias. Biased data in particular can lead models to perpetuate or amplify existing societal biases.
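The input-output pairs and the class-imbalance challenge described above can be made concrete with a short sketch. The toy dataset, the feature values, and the helper names `class_proportions` and `imbalance_ratio` are all hypothetical, chosen only for illustration. This is one simple way to inspect label balance, not a standard API.

```python
from collections import Counter

# Hypothetical toy dataset: each example pairs a feature vector (input)
# with a label (output). The numbers are made up for illustration.
training_data = [
    ([0.9, 0.1], "cat"),
    ([0.8, 0.3], "cat"),
    ([0.2, 0.7], "not cat"),
    ([0.1, 0.9], "not cat"),
    ([0.3, 0.8], "not cat"),
    ([0.2, 0.6], "not cat"),
]


def class_proportions(examples):
    """Return the fraction of examples carrying each label."""
    counts = Counter(label for _, label in examples)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(examples):
    """Return the largest class size divided by the smallest class size."""
    counts = Counter(label for _, label in examples)
    return max(counts.values()) / min(counts.values())


proportions = class_proportions(training_data)
ratio = imbalance_ratio(training_data)
print(proportions)
print(ratio)
```

A ratio close to 1 suggests balanced classes. A much larger ratio is a hint that resampling or class weighting may be worth considering before training.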
Example
“To train an image classifier to identify cats, engineers provide 100,000 images labeled 'cat' or 'not cat,' from which the model learns distinguishing visual features.”
Synonyms
- labeled dataset
- training dataset
- training corpus
- ground truth data
Antonyms / Opposites
- test data
- unlabeled data
- validation data
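Test data and validation data contrast with training data because they are held out from fitting and used only for evaluation. The sketch below shows one simple way to carve all three subsets out of a single labeled dataset. The function name `split_dataset`, the 80/10/10 proportions, and the fixed seed are assumptions made for illustration, not a prescribed recipe.

```python
import random


def split_dataset(examples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle examples and split them into train, validation, and test lists."""
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]      # everything left is used for fitting
    return train, val, test


# Hypothetical dataset of 100 (example_id, label) pairs.
examples = [(i, "cat" if i % 2 == 0 else "not cat") for i in range(100)]
train, val, test = split_dataset(examples)
print(len(train), len(val), len(test))
```

Keeping the three subsets disjoint matters: if a test example also appears in the training data, the measured performance overstates how well the model generalizes.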
Related Terms
- machine learning
- overfitting
- data augmentation
- validation data
