Section: IT & Technology · Databases
Difficulty: Medium

Data Lake

A centralized repository storing vast amounts of raw data in its native format.

Definition

A data lake is a centralized repository that stores large volumes of structured, semi-structured, and unstructured data in its raw, native format until it is needed for analysis. Unlike data warehouses, which require a schema to be defined before data is loaded (schema-on-write), data lakes apply a schema only when the data is read (schema-on-read). This allows flexible exploration and supports diverse workloads, including machine learning, data science, and log analytics. Cloud implementations are typically built on object storage such as Amazon S3, Azure Data Lake Storage, or Google Cloud Storage, and commonly store data in open file formats such as Parquet, often managed through open table formats such as Delta Lake.
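
To make schema-on-read concrete, here is a minimal sketch using only the Python standard library. It assumes a local temporary directory stands in for the lake; the event shapes and the helper names `write_raw` and `read_with_schema` are hypothetical and chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical raw events with differing shapes, stored exactly as received.
RAW_EVENTS = [
    {"type": "click", "user_id": "u1", "page": "/home", "ts": "2024-01-01T10:00:00"},
    {"type": "sensor", "device": "d7", "temp_c": "21.5", "ts": "2024-01-01T10:00:05"},
    {"type": "click", "user_id": "u2", "page": "/cart", "ts": "2024-01-01T10:01:00"},
]


def write_raw(lake_dir: Path, events: list[dict]) -> Path:
    """Land events as JSON Lines without validating or reshaping them."""
    path = lake_dir / "events.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")
    return path


def read_with_schema(path: Path, schema: dict, event_type: str) -> list[dict]:
    """Apply a schema at read time: keep the named fields and cast each one."""
    rows = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record.get("type") != event_type:
                continue
            rows.append(
                {name: cast(record[name]) for name, cast in schema.items() if name in record}
            )
    return rows


with tempfile.TemporaryDirectory() as tmp:
    raw_path = write_raw(Path(tmp), RAW_EVENTS)
    # Two consumers read the same raw file, each with its own schema.
    clicks = read_with_schema(raw_path, {"user_id": str, "page": str}, "click")
    readings = read_with_schema(raw_path, {"device": str, "temp_c": float}, "sensor")
```

The write step never inspects the records, so new event types can land without any change to it. Each reader decides which fields it needs and how to type them, which is the property that distinguishes this approach from schema-on-write.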

Example

All application logs, clickstream events, sensor readings, and transaction records flow into the company's data lake in S3, enabling data scientists to explore relationships across datasets without prior schema definition.
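
One way such a lake might be organized is to group raw files by source and date in the object key. The sketch below is an assumption for illustration, not a required layout: the `raw/` prefix, the `dt=` date partition, and the helper name `object_key` are all hypothetical.

```python
from datetime import datetime, timezone


def object_key(source: str, event_time: datetime, file_name: str) -> str:
    """Build a date-partitioned key for a raw file (hypothetical layout)."""
    # Normalize to UTC so every writer agrees on the partition boundary.
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"raw/{source}/dt={day}/{file_name}"


key = object_key(
    "clickstream",
    datetime(2024, 1, 1, 23, 30, tzinfo=timezone.utc),
    "events-0001.jsonl",
)
```

Keeping each source under its own prefix leaves the data in its native format while letting later queries read only the sources and dates they need.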

Synonyms

  • raw data store
  • data reservoir
  • unstructured data store
  • big data store

Antonyms / Opposites

  • data warehouse
  • structured database

Related Terms

  • data-warehouse
  • etl
  • nosql
  • cloud-migration
