Data Lake
A centralized repository storing vast amounts of raw data in its native format.
Definition
A data lake is a centralized repository that stores large volumes of structured, semi-structured, and unstructured data in its raw, native format until needed for analysis. Unlike data warehouses that require predefined schemas, data lakes use schema-on-read, enabling flexible exploration and supporting diverse workloads including machine learning, data science, and log analytics. Cloud implementations on AWS S3, Azure Data Lake Storage, and Google Cloud Storage use open formats like Parquet and Delta Lake for efficiency.
Example
“All application logs, clickstream events, sensor readings, and transaction records flow into the company's data lake in S3, enabling data scientists to explore relationships across datasets without prior schema definition.”
Synonyms
- raw data store
- data reservoir
- unstructured data store
- big data store
Antonyms / Opposites
- data warehouse
- structured database
Images
CC-licensed · free to useVideo
Related Terms
- data-warehouse
- etl
- nosql
- cloud-migration
