Data Pipeline Overview

A data pipeline typically has five stages:

  1. Collect:
    Data is acquired from data stores, data streams, and applications.
  2. Ingest:
    During ingestion, data is loaded into systems and organized within event queues.
  3. Store:
After ingestion, the organized data is stored in data warehouses, data lakes, and data lakehouses, as well as in databases and other specialized systems.
  4. Compute:
Data undergoes aggregation, cleansing, and manipulation to conform to company standards. This includes format conversion, data compression, and partitioning. Both batch and stream processing are used; stream processing often reads directly from the ingestion phase to reduce latency for many workloads.
  5. Consume:
    Processed data is made available for consumption through analytics, visualization, operational data stores, decision engines, user-facing applications, dashboards, data science, machine learning, business intelligence, and self-service analytics.
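The five stages above can be sketched end to end in a few lines. This is a minimal, hypothetical illustration using in-memory stand-ins (a deque for the event queue, a list for the warehouse); real pipelines would use systems like Kafka, object storage, and a processing engine instead, and all function names here are invented for the example.

```python
from collections import deque

def collect():
    # 1. Collect: acquire raw records from stores, streams, or applications.
    return [{"user": "a", "amount": "10"}, {"user": "b", "amount": "5"}]

def ingest(records, queue):
    # 2. Ingest: load records into an event queue.
    for record in records:
        queue.append(record)

def store(queue, warehouse):
    # 3. Store: drain the queue into a warehouse-like structure.
    while queue:
        warehouse.append(queue.popleft())

def compute(warehouse):
    # 4. Compute: cleanse and convert records to a standard format
    #    (here, string amounts become integers).
    return [{"user": r["user"], "amount": int(r["amount"])} for r in warehouse]

def consume(processed):
    # 5. Consume: a simple analytics query (total amount across users).
    return sum(r["amount"] for r in processed)

queue, warehouse = deque(), []
ingest(collect(), queue)
store(queue, warehouse)
total = consume(compute(warehouse))
print(total)  # → 15
```

In a real deployment each function boundary would be a separate system, which is what makes the stages independently scalable.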


#systemdesign #coding #interviewtips