DP-900 Exam EP 18: Data Storage and Processing in Azure
Azure Data Fundamentals DP-900 Certification Course
In this video, Sushant Sutish introduces the most common options for processing data in Azure, including Azure Databricks, Azure Data Factory, Azure Synapse Analytics, and Azure Data Lake.
Processing Data in Azure
- The most common options for processing data in Azure include:
- Azure Synapse Analytics: a generalized analytics service that can read data from many sources and process it using either Transact-SQL or Spark.
- Azure Databricks: an analytics platform optimized for Microsoft Azure cloud services that can process data held in many different types of storage.
- Azure HDInsight: a managed analytics service based on Apache Hadoop that uses a clustered model to store and analyze large amounts of data.
- Azure Data Factory: a scalable, programmable ingestion engine that can take in large amounts of raw, unorganized data from relational and non-relational systems and convert it into meaningful information.
- Azure Data Lake Store: provides a file system that can store near-limitless quantities of structured and unstructured data.
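As a rough illustration of what "ingest raw data from relational and non-relational systems and convert it into meaningful information" can mean in practice, here is a minimal sketch in plain Python. It does not use any Azure SDK or Azure service. The in-memory SQLite table, the JSON documents, and every function name below are hypothetical stand-ins chosen for the example.

```python
import json
import sqlite3
from collections import defaultdict


def read_relational():
    """Hypothetical relational source: an in-memory SQLite table of orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 20.0), ("bob", 15.0), ("alice", 5.0)],
    )
    rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
    conn.close()
    return [{"customer": c, "amount": a} for c, a in rows]


# Hypothetical non-relational source: JSON documents with a looser shape.
RAW_DOCUMENTS = """
[{"user": {"name": "bob"}, "total": "7.5"},
 {"user": {"name": "carol"}, "total": "12"}]
"""


def read_documents():
    """Flatten the nested, string-typed documents into the same record shape."""
    docs = json.loads(RAW_DOCUMENTS)
    return [
        {"customer": d["user"]["name"], "amount": float(d["total"])}
        for d in docs
    ]


def total_by_customer(records):
    """Turn the combined raw records into a per-customer spending summary."""
    totals = defaultdict(float)
    for record in records:
        totals[record["customer"]] += record["amount"]
    return dict(totals)


summary = total_by_customer(read_relational() + read_documents())
print(summary)
```

The point of the sketch is only the shape of the work: each source is read in its own way, normalized to a common record layout, and then combined into a single result.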
Understanding Azure Synapse Analytics
- Azure Synapse Analytics is an integrated analytics service composed of the following elements:
- Azure Synapse SQL Pool: a collection of servers running Transact-SQL (T-SQL), the dialect of SQL used by both Azure SQL Database and Microsoft SQL Server.
- Synapse Spark Pool: a cluster of servers running Apache Spark that processes data using logic written in Python, Scala, SQL, or C#.
- Synapse Pipelines: logical groupings of activities that together perform a task, such as transforming data from a source data set to a destination data set.
- Synapse Link: connects to Azure Cosmos DB so that you can perform near real-time analytics over the operational data stored in a Cosmos DB database.
- Synapse Studio: a web user interface that enables data engineers to access all the Synapse Analytics tools and create SQL and Spark pools, define and run pipelines, and configure links to external data sources.
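The idea of a pipeline as "a logical grouping of activities that together perform a task" can be sketched conceptually in plain Python. This is not the Synapse Pipelines API. The Activity and Pipeline classes, the two activity functions, and the sample data are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Dataset = List[dict]


@dataclass
class Activity:
    """One named step that takes a dataset and returns a new dataset."""
    name: str
    run: Callable[[Dataset], Dataset]


@dataclass
class Pipeline:
    """A named group of activities that are executed in order."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def execute(self, source: Dataset) -> Dataset:
        data = source
        for activity in self.activities:
            data = activity.run(data)
        return data


def drop_incomplete(rows: Dataset) -> Dataset:
    # Remove rows that have no temperature reading.
    return [r for r in rows if r.get("temp_c") is not None]


def add_fahrenheit(rows: Dataset) -> Dataset:
    # Add a derived column without modifying the input rows.
    return [{**r, "temp_f": r["temp_c"] * 9 / 5 + 32} for r in rows]


pipeline = Pipeline(
    "clean_and_convert",
    [
        Activity("drop_incomplete", drop_incomplete),
        Activity("add_fahrenheit", add_fahrenheit),
    ],
)

source = [{"city": "Oslo", "temp_c": 10}, {"city": "Lima", "temp_c": None}]
destination = pipeline.execute(source)
print(destination)
```

Here the source dataset passes through each activity in turn, and the output of the last activity is the destination dataset. A real pipeline would also handle scheduling, parameters, and failures, which this sketch leaves out.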