01-Unity Catalog Introduction

Introduction to Unity Catalog

Overview of Unity Catalog

  • Unity Catalog is a unified governance solution for data and AI assets on lakehouse architectures, available on the major cloud platforms: AWS, Azure, and Google Cloud.
  • It is a premium Databricks offering (a Premium workspace is required) that adds features for managing and governing data.

Key Functionalities

  • A centralized metadata layer lets databases, tables, and views be shared across multiple Databricks workspaces.
  • Permissions on databases, tables, and views are granted with standard SQL syntax and apply across all workspaces.
  • Unity Catalog captures user-level audit logs and also offers data discovery and lineage tracking.
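
As a rough illustration of the SQL-based permission model mentioned above, the sketch below builds a GRANT statement as a string. The helper names, the table main.sales.orders, and the group data-analysts are hypothetical; in a Databricks notebook the resulting string would typically be passed to spark.sql, which is not done here so the example runs anywhere.

```python
def quote_ident(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def grant_statement(privilege: str, securable_type: str,
                    securable_name: str, principal: str) -> str:
    """Build a GRANT statement for a dot-separated object name.

    Assumption: name parts contain no literal dots.
    """
    quoted_name = ".".join(quote_ident(part) for part in securable_name.split("."))
    return (f"GRANT {privilege} ON {securable_type} {quoted_name} "
            f"TO {quote_ident(principal)}")


# Hypothetical table and group names, for illustration only.
stmt = grant_statement("SELECT", "TABLE", "main.sales.orders", "data-analysts")
print(stmt)
```

Because the grant is recorded in the shared catalog rather than in one workspace, the same statement would not need to be repeated per workspace.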

Architecture of Unity Catalog

System Components

  • Identity management and the metastore are both centralized: all Databricks identities (users, groups, service principals) are managed in one place at the account level rather than separately in each workspace.
  • Metadata for databases, tables, and views is stored in the Unity Catalog metastore.

Permission Management

  • Once a workspace is attached to Unity Catalog, Databricks uses Unity Catalog for authorization: a user can access a data object only if the permissions defined in the catalog allow it.
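
The authorization idea can be sketched with a toy in-memory model. This is a simplification under stated assumptions, not Databricks' implementation: the class ToyCatalog and all object and principal names are hypothetical, and ownership, group membership, and admin roles are left out. It assumes that reading a table requires USE CATALOG on the catalog, USE SCHEMA on the schema, and SELECT on the table, and that a privilege granted on a parent object is inherited by its children.

```python
class ToyCatalog:
    """Toy permission store: (principal, securable) -> set of privileges."""

    def __init__(self):
        self._grants = {}

    def grant(self, principal, privilege, securable):
        self._grants.setdefault((principal, securable), set()).add(privilege)

    def _has(self, principal, privilege, securable):
        # Check the object itself and every ancestor (catalog, then schema),
        # since grants on a parent are assumed to be inherited by children.
        parts = securable.split(".")
        for depth in range(1, len(parts) + 1):
            ancestor = ".".join(parts[:depth])
            if privilege in self._grants.get((principal, ancestor), set()):
                return True
        return False

    def can_select(self, principal, table):
        catalog, schema, _ = table.split(".")
        return (self._has(principal, "USE CATALOG", catalog)
                and self._has(principal, "USE SCHEMA", f"{catalog}.{schema}")
                and self._has(principal, "SELECT", table))


uc = ToyCatalog()
uc.grant("analysts", "USE CATALOG", "prod")
uc.grant("analysts", "USE SCHEMA", "prod.sales")
uc.grant("analysts", "SELECT", "prod.sales.orders")

print(uc.can_select("analysts", "prod.sales.orders"))     # True
print(uc.can_select("analysts", "prod.sales.customers"))  # False: no SELECT
print(uc.can_select("interns", "prod.sales.orders"))      # False: no grants
```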

Comparison: With vs. Without Unity Catalog

Decentralized vs. Centralized Management

  • Without Unity Catalog: Access control and user management are decentralized; each workspace manages its own users, permissions, and metadata, and none of it can be shared with other workspaces.
  • With Unity Catalog: Access control and user management are centralized; multiple workspaces connect to a single Unity Catalog metastore, so governance is defined once and applied everywhere.

Unity Catalog Object Model

Hierarchical Structure

  • The object model is hierarchical, with the metastore at the root level containing multiple catalogs.
  • Each catalog contains schemas (also called databases), and each schema holds tables and views; on Azure, the underlying data is stored in Azure Data Lake Storage (ADLS). An object is addressed by its three-level name, catalog.schema.table.
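
The hierarchy means every table is addressed by a three-level name of the form catalog.schema.table. A minimal sketch of parsing and formatting such names follows; the TableName class and the example name dev.sales.orders are hypothetical, and backtick-quoted identifiers that themselves contain dots are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableName:
    """One table address in the catalog.schema.table namespace."""
    catalog: str
    schema: str
    table: str

    @classmethod
    def parse(cls, full_name: str) -> "TableName":
        parts = full_name.split(".")
        # Require exactly three non-empty parts.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


name = TableName.parse("dev.sales.orders")
print(name.catalog, name.schema, name.table)  # dev sales orders
print(str(name))                              # dev.sales.orders
```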

Use Cases for Organization

  • Separate catalogs can be created per environment (development, staging, production).
  • Catalogs can also be split by business unit, each tailored to that unit's needs.
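
One way to picture the catalog-per-environment layout is the sketch below, which generates a CREATE CATALOG statement for each environment and resolves the same logical table to a different catalog depending on where the code runs. The catalog names dev, staging, and prod and both helper functions are assumptions made for illustration; the statements are only built as strings, not executed.

```python
# Hypothetical environment-to-catalog convention: one catalog per environment.
ENVIRONMENTS = ("dev", "staging", "prod")


def create_catalog_statements(environments):
    """Return one CREATE CATALOG statement per environment."""
    return [f"CREATE CATALOG IF NOT EXISTS {env}" for env in environments]


def table_for(env, schema, table):
    """Resolve a logical schema.table to its three-level name in one environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{env}.{schema}.{table}"


for statement in create_catalog_statements(ENVIRONMENTS):
    print(statement)

print(table_for("prod", "sales", "orders"))  # prod.sales.orders
```

The same pattern would apply to business-unit catalogs by swapping the environment names for unit names.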

Enabling Unity Catalog

Step-by-Step Process

  • To enable Unity Catalog:
    • Create a Databricks workspace on the Premium tier, since Unity Catalog is restricted to that tier.
    • Work with an Azure Active Directory Global Administrator to obtain initial admin access to the account console.

Video description

This video gives an introduction to Unity Catalog.

More videos on Unity Catalog:

  • 01. Unity Catalog Introduction - https://www.youtube.com/watch?v=yc5BHW149hs&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=1
  • 02. Unity Catalog Configuration prerequisites - https://www.youtube.com/watch?v=FQnb1-kbbig&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=2
  • 03. Creating Metastore - https://www.youtube.com/watch?v=vAab07QrLZk&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=3
  • 04. Create catalog schema and tables - https://www.youtube.com/watch?v=Z13OHMWs6zc&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=4
  • 05. Lineage - https://www.youtube.com/watch?v=fFYL_h-6II4&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=5
  • 06. Delta Sharing - https://www.youtube.com/watch?v=1TxsOosaQ0k&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=6
  • 07. Create External Location - https://www.youtube.com/watch?v=h-RJn0cfSOo&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=7
  • 08. SCIM Configuration - https://www.youtube.com/watch?v=oLS1soW_gD8&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=8
  • 09. Create compute cluster - https://www.youtube.com/watch?v=dQ5mV9v5kDY&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=9
  • 10. Admin Roles in Unity Catalog - https://www.youtube.com/watch?v=KBXr1VfQgAQ&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=10
  • 11. Managed Tables in Unity Catalog - https://www.youtube.com/watch?v=ZYnKrox83Lw&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=11
  • 12. Lakehouse Federation - https://www.youtube.com/watch?v=bVmIlOxVbu4&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=13
  • 13. Row Level Access Control - https://www.youtube.com/watch?v=Gc0SX3mj3YE&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=14
  • 14. Column Masking - https://www.youtube.com/watch?v=PU1OVz5OKfo&list=PLY-V_O-O7h4fwcHcXgkR_zTLvddvE_GfC&index=15