01-Unity Catalog Introduction
Introduction to Unity Catalog
Overview of Unity Catalog
- Unity Catalog is a unified governance solution for data and AI assets on the lakehouse architecture, available on AWS, Azure, and Google Cloud.
- It is a premium offering from Databricks that adds governance features such as shared metadata, access control, auditing, data discovery, and lineage.
Key Functionalities
- Centralized metadata layer allows sharing of databases, tables, and views across multiple Databricks workspaces.
- Standard SQL syntax can be used to grant permissions on databases, tables, and views; these permissions are applicable across all workspaces.
- User-level audit logs are captured by Unity Catalog, which also offers features such as data discovery and lineage tracking.
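As a rough illustration of the SQL-based permission model described above, the following sketch builds a GRANT statement as a string. The helper `grant_statement`, the table name `main.sales.orders`, and the group `data-analysts` are hypothetical names chosen for the example, not part of the original notes.

```python
def grant_statement(privilege: str, securable_type: str, name: str, principal: str) -> str:
    """Build a GRANT statement; each identifier part is backtick-quoted."""
    quoted_name = ".".join(f"`{part}`" for part in name.split("."))
    return f"GRANT {privilege} ON {securable_type} {quoted_name} TO `{principal}`"


# Hypothetical table and group names, for illustration only.
stmt = grant_statement("SELECT", "TABLE", "main.sales.orders", "data-analysts")
print(stmt)
```

In a Databricks notebook attached to a Unity Catalog workspace, a statement like this would typically be executed with `spark.sql(stmt)`; because the permission lives in the metastore, it would apply in every workspace that shares it.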
Architecture of Unity Catalog
System Components
- Identity and metastore management are centralized: all Databricks identities (users, groups, service principals) are managed at the account level rather than in each workspace.
- Metadata for databases, tables, and views is stored in the Unity Catalog metastore.
Permission Management
- Once a workspace is attached to a Unity Catalog metastore, Databricks uses Unity Catalog for authorization; access to a data object depends on the permissions defined for it in the catalog.
Comparison: With vs. Without Unity Catalog
Decentralized vs. Centralized Management
- Without Unity Catalog: Access control and user management are decentralized; each workspace manages its own metadata and cannot share it with other workspaces.
- With Unity Catalog: Access control and user management are centralized; multiple workspaces can connect to a single metastore for streamlined governance.
Unity Catalog Object Model
Hierarchical Structure
- The object model is hierarchical: the metastore sits at the root and contains multiple catalogs.
- Each catalog contains schemas (also called databases), and each schema holds tables and views, so an object is addressed as catalog.schema.table. On Azure, table data is stored in Azure Data Lake Storage (ADLS).
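The hierarchy above implies a three-part name for every table or view. A minimal sketch of that naming scheme follows; the `TableName` type, the `parse_table_name` helper, and the name `dev.sales.orders` are assumptions made for illustration.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """One level per field: catalog, then schema (database), then table or view."""
    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def parse_table_name(full_name: str) -> TableName:
    """Split a fully qualified name into its three levels."""
    parts = full_name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)


# Hypothetical name, for illustration only.
name = parse_table_name("dev.sales.orders")
query = f"SELECT * FROM {name.qualified()} LIMIT 10"
print(query)
```

The simple split on "." assumes identifiers that contain no dots or backticks; names with special characters would need proper quoting.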
Use Cases for Organization
- Different catalogs can be created based on environments (development, staging, production).
- Catalogs can also be split by business unit, each tailored to that unit's needs.
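One way to combine the two organizing ideas above is a catalog per business unit and environment. The sketch below generates the corresponding CREATE CATALOG statements; the `sales` prefix, the environment names, and the `<unit>_<environment>` naming convention are assumptions for the example, not a convention stated in the notes.

```python
from typing import List

# Hypothetical environment names, for illustration only.
ENVIRONMENTS = ["dev", "staging", "prod"]


def create_catalog_statements(business_unit: str, environments: List[str]) -> List[str]:
    """Return one CREATE CATALOG statement per environment for a business unit."""
    return [
        f"CREATE CATALOG IF NOT EXISTS `{business_unit}_{env}`"
        for env in environments
    ]


statements = create_catalog_statements("sales", ENVIRONMENTS)
for statement in statements:
    print(statement)
```

Each statement could be run with `spark.sql(...)` by a user who is allowed to create catalogs in the metastore.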
Enabling Unity Catalog
Step-by-Step Process
- To enable Unity Catalog:
  - Create a Databricks workspace on the Premium plan, since Unity Catalog is available only on that tier.
  - Work with an Azure Active Directory Global Administrator to obtain initial admin access to the Databricks account console.