USENIX ATC '19 - Zanzibar: Google’s Consistent, Global Authorization System

Introduction to the Zanzibar Authorization System

In this section, the speaker introduces the Zanzibar authorization system and its role in determining whether online users have permission to access digital objects.

Zanzibar's Role in Authorization Checks

  • Online privacy relies on determining whether users have permission to access digital objects.
  • Zanzibar is a system that performs authorization checks for hundreds of Google services.
  • It stores access control lists or permissions for various Google services like YouTube, Photos, Drive, and Google Cloud.

Storing Permissions and Performing Authorization Checks

  • Zanzibar serves two main purposes: storing access control lists (permissions) and evaluating authorization checks against those stored permissions.
  • Examples include storing a permission that a video is public on YouTube or that a group manages a cloud project on Google Cloud Platform.
  • When a user tries to access a video or use a virtual machine, Zanzibar checks if they have the necessary permissions.

Notable Properties of Zanzibar

This section highlights the notable properties of the Zanzibar system, including consistency, flexibility, scalability, performance, and availability.

Consistency

  • Zanzibar needs to respect the causal ordering of updates to access control lists and object contents to ensure consistency.
  • This allows clients and users to reason about when permissions are present or not.

Flexibility

  • Zanzibar supports a wide variety of access control policies due to the diverse patterns across different client services.

Scalability

  • Many client services using Zanzibar are global with billions of users and manage billions of objects.
  • Zanzibar has been shown to scale to trillions of access control entries and millions of authorization checks per second.

Performance

  • Authorization checks are often critical for user interactions, and Zanzibar aims to be fast.
  • It keeps check latency low: less than 10 milliseconds at the 95th percentile.

Availability

  • Zanzibar needs to be highly available as the absence of an authorization system leads to a denial of service problem.
  • Client services must assume that the answer to an authorization check is "no" if the system is not available.
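
The fail-closed rule in the last bullet can be sketched as a small client-side wrapper. This is only an illustration under stated assumptions: AuthzUnavailable, is_allowed, and the two stand-in backends are hypothetical names, not part of any Zanzibar client library.

```python
class AuthzUnavailable(Exception):
    """Hypothetical error: the authorization backend could not answer."""


def is_allowed(check_fn, obj, relation, user):
    """Return the backend's answer, or False ("no") if it is unavailable."""
    try:
        return bool(check_fn(obj, relation, user))
    except AuthzUnavailable:
        # Fail closed: no answer is treated as a denial.
        return False


def healthy_backend(obj, relation, user):
    # Stand-in backend that only knows one viewer.
    return (obj, relation, user) == ("video:X", "viewer", "user:alice")


def down_backend(obj, relation, user):
    # Stand-in backend that never answers.
    raise AuthzUnavailable("no response")
```

With this wrapper an outage turns every request into a denial, which is why the talk treats availability as a core requirement.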

How Zanzibar Works

This section explains how Zanzibar works, including the creation of namespaces, defining relations between users and objects, and evaluating authorization checks.

Creation of Namespaces and Relations

  • Clients can create arbitrary namespaces for different types of objects (e.g., videos, photos, documents).
  • Within these namespaces, clients define relations between users and objects (e.g., owner, editor, viewer).

Relation Tuple Storage

  • Zanzibar stores information about relations as relation tuples.
  • A relation tuple is a three-tuple consisting of an object, a relation, and a user set.
  • Examples include stating that User A is a viewer of Video X or that all users are viewers of Video Y.
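
A minimal sketch of how the two examples above could be represented as data. The class name, field names, string encodings, and the ALL_USERS wildcard are assumptions made for illustration; they are not Zanzibar's actual storage format.

```python
from dataclasses import dataclass

# Hypothetical marker meaning "every user".
ALL_USERS = "user:*"


@dataclass(frozen=True)
class RelationTuple:
    obj: str       # namespace-qualified object, e.g. "video:X"
    relation: str  # e.g. "viewer"
    subject: str   # a single user or the ALL_USERS marker


# "User A is a viewer of Video X" and "all users are viewers of Video Y".
tuples = {
    RelationTuple("video:X", "viewer", "user:A"),
    RelationTuple("video:Y", "viewer", ALL_USERS),
}


def is_viewer(user, video):
    """Direct lookup only: the user appears explicitly or via the wildcard."""
    return (
        RelationTuple(video, "viewer", user) in tuples
        or RelationTuple(video, "viewer", ALL_USERS) in tuples
    )
```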

Evaluating Authorization Checks

  • When performing authorization checks, Zanzibar reads the stored relation tuples and evaluates the check against them.
  • The answer to an authorization check depends on whether a user appears in the access control list for a specific object or relation.

Indirection with User Sets

  • User sets in relation tuples can be more than just single users or all users; they can also represent the set of users that have a certain relation to a specific object.
  • This allows for indirection where Zanzibar expands user sets by accessing other namespaces to determine the relations of users in those sets.
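
The indirection described above can be sketched as a recursive check over an in-memory store. This is a toy under stated assumptions: the store layout, the write and check helpers, and the encoding of a user set as an (object, relation) pair are hypothetical, and a production system would evaluate this very differently.

```python
from collections import defaultdict

# (object, relation) -> set of subjects. A subject is either a user id string
# or an (object, relation) pair naming another user set.
store = defaultdict(set)


def write(obj, relation, subject):
    store[(obj, relation)].add(subject)


def check(obj, relation, user, _seen=None):
    """True if user is in the user set (obj, relation), following indirection."""
    seen = _seen if _seen is not None else set()
    if (obj, relation) in seen:  # guard against cyclic group definitions
        return False
    seen.add((obj, relation))
    for subject in store.get((obj, relation), ()):
        if subject == user:
            return True
        # Indirect subject: expand the referenced user set.
        if isinstance(subject, tuple) and check(subject[0], subject[1], user, seen):
            return True
    return False


write("group:eng", "member", "user:alice")
write("doc:readme", "viewer", ("group:eng", "member"))  # members of eng may view
write("doc:readme", "viewer", "user:bob")               # direct grant
```

Here alice is a viewer of doc:readme only through her membership in group:eng, which lives in a different namespace from the document.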

Protecting Against the New Enemy Problem

This section discusses the new enemy problem, which arises when Zanzibar fails to respect the order in which changes to access control lists and object contents were made.

Importance of Order in Applying Changes

  • Clients care about the order in which Zanzibar applies changes to access control lists and object contents.
  • For example, when a user removes another user's access to a shared document and new content is then added, the removed user must not be able to see that new content.

Conclusion

Zanzibar is an authorization system that handles authorization checks for hundreds of Google services. It stores permissions, evaluates authorization checks against them, and is designed for consistency, flexibility, scalability, performance, and availability. By respecting the order of updates, and so guarding against the new enemy problem, it provides efficient and secure access control for online users.

Zanzibar Consistency Protocol

This section introduces the Zanzibar consistency protocol, which Zanzibar runs in cooperation with its client services to keep authorization decisions consistent with the order of updates.

  • Zanzibar protects sensitive information by running a consistency protocol in cooperation with its client services.
  • Client services are assumed to be cooperative: they follow the protocol rather than act adversarially.
  • The protocol exploits linearizable commit timestamps provided by the underlying database system, Spanner, to mark the order of events in Zanzibar.
  • Changes to access control lists receive timestamps, allowing for proper evaluation and enforcement of access restrictions.

Exploiting Commit Timestamps

This section explains how Zanzibar utilizes commit timestamps provided by Spanner to enforce access control and protect against unauthorized content updates.

  • The key to the consistency protocol is leveraging commit timestamps provided by Spanner.
  • Each access control list change receives a commit timestamp (T0), which is returned to the client for potential future use.
  • When content is later updated, the client obtains a timestamp (T1) for that update; T1 is ordered after the access control list change at T0.
  • When evaluating later access requests for the document, Zanzibar uses a snapshot at least as fresh as T1, so the access control list change at T0 is always visible to the check.

Document Access Control Workflow

This section describes how document access control works within Zanzibar, including checks for viewing documents and protecting against unauthorized content updates.

  • When Bob tries to access a document (X), the document service sends an access check request for Bob's viewer status.
  • The document service includes the timestamp received from Zanzibar in the access check request.
  • Zanzibar evaluates the access check, ensuring that it is done at a time as fresh as T1 (the timestamp associated with the content update).
  • This protocol guarantees that Bob cannot access new content after being removed from the access control list.
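
The workflow above can be sketched with a toy versioned ACL log. Everything here is an assumption made for illustration: a simple counter stands in for Spanner commit timestamps, the token is a bare integer, and the class and method names are hypothetical. In this toy the token T1 simply equals the latest ACL timestamp, which is enough to make the removal visible.

```python
import itertools


class VersionedAcl:
    """Toy ACL log; a counter stands in for Spanner commit timestamps."""

    def __init__(self):
        self._clock = itertools.count(1)
        self._log = []  # (timestamp, op, obj, relation, user), in commit order
        self.latest = 0

    def write(self, op, obj, relation, user):
        """Apply an "add" or "remove" and return its commit timestamp."""
        ts = next(self._clock)
        self._log.append((ts, op, obj, relation, user))
        self.latest = ts
        return ts

    def check_at(self, obj, relation, user, snapshot):
        """Evaluate membership using only changes committed at or before snapshot."""
        members = set()
        for ts, op, o, r, u in self._log:
            if ts > snapshot:
                break
            if (o, r) == (obj, relation):
                if op == "add":
                    members.add(u)
                else:
                    members.discard(u)
        return user in members

    def content_change_check(self, obj, relation, user):
        """Check at the latest snapshot; return (allowed, token to store with content)."""
        return self.check_at(obj, relation, user, self.latest), self.latest

    def check(self, obj, relation, user, token, replica_snapshot):
        """Evaluate at a snapshot at least as fresh as the token."""
        return self.check_at(obj, relation, user, max(token, replica_snapshot))


acl = VersionedAcl()
acl.write("add", "doc:X", "editor", "user:alice")
acl.write("add", "doc:X", "viewer", "user:bob")
stale = acl.latest  # what a lagging replica might still see
t0 = acl.write("remove", "doc:X", "viewer", "user:bob")  # ACL change at T0
ok, t1 = acl.content_change_check("doc:X", "editor", "user:alice")  # content update

leaks_without_token = acl.check_at("doc:X", "viewer", "user:bob", stale)
bob_allowed = acl.check("doc:X", "viewer", "user:bob", t1, stale)
```

Evaluating at the stale snapshot would still show Bob as a viewer; passing the token forces a snapshot that includes his removal, so the check for the new content is denied.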

Zanzibar System Overview

This section provides an overview of the Zanzibar system architecture and its components.

  • The Zanzibar system is a large-scale system with clients and a well-defined API.
  • The API has two main parts: write and check. Writes create, modify, or delete access control entries; there are also read-only APIs for reading and expanding ACLs (access control lists).
  • Another API called watch provides clients with real-time streams of changes to their ACLs.
  • Requests arrive at aclservers organized in clusters; the server that receives a request can distribute the work among other servers in the cluster.
  • Data is stored in Spanner, a global database system. Relation tuples are stored in one database per client namespace, and namespace configurations are stored in a separate database.

Engineering Challenges

This section highlights some of the engineering challenges faced by Zanzibar in satisfying all requirements of such a system.

  • Zanzibar has latitude in choosing the timestamp at which it evaluates a check, as long as the choice meets the client's freshness requirement.
  • Hot spot mitigation techniques like caching and deduplication help manage high traffic areas.
  • Request hedging involves sending multiple copies of a request to different servers and using the first response, which reduces tail latency.
  • Fine-grained performance isolation protects against misbehaving clients.
  • A specialized indexing system called Leopard optimizes operations on large or deeply nested sets.
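
Request hedging, mentioned in the list above, can be sketched with asyncio. This is a generic illustration of the technique, not Zanzibar's implementation: the hedged helper, the replica names, and the delays are all hypothetical, and this variant waits a short delay before sending each additional copy.

```python
import asyncio


async def hedged(make_request, replicas, hedge_delay):
    """Send to the first replica; if it has not answered within hedge_delay,
    also send to the next one. Return whichever response arrives first."""
    if not replicas:
        raise ValueError("need at least one replica")
    tasks = []
    try:
        for replica in replicas:
            tasks.append(asyncio.ensure_future(make_request(replica)))
            done, _ = await asyncio.wait(
                tasks, timeout=hedge_delay, return_when=asyncio.FIRST_COMPLETED
            )
            if done:
                return next(iter(done)).result()
        # All copies are in flight; wait for the first one to finish.
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        for task in tasks:
            task.cancel()  # drop the copies that lost the race


async def fake_check(replica):
    # Stand-in for a remote call; one replica is having a slow moment.
    delays = {"slow": 0.5, "fast": 0.01}
    await asyncio.sleep(delays[replica])
    return replica


winner = asyncio.run(hedged(fake_check, ["slow", "fast"], hedge_delay=0.05))
```

The caller pays for at most one extra request per slow response, in exchange for not waiting out the slowest server.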

Efficient Operations with Leopard

This section explains how Zanzibar utilizes the Leopard indexing system to optimize operations on large or deeply nested sets.

  • Leopard is a specialized indexing system used by Zanzibar.
  • It runs in memory and can perform set operations like unions and intersections on very large or deeply nested sets in microseconds.
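
One way to see why set operations help with nested groups: if the nesting is flattened ahead of time, a membership question becomes a single intersection test. The sketch below illustrates that idea only; the data, the function names, and the use of sorted Python lists are assumptions, not Leopard's actual data structures.

```python
def descendants(group, subgroups):
    """Sorted list of the group itself plus all direct and indirect sub-groups."""
    seen = {group}
    stack = [group]
    while stack:
        current = stack.pop()
        for child in subgroups.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return sorted(seen)


def intersects(a, b):
    """Two-pointer test for a common element in two sorted lists."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return True
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return False


# Hypothetical nesting: "all" contains "eng" and "sales"; "eng" contains "infra".
subgroups = {"all": ["eng", "sales"], "eng": ["infra"]}
# Direct memberships only.
member_of = {"alice": ["infra"], "bob": ["sales"], "carol": ["guests"]}

# Precomputed index: group -> flattened descendants (including itself).
group_index = {g: descendants(g, subgroups) for g in ["all", "eng", "sales", "infra"]}


def is_member(user, group):
    """User belongs to group if one of their direct groups is at or under it."""
    return intersects(sorted(member_of.get(user, [])), group_index.get(group, [group]))
```

The flattening cost is paid when the index is built, so each check avoids walking the nesting.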

Overview of the System

The speaker discusses the architecture and performance of the authorization system, which consists of over 10,000 servers in several dozen clusters worldwide. They explain the two types of check queries - safe and recent - and how they impact performance.

Performance Notes

  • In a typical week, the system handles millions of check queries per second.
  • Safe requests are more common and carry a timestamp that is at least 10 seconds old, which allows data to be read quickly from nearby servers.
  • Recent requests carry a timestamp less than 10 seconds old, which can require access to faraway servers and results in slower response times.
  • The majority of requests are safe, with about 4 million QPS compared to only about 100,000 for recent queries.
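
The safe/recent split described above comes down to comparing the age of the request's timestamp with a 10-second threshold. A minimal sketch, where the function and constant names are hypothetical and timestamps are plain seconds:

```python
SAFE_AGE_SECONDS = 10.0  # threshold quoted in the talk


def classify(request_timestamp, now):
    """Label a check "safe" if its timestamp is at least 10 seconds old, else "recent"."""
    age = now - request_timestamp
    return "safe" if age >= SAFE_AGE_SECONDS else "recent"


# Share of safe traffic implied by the figures above (about 4M vs. about 100K QPS).
safe_share = 4_000_000 / (4_000_000 + 100_000)
```

Under those figures roughly 97.6% of check traffic can be served from nearby data.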

Latency and Availability

The speaker discusses latency and availability as key goals for the authorization system.

  • Latency is reported as percentiles for Check Safe requests.
  • At the 95th percentile, check responses are served in less than 10 milliseconds.
  • At the 99.9th percentile, check responses are served in less than 100 milliseconds.
  • Availability has been maintained at over 99.999% for the past three years.
  • Availability dips slightly around New Year's each year but remains well above 99.99%.
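
To put the availability figures in perspective, they can be converted into a yearly downtime budget. The helper below is plain arithmetic and assumes a 365-day year; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability):
    """Minutes per year a service may be unavailable at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(0.99999)  # about 5.3 minutes
four_nines = downtime_minutes_per_year(0.9999)   # about 52.6 minutes
```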

Summary of Zanzibar Authorization System

The speaker summarizes the features and capabilities of Zanzibar as a unified authorization system for Google services.

  • Robust authorization checks are crucial for preserving online privacy.
  • Zanzibar respects causal order of user actions and supports various access control policies.
  • It offers low latency, high availability, and scales to handle trillions of access control lists and millions of checks per second.
  • Zanzibar is used by hundreds of services and billions of people.

Q&A Session

The speaker opens the floor for questions from the audience.

  • Questions are asked about the semantic knowledge of relations in Zanzibar, potential integration with key transparency work, and the global nature of the system.
  • The speaker provides answers based on their understanding and expertise.

Overall, this transcript provides an overview of the architecture, performance, latency, availability, and features of the Zanzibar authorization system. It highlights the importance of robust authorization checks for preserving privacy online.

Video description

Ramon Caceres, Ruoming Pang, Mike Burrows, Zhifeng Chen, Pratik Dave, Nathan Germer, Alexander Golynski, Kevin Graney, and Nina Kang, Google; Lea Kissner, Humu, Inc.; Jeffrey L. Korn, Google; Abhishek Parmar, Carbon, Inc.; Christopher D. Richards and Mengzhi Wang, Google

Determining whether online users are authorized to access digital objects is central to preserving privacy. This paper presents the design, implementation, and deployment of Zanzibar, a global system for storing and evaluating access control lists. Zanzibar provides a uniform data model and configuration language for expressing a wide range of access control policies from hundreds of client services at Google, including Calendar, Cloud, Drive, Maps, Photos, and YouTube. Its authorization decisions respect causal ordering of user actions and thus provide external consistency amid changes to access control lists and object contents. Zanzibar scales to trillions of access control lists and millions of authorization requests per second to support services used by billions of people. It has maintained 95th-percentile latency of less than 10 milliseconds and availability of greater than 99.999% over 3 years of production use.

View the full USENIX ATC '19 program at https://www.usenix.org/conference/atc19/technical-sessions