8. Authentication and authorization for backend engineers

Summary Transcript Chat

8. Authentication and authorization for backend engineers

Authentication and Authorization Overview

Introduction to Authentication

The speaker introduces authentication and authorization as everyday concepts, emphasizing their prevalence in daily interactions with various platforms.

Authentication is defined as a mechanism for assigning identity, answering the question "Who are you?" in specific contexts such as platforms or operating systems.

Understanding Authorization

Authorization is explained as determining what actions a user can perform within a given context, addressing the question "What can you do?"

The video aims to provide a high-level and technical understanding of both authentication and authorization processes.

Historical Context of Authentication

A historical overview begins with pre-industrial societies where implicit trust was the basis for authentication, relying on community recognition.

In early societies, respected individuals (e.g., village elders) would vouch for others, using handshakes to symbolize mutual agreement.

Evolution of Authentication Methods

As populations grew, implicit trust mechanisms became inadequate; thus began the search for explicit forms of authentication that did not rely on personal acquaintance.

The medieval period saw the introduction of seals as a method of authentication. These wax seals served as early cryptographic tokens representing identity.

Vulnerabilities and Advancements

Seals were prone to forgery, marking early instances of authentication bypass attacks which led to more sophisticated security measures.

The evolution continued with watermarks and encrypted codes during trade documentation, laying groundwork for modern cryptographic thinking.

Industrial Revolution Impact

With advancements in communication technology like the telegraph during the Industrial Revolution came an increased need for secure message validation.

History of Authentication Mechanisms

Early Forms of Authentication

Operators of early telegraphs used pre-agreed pass phrases, akin to static passwords, marking an initial form of shared secrets.

The concept evolved from physical tokens (like wax seals) to mental constructs or written/verbal communication for verification.

Transition to Digital Authentication

The evolution continued into computational architecture, with mainframes in the mid-20th century introducing digital authentication.

In 1961, MIT's Project MAC developed compatible time-sharing systems (CTSS), which introduced passwords for multi-user systems.

Vulnerabilities and Innovations

A critical vulnerability arose when a password file was printed in plain text, highlighting the need for secure password storage mechanisms.

This incident led to the philosophy that passwords should not be stored in plain text, paving the way for innovations like hashing.

Hashing and Cryptographic Advances

Hashing algorithms emerged as a solution for securely storing passwords by transforming them into irreversible fixed-length representations.

These algorithms ensure consistent output regardless of input length, aligning authentication with information security principles: confidentiality, integrity, and availability.

Asymmetric Cryptography Development

The 1970s saw significant cryptographic research advancements due to figures like Whitfield Diffie and Martin Hellman who introduced asymmetric cryptography.

Their work enabled secure key exchanges over untrusted mediums and became foundational for modern authentication protocols based on public key infrastructure (PKI).

Rise of Multi-Factor Authentication (MFA)

By the 1990s, as internet usage grew, simple username/password systems proved inadequate against emerging threats like brute force attacks.

MFA combined multiple principles—something you know (password), something you have (smart card), and something you are (biometric data)—to enhance security.

Authentication Evolution and Future Trends

The Challenges of Biometric Systems

Biometric systems utilize pattern recognition algorithms and statistical models to identify users through unique physical traits, but they face challenges such as false positives, false negatives, and template security issues.

Despite their potential, biometric security is not a one-step solution for the emerging problems in authentication.

The Rise of Advanced Authentication Frameworks

The 21st century has seen the emergence of cloud computing, mobile devices, and API-based architectures that necessitate more advanced authentication frameworks beyond traditional methods.

New authentication components have emerged due to these demands; notable examples include OAuth 2.0 (O2), JSON Web Tokens (JWT), and zero trust architecture.

Passwordless Authentication Innovations

Passwordless systems like web authentication eliminate passwords by relying on public/private key pairs stored in hardware devices.

Decentralized identity using blockchain technology is being explored as a promising future candidate for secure authentication.

Behavioral Biometrics and Post-Quantum Cryptography

Behavioral biometrics is gaining traction as an innovative approach to user identification.

Post-quantum cryptography aims to develop cryptographic techniques that remain secure against the capabilities of quantum computers, which threaten current algorithms like RSA.

Historical Context of Authentication Techniques

Understanding the historical context of authentication helps frame current discussions about its technical aspects.

Key Components in Authentication

Sessions, JWTs, and Cookies

Before diving into specific techniques, it's essential to introduce three critical components: sessions, JWTs (JSON Web Tokens), and cookies.

Statelessness of HTTP Protocol

Transitioning from Static to Dynamic Content

Initially designed as a stateless protocol for isolated interactions between clients and servers, HTTP did not retain memory of past requests.

As web content evolved into dynamic interactions requiring continuity—like e-commerce sites needing cart memory—the limitations of statelessness became apparent.

Emergence of Stateful Interactions

Introduction of Sessions

To address these needs for continuity in user experience across different pages or actions on websites, stateful interactions were introduced through sessions.

Session Management and Evolution

Understanding Session Creation

When a user logs in, the server creates a unique session ID and stores it with relevant user data, such as role and cart items.

This information is stored in a persistent store, which can be either a database or an in-memory store like Redis.

Session ID Transmission

The session ID is sent to the client (browser) as a cookie, allowing the server to recognize subsequent requests from that user.

Each request made by the client includes this cookie, enabling the server to retrieve user data from the persistent store.

Session Expiration and Renewal

Sessions have expiration dates; for example, if set to 15 minutes, after this period, users must log in again for a new session ID.

Initially, sessions were file-based but faced scalability issues as user numbers grew.

Transition to Database-backed Sessions

To handle larger user bases efficiently, servers transitioned to database-backed sessions for faster lookups and persistence across restarts.

Eventually, distributed architectures emerged where session storage moved to distributed systems like Redis for improved speed.

The Rise of JWT (JSON Web Tokens)

Emergence of Stateless Systems

By the mid-2000s, web applications evolved into globally distributed systems facing challenges with stateful systems due to high memory costs for maintaining session data.

Challenges with Stateful Systems

Synchronizing session data across geographically dispersed servers introduced latency and consistency issues during authentication processes.

Introduction of JWT

Developers sought solutions that offloaded state from servers while ensuring security; thus JWT was created as a stateless mechanism for transferring claims between parties.

Structure of JWT

JWT tokens are self-contained; they include essential user data (like IDs and roles), cryptographic signatures encoded in base64 format.

Components of JWT

A typical JWT consists of three parts:

Header: Contains metadata about the signing algorithm used during token creation.

Understanding JWT: Structure and Benefits

JWT Structure

The JSON Web Token (JWT) consists of three parts: the header, payload, and signature. The header includes the signing algorithm, while the payload contains user data.

The "sub" field in the payload typically stores the user's ID, which can originate from various contexts such as a database or an authentication provider.

The "iat" field indicates when the JWT was issued (issued at). Additional optional fields can store user information like name and role (e.g., admin, member).

The signature verifies that the token is legitimate and has not been tampered with. This is done using a secret key known only to the issuer.

If any changes are made to the JWT after issuance, validation will fail due to discrepancies with the secret key.

Advantages of Using JWT

Statelessness eliminates server-side storage costs for session information; each request independently verifies user identity without maintaining session state.

Scalability allows multiple servers in a microservice architecture to authenticate users using shared secret keys without needing centralized session management.

Portability enables JWTs to be easily passed between systems or stored in limited space environments like cookies or local storage due to their lightweight nature.

Challenges Associated with JWT

One major challenge is token impersonation; if someone gains access to a valid JWT, they can act on behalf of that user until it expires since there’s no server-side tracking.

Revocation issues arise because once issued, tokens cannot be invalidated until they expire unless all users are forced to log in again by changing the secret key.

Hybrid Approach for Improved Security

A hybrid approach combines statelessness with statefulness. After verifying a JWT's validity through its secret key, additional checks against a blacklist of revoked tokens are performed.

In this workflow, upon login, users receive a JWT which they send with subsequent requests. Verification occurs without additional database lookups unless necessary for blacklisting purposes.

This structured overview provides insights into how JSON Web Tokens function within authentication workflows while highlighting both their advantages and challenges.

Authentication Strategies and Their Implications

Temporary User Blocking and Blacklisting

The use of temporary blocking for users can be implemented through database calls or in-memory solutions like Redis, allowing access revocation in cases of account hacking or malicious activity.

Stateless vs. Stateful Authentication

A critical question arises regarding the use of JWT (JSON Web Tokens) for statelessness; if persistent storage is needed to validate JWTs, why not adopt a stateful approach from the beginning?

Industry advice often suggests using an authentication provider (e.g., Auth0, Clerk), which alleviates concerns about technology choices and security measures related to authentication systems.

Advantages of Using External Authentication Providers

Relying on external providers allows developers to avoid the complexities involved in creating secure authentication systems, including algorithms, hashing, and salting.

For medium to large systems, utilizing an external auth provider is recommended unless one has significant confidence in their own authentication workflows.

Understanding Cookies in Authentication

Cookies serve as a method for storing information on a user's browser from the server side, enabling servers to maintain user data securely.

Cookies are accessible only by the server that set them, providing a security feature that prevents cross-server cookie visibility.

Cookie Workflow During Authentication

When a user authenticates successfully with credentials, the server sets a cookie containing an authorization token (like JWT or session ID).

This cookie is sent back to the server with each subsequent request from the client’s browser, allowing for user validation and authorization processes.

Types of Authentication Methods

The discussion covers two major types: stateful and stateless authentication. Other methods include API key-based and OAuth 2.0 based authentications.

Stateful Authentication Overview

In stateful authentication, after sending credentials to the server for verification, if successful, a session ID is generated by the server for future requests.

Understanding Stateful vs Stateless Authentication

Overview of Stateful Authentication

The process begins with a session ID and user data being bundled and stored in Redis, chosen for its fast read access compared to traditional databases.

The server sends the session ID back to the client in an HTTP-only cookie, ensuring that JavaScript cannot access it.

Subsequent requests from the client include this cookie, allowing the server to check the session ID's existence in Redis for user identification and authorization.

The session ID can be any cryptographic random string or JWT token, depending on implementation specifics.

This workflow illustrates how stateful authentication operates, relying on persistent storage for user information.

Transitioning to Stateless Authentication

In stateless authentication, upon login with username and password, the server verifies credentials and generates a signed JWT token using a secret key.

The JWT contains user information (e.g., user ID, role), which is sent back to the client for future requests via an Authorization header.

The server extracts and verifies this token using its secret key; successful verification allows API access while failure results in an unauthorized error response.

Stateless authentication does not require looking up a persistent store since all necessary information is contained within the JWT itself.

This method is termed "stateless" because it does not maintain session state on the server side.

Pros and Cons of Stateful vs Stateless Authentication

Advantages of Stateful Authentication

Centralized control over sessions provides real-time insights into active sessions, enabling easy revocation of access when needed.

It suits applications with high traffic demands and strict session management requirements due to its secure nature.

Challenges of Stateful Authentication

Limited scalability issues arise as operational complexity increases with distributed systems across multiple servers leading to latency challenges.

Benefits of Stateless Authentication

Offers scalability without dependency on a session store; ideal for distributed architectures where cookies may not be applicable.

Drawbacks of Stateless Authentication

Token revocation poses significant challenges; once issued, a JWT remains valid until expiration unless drastic measures like changing secret keys are taken.

Conclusion: Hybrid Approaches

While both methods have their strengths and weaknesses, understanding when to use each type can lead to more effective authentication strategies tailored to specific application needs.

Authentication Methods in Web Applications

Hybrid Authentication Approaches

A hybrid authentication approach allows for stateful authentication in web apps while utilizing stateless authentication for mobile apps and third-party integrations, balancing scalability and simplicity.

API Key-Based Authentication Overview

API key-based authentication serves a distinct set of use cases compared to stateful and stateless methods, providing a unique solution for programmatic access.

How API Keys Work

Users can generate an API key through the platform's UI, receiving a cryptographically secure random string that grants access to the server associated with that UI.

Practical Example: ChatGPT Interface

The ChatGPT interface exemplifies how users interact with multiple servers behind the scenes to receive responses, highlighting the complexity of backend operations supporting user-friendly UIs.

Use Cases for API Keys

API keys are beneficial for users who require programmatic access to models like GPT without needing a traditional UI; they allow integration into custom applications or services.

Access Control with API Keys

API keys enable controlled access to servers based on permissions and expiration dates, allowing developers to manage how different users interact with their services securely.

Advantages of Using API Keys

Generating an API key is straightforward—users simply click a button in the UI. This ease of generation makes it accessible for various applications.

Machine-to-Machine Communication

Unlike client-to-server interactions involving human input, machine-to-machine communication occurs entirely programmatically, facilitating seamless data exchange between servers without direct user involvement.

Example of Machine-to-Machine Interaction

In scenarios where one server requests capabilities from another (e.g., summarizing text using ChatGPT), this interaction exemplifies machine-to-machine communication powered by APIs.

Identity Verification via API Keys

When making requests using an API key, the server verifies identity and authorizes actions based on predefined plans and quotas associated with that key.

Machine to Machine Interaction and API Keys

Understanding Machine to Machine Interactions

Machine to machine interactions utilize API keys for communication between servers, allowing seamless data exchange without human intervention.

Unlike user interfaces that require visual triggers, machine interactions rely on programmatic requests facilitated by API keys, enhancing efficiency in automated processes.

Authentication Methods

Traditional authentication methods involve manual input of usernames and passwords, leading to complex workflows requiring human interaction.

In contrast, machine interactions simplify this process by using a secret key stored securely, which is sent with every request for identification.

The Evolution of Authentication: OAuth 2.0

The Need for OAuth 2.0

As users create more accounts across platforms, managing multiple credentials becomes cumbersome and poses security risks due to password reuse.

Early internet practices often involved weak passwords (e.g., "12345"), making accounts vulnerable during breaches.

Delegation in Access Management

The concept of delegation emerged as applications required access to resources from other platforms (e.g., travel apps needing Gmail access).

This led to the realization that sharing passwords was insecure; it granted full access without permission control or easy revocation options.

The Birth of OAuth

Addressing the Delegation Problem

The delegation problem highlights the need for secure resource sharing between platforms while maintaining user control over permissions.

In 2007, OAuth was developed as a revolutionary protocol enabling users to grant limited access without sharing their passwords, significantly improving security and usability in digital interactions.

Understanding OAuth: From 1.0 to 2.0

The Delegation Problem and Token Sharing

The delegation problem involves sharing access without compromising security, leading to the development of token sharing as a solution.

Unlike passwords that grant full access, tokens provide specific permissions, allowing limited access to certain parts of an account.

For example, a token can allow reading contacts without enabling deletion or modification, enhancing security.

Components of OAuth 1.0

Key components include:

Resource Owner: The user who owns the data (e.g., you).

Client: The application requesting access (e.g., Facebook).

Resource Server: The server hosting the resource (e.g., Google).

Authorization Server: Issues tokens after authenticating users.

OAuth 1.0 Flow

The flow begins with the client redirecting the user to the authorization server for authentication.

After granting permission, the authorization server sends a token back to the client (Facebook).

The client uses this token to access resources on behalf of the user without needing their password.

Limitations of OAuth 1.0 and Introduction of OAuth 2.0

While revolutionary, OAuth 1.0 was complex for developers and relied on error-prone cryptographic signatures.

OAuth 2.0 simplified implementation by introducing bearer tokens and allowed developers to choose flows based on app types.

Different Flows in OAuth 2.0

Various flows were introduced in OAuth 2.0:

Authorization Code Flow for server-side apps.

Implicit Flow for browser-based apps (now discouraged due to security risks).

Client Credentials Flow for machine-to-machine communication.

Device Code Flow for devices with limited input capabilities like Smart TVs.

Authentication vs Authorization in OAuth

While OAuth effectively handles authorization through delegation, it does not address authentication—defining user identity versus permissions within platforms.

Authentication identifies who you are; authorization determines what actions you can perform based on your identity and permissions within a platform.

OpenID Connect: Enhancing Authentication

Introduction to OpenID Connect

OpenID Connect (OIDC) was developed around 2014 to address authentication gaps in OAuth 2.0's authorization workflow.

OIDC introduced the concept of an ID token, which is typically a JSON Web Token (JWT), enhancing security and user identity verification.

Understanding ID Tokens

The ID token contains essential information such as user ID, issuance time, and the authority that issued the token.

It also includes user details like name and email, allowing platforms to authenticate users without managing their own credentials.

Practical Applications of OIDC

Major platforms now offer "Sign in with Google" or similar options using OIDC to retrieve user identities seamlessly.

When signing in with Google, the platform retrieves necessary profile information from Google’s servers instead of maintaining its own authentication system.

Workflow of OpenID Connect

The client application redirects users to Google's authorization server for login and permission granting.

After successful login, the authorization server sends back an authorization code and optionally an ID token to the client application.

Accessing Resources via Tokens

The client exchanges the authorization code for an access token from the resource server on behalf of the user.

With this access token, applications can perform actions like retrieving notes from Google Keep based on granted permissions.

Security Implications of OIDC and OAuth 2.0

Together, OAuth 2.0 and OpenID Connect act as security mechanisms ensuring that users or platforms only access resources they have permission for.

These technologies have transformed online interactions by reducing password sharing chaos into a secure interconnected system.

Choosing Authentication Methods

A discussion on when to use different types of authentication methods relevant for backend engineers is introduced.

Stateful authentication is recommended for web app workflows involving session IDs or JWT tokens stored persistently.

Authentication and Authorization in Backend Engineering

Understanding Authentication

Stateful vs. Stateless Authentication: Stateful authentication is typically used for systems that require session data stored on the server, while stateless authentication is ideal for APIs or scalable systems with distributed servers where tokens carry user information.

Types of Authentication: The four main types of authentication discussed include:

Stateless authentication

Stateful authentication

OAuth (for third-party integrations)

API key-based authentication (for server-to-server communication).

Common Usage: In practice, developers often use stateful and stateless authentication when building APIs, as these methods are most applicable to various scenarios.

Introduction to Authorization

Defining Authorization: While not as extensive as authentication, authorization focuses on permissions—what a user can do within a system—contrasting with authentication, which identifies who the user is.

Need for Authorization: The concept arose from the need to manage what authenticated users can do within a platform. For example, after logging into a note-taking application, users should have specific capabilities based on their roles.

Use Case Example

User Actions in Note-Taking Platform: Once authenticated, users can create, delete, or update notes. However, there are additional considerations regarding how deleted notes are handled (e.g., moving them to a "dead zone" instead of permanent deletion).

Admin Permissions Requirement: As the creator of the platform, an administrator needs special permissions that regular users do not have. This includes managing notes beyond standard user capabilities.

Security Concerns in Authorization

Risks of Simple Permission Strings: Using simple strings for granting admin access poses security risks; if intercepted by malicious actors, they could exploit these permissions to harm the platform.

Complexity in Managing Access Levels: Granting special permissions to multiple users complicates management and increases potential security flaws due to numerous access strings being shared or created.

Role-Based Access Control (RBAC)

Concept of RBAC: Authorization techniques like Role-Based Access Control (RBAC) emerged from the need for structured permission management. Not all users have equal access; different roles come with varying capabilities.

Implementation of Roles in Platforms: Common roles include user roles and admin roles. Each role is assigned specific permissions tailored to their responsibilities within the system.

By understanding these concepts thoroughly—authentication's identification function versus authorization's permission management—backend engineers can design more secure and efficient systems that cater effectively to diverse user needs.

Role-Based Access Control in Backend Systems

Understanding User Roles and Permissions

A user role can be assigned different permissions: read-only for users, read and write for moderators, and full access (read/write) for admins. Custom roles with specific permissions can also be created.

Permissions can be granular; for example, only admins may have access to certain resources like "The Dead Zone" notes while other roles do not.

Workflow of Role Assignment

When a user registers or signs up, the server assigns them a role (e.g., user or admin). This assignment is crucial for determining access rights in subsequent requests.

The user's identity is verified through tokens or session IDs during authentication. The server checks the assigned role either via token information or database lookup.

Middleware and Access Control

Upon initial request handling, the server attaches the user's role information to facilitate decision-making in subsequent middleware processes regarding resource access.

If a user with an admin role requests access to restricted notes, they are granted permission. Conversely, if a regular user attempts this action, they receive a 403 Forbidden error indicating insufficient permissions.

Error Handling in Authentication Workflows

Proper error messaging is essential during authentication workflows. Specific messages can inadvertently aid attackers by revealing valid usernames or account statuses.

For instance, messages like "user not found" or "incorrect password" provide clues that could help an attacker refine their approach.

Best Practices for Security

Always send generic error messages during authentication failures to prevent attackers from gaining insights into valid accounts or credentials.

Avoid friendly error messages related to authentication; instead use vague terms like "authentication failed" regardless of the specific issue encountered.

Timing Attacks Awareness

Be aware of timing attacks where response times vary based on whether an account exists or if it’s locked. Attackers may exploit these differences to infer valid usernames.

In typical workflows, servers first verify username existence before checking account status (locked/suspended), which could lead to exploitable timing discrepancies.

This structured overview captures key concepts around role-based access control and security practices within backend systems as discussed in the transcript.

Understanding Password Security and Timing Attacks

Password Storage Mechanism

When users sign up, their passwords are hashed into a cryptographically secure string before being stored in the database, ensuring that plain text values cannot be retrieved by the server.

During login, the provided password is hashed using the same algorithm and key as when it was stored. The server then compares this newly hashed value with the previously stored hash to verify correctness.

If the hashes match, access is granted; if not, an incorrect password response is generated. This process highlights how servers authenticate user credentials securely.

Username Validation and Response Time

If an invalid username is entered, the system quickly terminates authentication with a "user not found" message, resulting in faster response times compared to checking an incorrect password for a valid username.

The additional hashing step required for password verification introduces delays in responses when passwords are incorrect. This timing difference can reveal information about which part of authentication failed.

Exploiting Timing Differences

Attackers can exploit these timing differences to determine whether a username or password was invalid based on response times. This knowledge can inform their strategies for brute force or dictionary attacks.

Mitigating Timing Attacks

Constant Time Operations

To defend against timing attacks, backend engineers should implement constant time operations for comparing password hashes. These functions ensure execution time remains consistent regardless of input similarity.

Simulated Response Delays

Another method involves simulating a delay in responses (e.g., using setTimeout in Node.js or time.Sleep in Go). This approach prevents attackers from measuring timing differences between failed username and password attempts.

By introducing artificial delays even when usernames do not match, systems can obscure which part of authentication has failed, enhancing overall security against potential attacks.

Playlists: Backend from first principles

Video description

In this video we understand what is authentication and authorization, where do we use it, why do we use it and the importance of it. Join the Discord community: https://discord.gg/NXuybNcvVH Some resources to explore more on your own https://portswigger.net/web-security/access-control https://portswigger.net/web-security/authentication https://cheatsheetseries.owasp.org/cheatsheets/Authentication_Cheat_Sheet.html https://cheatsheetseries.owasp.org/cheatsheets/Authorization_Cheat_Sheet.html https://jwt.io/ https://fusionauth.io/blog/category/education/ https://www.pingidentity.com/en/resources/identity-fundamentals/authorization/authorization-methods.html https://en.wikipedia.org/wiki/Hash_function https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html https://cheatsheetseries.owasp.org/index.html is a VERY good resource to master everything related to security overall. #backend #nodejs #golang #softwareengineering Nerd out about the history of technologies here https://www.fascinatingtechhistory.xyz/