5. Understanding HTTP for backend engineers, where it all starts

Understanding the HTTP Protocol

Overview of Backend and HTTP

  • The backend is complex, and covering every component at once would be overwhelming, so the focus is on the widely used topics that appear in roughly 90% of codebases.
  • HTTP protocol is crucial for communication between browsers and servers, facilitating data sending and receiving.

Core Concepts of HTTP

Statelessness

  • Statelessness means no memory of past interactions; each request contains all necessary information (headers, URLs, methods).
  • Each request must include authentication tokens or session info to identify users, as the server does not retain previous requests.
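A minimal sketch of what statelessness means for a handler. The token table `TOKENS` and the function `handle_request` are hypothetical names used only for illustration; a real service would typically verify a signed token or consult a shared store.

```python
# Hypothetical token table, used only for this illustration.
TOKENS = {"token-abc": "alice"}


def handle_request(method: str, path: str, headers: dict[str, str]) -> tuple[int, str]:
    """Handle one request using only the information the request itself carries."""
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    user = TOKENS.get(token) if scheme == "Bearer" else None
    if user is None:
        return 401, "missing or invalid token"
    return 200, f"{method} {path} for {user}"


# Nothing about earlier requests is remembered, so every call must carry its own token.
print(handle_request("GET", "/profile", {"Authorization": "Bearer token-abc"}))
print(handle_request("GET", "/profile", {}))
```

Because the handler keeps no per-client memory, any server instance could answer either call.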

Benefits of Stateless Model

  • Simplicity: Reduces server complexity by eliminating the need to store session information.
  • Scalability: Requests can be distributed across multiple servers without tracking sessions; if one server crashes, another can handle the client's next request.

Client-Server Model in HTTP

  • In an HTTP flow, clients (browsers/apps) initiate communication by sending requests with required information (URL, headers).
  • Servers host resources and respond to incoming requests with appropriate content (web pages, data files).

Communication Dynamics

  • Communication is always initiated by the client; the server responds accordingly.
  • HTTP and HTTPS follow the same request/response principles; HTTPS adds encryption (via TLS) on top, so the concepts here apply to both.

Connection Mechanisms in HTTP

Transport Protocol Requirements

  • Clients and servers establish a connection mechanism for communication.
  • HTTP/1.1 and HTTP/2 rely on TCP for reliability; TCP ensures messages are delivered in order and retransmitted if lost.

OSI Model Context

  • In the OSI model, HTTP sits at the application layer (layer 7), which is where backend engineers mostly operate.

Evolution of HTTP Versions

Historical Changes in Protocol Efficiency

  • Different versions have improved how clients/servers send/receive data over time.

Key Improvements Across Versions:

  1. HTTP 1.0: Each request opened a new connection leading to inefficiencies due to constant connection establishment.
  2. HTTP 1.1: Introduced persistent connections allowing multiple requests/responses over one connection significantly improving performance.
  3. HTTP/2: Added multiplexing capabilities using binary framing instead of text, supporting header compression and server push features.
  4. HTTP/3: Built on QUIC protocol over UDP for enhanced performance.

Understanding HTTP Messages and Headers

Overview of HTTP Protocol

  • HTTP/3 runs over QUIC, which is built on UDP instead of TCP, improving performance through faster connection establishment, reduced latency, and better handling of packet loss.
  • It's crucial to remember that a network connection is established between clients and servers for message exchange.

Structure of HTTP Messages

  • An HTTP request message is sent by the client, and the server sends back a response message.
  • Key components of a request include the request method, resource URL, HTTP version (typically 1.1), Host header, other headers, and body.
  • The response message contains the HTTP version, status code (e.g., 200 means OK), response headers, and body.
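The message layout described above can be sketched in a few lines. This is a simplified illustration of HTTP/1.1 text framing, not a full parser; `build_request` and `parse_response` are hypothetical helper names, and the host and path in the usage lines are made up.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Serialize a request: request line, headers, blank line, then the body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_response(raw: bytes):
    """Split a response into version, status code, reason, headers, and body.

    Assumes the status line includes a reason phrase (e.g. "OK").
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


request = build_request("GET", "/users/1", "api.example.com", {"Accept": "application/json"})
print(request.decode("ascii"))
print(parse_response(b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhi"))
```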

Importance of HTTP Headers

  • Headers are key-value pairs that provide essential metadata about requests or responses.
  • They serve to convey additional information without cluttering the URL or request body.

Real-Life Analogy for Understanding Headers

  • Using parcel delivery as an analogy: important details like recipient address are placed on top for easy access rather than inside the package.
  • This approach allows quick reference to metadata without needing to open each package repeatedly.

Types of HTTP Headers

Request Headers

  • Request headers are sent by clients to inform servers about the nature of their requests (e.g., user agent type).

General Headers

  • These headers contain metadata applicable to both requests and responses (e.g., date sent).

Representation Headers

  • Focused on content representation; they specify media types (JSON, HTML), content length in bytes, and encoding methods.

Security Headers

  • Enhance security by controlling behaviors such as content loading and encryption protocols. Examples include HSTS for secure communication and Content Security Policy to prevent cross-site scripting attacks.

Understanding HTTP Headers and Methods

The Role of HTTP Headers in Security

  • X-Content-Type-Options (with the value nosniff) prevents the browser from guessing the MIME type of content, mitigating MIME type sniffing attacks.
  • HTTP is highly extensible; headers can be added or customized without altering the protocol, allowing for adaptability to new technologies.
  • Custom headers (e.g., X-Custom-Header) can be created by developers for specific application needs, enhancing functionality.
  • Headers also act as a kind of remote control: clients can influence server responses through headers like Accept and Content-Type.
  • Authentication can be managed via Authorization headers, impacting access control decisions.
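As a rough sketch, the security headers mentioned above could be attached to every response like this. The header values are illustrative defaults rather than recommendations from the video, and `with_security_headers` is a hypothetical helper name.

```python
# Illustrative values only; real policies depend on the application.
SECURITY_HEADERS = {
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",  # HSTS
    "Content-Security-Policy": "default-src 'self'",
    "X-Content-Type-Options": "nosniff",
}


def with_security_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return the response headers plus the security defaults.

    Values already set by the handler take precedence over the defaults.
    """
    merged = dict(SECURITY_HEADERS)
    merged.update(headers)
    return merged


print(with_security_headers({"Content-Type": "text/html"}))
```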

Understanding HTTP Methods

Types of Actions Represented by HTTP Methods

  • HTTP methods define different actions a client can request from a server, emphasizing intent behind each action.
  • GET requests are used to fetch data without modifying server state; POST requests create new data on the server with a body for user input.

Update Operations: PATCH vs PUT

  • PATCH updates existing data selectively while PUT replaces it entirely; developers often misuse PUT when PATCH is appropriate.
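The difference is easiest to see on a plain dictionary. This is a minimal sketch over a hypothetical in-memory `store`; the function names `put` and `patch` only mirror the HTTP methods and are not a real framework API.

```python
def put(store: dict, key: str, new_doc: dict) -> dict:
    """PUT semantics: the stored document is replaced entirely."""
    store[key] = dict(new_doc)
    return store[key]


def patch(store: dict, key: str, changes: dict) -> dict:
    """PATCH semantics: only the supplied fields change; the rest are kept."""
    store[key] = {**store[key], **changes}
    return store[key]


store = {"u1": {"name": "Ann", "email": "ann@example.com"}}
print(patch(store, "u1", {"name": "Anna"}))  # email is preserved
print(put(store, "u1", {"name": "Anna"}))    # email is gone: the document was replaced
```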

Idempotent vs Non-idempotent Methods

  • Idempotent methods (GET, PUT, DELETE) leave the server in the same state no matter how many times the same request is repeated. For example, deleting a resource twice has the same end result as deleting it once: the resource is gone.
  • POST is considered non-idempotent as repeated submissions create multiple resources or results.
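A small sketch of idempotency, assuming a hypothetical in-memory `Notes` collection whose methods mirror the HTTP methods. Repeating `put` or `delete` does not change the outcome, while repeating `post` creates a new item each time.

```python
import itertools


class Notes:
    """Hypothetical in-memory resource collection, for illustration only."""

    def __init__(self) -> None:
        self.items: dict[int, str] = {}
        self._ids = itertools.count(1)

    def post(self, text: str) -> int:
        # Non-idempotent: every call creates a new resource with a new id.
        note_id = next(self._ids)
        self.items[note_id] = text
        return note_id

    def put(self, note_id: int, text: str) -> None:
        # Idempotent: repeating the call leaves the same state.
        self.items[note_id] = text

    def delete(self, note_id: int) -> None:
        # Idempotent: deleting an already deleted note changes nothing.
        self.items.pop(note_id, None)


notes = Notes()
notes.post("hello")
notes.post("hello")
print(len(notes.items))  # two separate notes exist
notes.put(7, "fixed")
notes.put(7, "fixed")
print(len(notes.items))  # only one extra note, despite two PUT calls
```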

Special Use Case: OPTIONS Method

  • The OPTIONS method plays a role in CORS flow and may not be directly used by developers but appears in browser network tabs during pre-flight checks.

Understanding CORS: Cross-Origin Resource Sharing

What is CORS and Why is it Important?

  • CORS, or Cross-Origin Resource Sharing, is a mechanism that lets a server declare which other origins may read its responses, relaxing the same-origin policy that browsers enforce for security.
  • The same-origin policy restricts scripts on a web page from reading responses from an origin (scheme, host, and port) different from the one serving the page, which helps prevent malicious activities.

Types of CORS Requests

Simple Request Flow

  • In a simple request flow, when a client at example.com makes a GET request to api.example.com, the browser automatically adds an Origin header indicating where the request originated.
  • If the server checks this origin against its CORS policy and allows it, it responds with an Access-Control-Allow-Origin header in its response.

Handling Responses

  • If the server's response includes the appropriate CORS headers (like Access-Control-Allow-Origin), then the browser permits access to that resource.
  • Conversely, if these headers are absent or do not match, the browser blocks access and logs a CORS error in the console.
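The simple request flow can be sketched as two small functions, one for each side. This is an assumption-laden illustration: `ALLOWED_ORIGINS`, `cors_headers`, and `browser_allows` are hypothetical names, and the browser-side check is heavily simplified (it ignores credentials, for instance).

```python
# Hypothetical allowlist for this sketch.
ALLOWED_ORIGINS = {"https://example.com"}


def cors_headers(origin, allowed=ALLOWED_ORIGINS) -> dict[str, str]:
    """Server side: echo the origin back only if it is on the allowlist."""
    if origin in allowed:
        # Vary tells caches that the response depends on the Origin header.
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}


def browser_allows(origin: str, response_headers: dict[str, str]) -> bool:
    """Browser side (simplified): may the page read this response?"""
    allow = response_headers.get("Access-Control-Allow-Origin")
    return allow == "*" or allow == origin


print(browser_allows("https://example.com", cors_headers("https://example.com")))
print(browser_allows("https://evil.test", cors_headers("https://evil.test")))
```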

Preflight Request Flow

Conditions for Preflight Requests

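  • A preflight request is sent when the request is cross-origin and at least one of the following holds:
  • The method is not GET, POST, or HEAD (e.g., PUT or DELETE).
  • It includes non-simple headers such as Authorization or custom headers.
  • Its Content-Type is something other than application/x-www-form-urlencoded, multipart/form-data, or text/plain (e.g., application/json).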
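The conditions above can be approximated in code. This is a simplified sketch of the browser's decision; the actual rules in the Fetch specification contain more detail, and `needs_preflight` is a hypothetical name. The `headers` argument stands for headers set by the page's script, not ones the browser adds on its own.

```python
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method: str, headers: dict[str, str], cross_origin: bool = True) -> bool:
    """Approximate check for whether a browser would send an OPTIONS preflight."""
    if not cross_origin:
        return False
    if method.upper() not in SIMPLE_METHODS:
        return True
    for name, value in headers.items():
        lname = name.lower()
        if lname not in SAFELISTED_HEADERS:
            return True  # e.g. Authorization or a custom header
        if lname == "content-type":
            media_type = value.split(";")[0].strip().lower()
            if media_type not in SIMPLE_CONTENT_TYPES:
                return True  # e.g. application/json
    return False


print(needs_preflight("GET", {}))
print(needs_preflight("POST", {"Content-Type": "application/json"}))
print(needs_preflight("PUT", {"Authorization": "Bearer token"}))
```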

Structure of Preflight Requests

  • A preflight request uses the OPTIONS method and includes details such as:
  • The requested resource URL
  • HTTP version
  • Host header of the API
  • Origin header indicating where the request comes from
  • Access-Control-Request-Method header asking if specific methods are supported by that route.

This structured approach ensures that servers can manage cross-origin requests securely while allowing necessary interactions between different domains.

Understanding CORS: Pre-flight Requests and Server Responses

Overview of CORS and Pre-flight Requests

  • The pre-flight request is effectively the browser asking the server what it supports for cross-origin access: whether this origin, method, and set of headers are allowed. The server answers through its response headers.
  • The server replies with 204 No Content, since it has nothing to return in the body. The response carries four important CORS headers.
  • The first header, Access-Control-Allow-Origin, states whether the client's origin (e.g., example.com) is allowed to make cross-origin requests. It can also be a wildcard (*), which allows all origins.

Server Response Headers

  • Access-Control-Allow-Methods lists the HTTP methods (such as PUT and DELETE) that are allowed on the resource.
  • Access-Control-Allow-Headers lists the request headers the server accepts, such as Authorization. Access-Control-Max-Age tells the browser how long, in seconds, it may cache the pre-flight result; in this example that is 24 hours (86400 seconds), so the browser does not repeat the pre-flight in that window.
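A minimal sketch of a server answering a pre-flight with the four headers described above. The allowlists and the function name `preflight_response` are assumptions for illustration; how a server reacts to a disallowed pre-flight is a design choice, and here it simply omits the CORS headers.

```python
# Hypothetical server policy for this sketch.
ALLOWED_ORIGINS = {"https://example.com"}
ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE"}
ALLOWED_HEADERS = {"content-type", "authorization"}


def preflight_response(request_headers: dict[str, str], max_age: int = 86400) -> tuple[int, dict[str, str]]:
    """Answer an OPTIONS pre-flight with 204 and, if permitted, the CORS headers."""
    origin = request_headers.get("Origin", "")
    method = request_headers.get("Access-Control-Request-Method", "")
    wanted = {
        h.strip().lower()
        for h in request_headers.get("Access-Control-Request-Headers", "").split(",")
        if h.strip()
    }
    permitted = (
        origin in ALLOWED_ORIGINS
        and method in ALLOWED_METHODS
        and wanted <= ALLOWED_HEADERS
    )
    if not permitted:
        # Without the CORS headers the browser will not send the actual request.
        return 204, {}
    return 204, {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(sorted(ALLOWED_METHODS)),
        "Access-Control-Allow-Headers": ", ".join(sorted(ALLOWED_HEADERS)),
        "Access-Control-Max-Age": str(max_age),  # seconds; 86400 is 24 hours
    }


print(preflight_response({
    "Origin": "https://example.com",
    "Access-Control-Request-Method": "PUT",
    "Access-Control-Request-Headers": "authorization, content-type",
}))
```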

Flow of CORS Requests

  • After receiving appropriate headers from the pre-flight request, the browser sends the original request. The server then responds according to this original request.
  • Emphasizing learning from first principles, it is noted that understanding concepts precedes diving into code or implementation specifics.

Tools for Demonstration

  • Burp Suite is introduced as a tool used by ethical hackers for intercepting and visualizing HTTP traffic. Its features will be utilized in demonstrating CORS flow.

Practical Demonstration of Requests

  • A simple front-end application is set up to illustrate both simple requests and pre-flight requests within an actual browser environment.
  • The initial simple request is made, showcasing its structure including method, URL, headers, status code, and response body.

Cross-Origin Request Considerations

  • The browser treats a request as cross-origin when the scheme, host, or port differs between the page and the target; here localhost:5173 attempts to access localhost:3000, so only the port differs.
  • The origin header plays a crucial role in identifying where the request originates from compared to where it's being sent.
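The origin comparison can be sketched with the standard library. The helper names `origin_of` and `is_cross_origin` are hypothetical; the sketch treats an origin as the (scheme, host, port) triple and fills in the default port when none is given.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str):
    """Return the (scheme, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port


def is_cross_origin(page_url: str, request_url: str) -> bool:
    return origin_of(page_url) != origin_of(request_url)


# The demo setup from the notes: same host, different ports.
print(is_cross_origin("http://localhost:5173/", "http://localhost:3000/api"))
```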

Handling Missing Headers

  • When examining responses from cross-origin requests, browsers check for Access-Control-Allow-Origin. If absent or incorrect (not matching expected origins), they block access due to security policies.
  • A demonstration shows what happens when this header is removed on the server side: after a refresh, the browser blocks the response and reports a CORS error.

Understanding CORS and HTTP Response Codes

CORS Request Flow

  • The Access-Control-Allow-Origin header is what allows cross-origin requests to succeed. Without it, the browser blocks the page from reading the response and reports a CORS error.
  • A pre-flight request is initiated using the OPTIONS method, indicating that a cross-origin request is being made. This step is essential for understanding how browsers handle such requests.
  • The server responds to the pre-flight request with a 204 No Content status code, signifying that it's merely an inquiry without any body content involved.
  • Several conditions necessitate a pre-flight request: methods other than GET, POST, or HEAD; presence of authorization headers; and specific content types like application/json.
  • The original request triggers a pre-flight because it includes an authorization header and uses application/json as its content type—both factors disqualifying it from being classified as a simple request.

Pre-flight Request Analysis

  • The server's response to the pre-flight includes allowed methods (GET, POST, PUT, DELETE), informing clients about what actions they can perform on the server.
  • It also specifies which headers are permitted in actual requests (Content-Type and Authorization), ensuring clients know what they can include in their requests.
  • Clients inquire whether certain methods and headers are allowed through their pre-flight requests. The server confirms these permissions in its response.
  • For testing purposes, max age for access control is set to zero to avoid caching issues during development.
  • After successful completion of the pre-flight check (204 status), the browser proceeds with executing the original PUT request containing necessary details like resource URL and authorization header.

Importance of HTTP Response Codes

  • HTTP response codes serve as standardized indicators of a request's outcome. They allow clients to quickly assess success or failure without delving into response bodies.
  • These codes streamline error handling by providing specific identifiers for various issues (e.g., 401 for unauthorized access).
  • Standardization across web services ensures consistency in communication between servers and clients regardless of programming languages used (Python, JavaScript, etc.).
  • Before standard HTTP status codes were established, clients had to infer outcomes based on response content—a process fraught with inconsistencies.
  • With standardized codes like 400 for bad requests or 200 for successful ones, interactions become more efficient and predictable across different platforms.

Understanding HTTP Response Codes

Categorization of HTTP Response Codes

  • HTTP response codes are categorized into different levels based on their first digit:
  • 1xx: Informational responses
  • 2xx: Success responses
  • 3xx: Redirection messages
  • 4xx: Client errors
  • 5xx: Server errors
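A short sketch of how a client can branch on the first digit of a status code. It uses Python's standard `http.HTTPStatus` enum for the reason phrases; `CATEGORIES` and `describe` are hypothetical names.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Label a status code with its reason phrase and its category."""
    category = CATEGORIES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Code is not in the standard library's registry.
        phrase = "Unregistered"
    return f"{code} {phrase} ({category})"


for code in (200, 301, 404, 503):
    print(describe(code))
```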

Detailed Breakdown of Success Responses (2xx)

  • The most common success response codes include:
  • 200 OK: Indicates a successful request and the server returns the requested resource.
  • 201 Created: Signifies that a new resource has been created, typically in response to a POST request.
  • 204 No Content: Indicates a successful request with no content returned, often used for DELETE requests.

Understanding Redirection Responses (3xx)

  • Commonly used redirection codes include:
  • 301 Moved Permanently: The requested resource has been permanently moved to a new URL; future requests should use this new URL.
  • 302 Found (a temporary redirect): The resource is temporarily located at a different URL; clients should continue using the original URL for future requests.
  • 304 Not Modified: Indicates that the resource has not changed since the last request, allowing efficient caching.

Overview of Client Error Responses (4xx)

  • Key client error codes include:
  • 400 Bad Request: Triggered when invalid data is sent by the client, indicating an issue with the request format.
  • 401 Unauthorized: Used when authentication is required but not provided or invalid credentials are given.
  • 403 Forbidden: Indicates that access to the requested resource is denied even if authenticated; permissions are insufficient.
  • Additional notable client errors:
  • 404 Not Found: Returned when a requested resource cannot be found, for example because of an incorrect URL or because it was deleted.
  • 405 Method Not Allowed: Occurs when an invalid HTTP method is used for a specific endpoint.

Understanding HTTP Status Codes

Common Client Error Responses

  • A 405 (Method Not Allowed) is returned when the endpoint exists but does not support the HTTP method used, for example sending a PATCH to a route that only accepts PUT.
  • A 409 (Conflict) error occurs when users attempt to create resources with non-unique identifiers, like folder names that already exist.
  • The 429 (Too Many Requests) status is used for rate limiting, indicating that a client has exceeded the allowed number of requests within a specified time frame.
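As one possible illustration of rate limiting with 429, here is a fixed-window counter. The video does not specify an algorithm, so the `FixedWindowLimiter` class and its parameters are assumptions; the current time is passed in explicitly to keep the sketch deterministic.

```python
import math


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each time window.

    Old counters are never cleaned up here; a real limiter would expire them.
    """

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self.counters: dict[tuple[str, int], int] = {}

    def check(self, client_id: str, now: float) -> tuple[int, dict[str, str]]:
        window_index = int(now // self.window)
        key = (client_id, window_index)
        count = self.counters.get(key, 0)
        if count >= self.limit:
            # Tell the client how many seconds remain until the next window.
            seconds_left = math.ceil((window_index + 1) * self.window - now)
            return 429, {"Retry-After": str(max(1, seconds_left))}
        self.counters[key] = count + 1
        return 200, {}


limiter = FixedWindowLimiter(limit=2, window_seconds=10)
for t in (0.0, 1.0, 2.0, 10.0):
    print(t, limiter.check("client-a", now=t))
```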

Server Error Responses

  • The 500 (Internal Server Error) indicates unexpected conditions on the server side, often due to unhandled exceptions or process failures.
  • A 501 (Not Implemented) response signifies that the server does not currently support the requested method but may do so in the future.
  • The 502 (Bad Gateway) error arises when a server acting as a proxy or load balancer receives an invalid response from the upstream server.

Service Availability Issues

  • A 503 (Service Unavailable) status is returned when the service cannot handle requests temporarily due to high traffic or maintenance activities.
  • The 504 (Gateway Timeout) indicates that an upstream server failed to respond within the designated timeout period, often seen in proxy configurations.

Practical Demonstration of Status Codes

  • A demo illustrates various HTTP responses from a front-end application interacting with a mock server designed to return different status codes based on request types.
  • Initial requests yield successful responses like 200 OK and 201 Created, confirming resource creation and successful interactions with the server.

Summary of Key Status Codes Encountered

  • Errors such as 401 Unauthorized indicate missing or invalid authentication tokens, while 403 Forbidden means the client is authenticated but not permitted to perform the action.
  • A common error encountered is the 404 Not Found status, which signals that requested resources could not be located on the server.
  • The internal server error (500), along with service unavailable messages (503), highlight issues without revealing sensitive information for security reasons.

Understanding Caching in HTTP

What is Caching?

  • Caching is a technique that stores copies of responses for reuse, reducing the need for repeated requests to the server. This improves load time, reduces bandwidth usage, and decreases server load.
  • The client can reuse old data if it hasn't changed, which is the fundamental principle of caching.

Demonstrating Caching in a Browser

  • A fetch operation begins when a page is rendered. The last request made during this process typically retrieves essential resources like JavaScript files and CSS.
  • Key response headers include:
  • Cache-Control: Indicates how the resource may be cached and for how long (e.g., max-age=10 for 10 seconds).
  • ETag: An identifier for a specific version of the response, typically a hash; used to determine if the cached version matches the current version on the server.
  • Last-Modified: Shows when the resource was last updated, helping decide whether to use cached data or request new data.

Fetching Cached Resources

  • When fetching a resource again, specific headers are sent:
  • If-None-Match: Carries the cached ETag so the server can compare it with the current one.
  • If-Modified-Since: Carries the cached Last-Modified date so the server can check whether the resource has changed since then.
  • If the resource has not changed, the server responds with 304 Not Modified and no body, allowing the client to use its cached version.

Updating Resources

  • When the resource is updated (e.g., via POST), the server generates a new ETag for it. The client's cached ETag no longer matches.
  • When the client sends the outdated ETag in If-None-Match, the server responds with 200 OK, the new data, and the new ETag instead of 304; the client stores the new ETag for subsequent requests.
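The ETag round trip described above can be sketched as follows. This is a simplified illustration: `make_etag` and `conditional_get` are hypothetical names, the ETag is a truncated SHA-256 hash (one possible choice), and the comparison ignores weak validators and lists of ETags that a real If-None-Match header may contain.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, request_headers: dict[str, str]):
    """Return 304 with no body if the client's cached ETag still matches."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=10"}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    return 200, headers, body


# First fetch: full response and an ETag to remember.
status, headers, body = conditional_get(b"version 1", {})
cached_etag = headers["ETag"]

# Revalidation while the resource is unchanged, then after an update.
print(conditional_get(b"version 1", {"If-None-Match": cached_etag})[0])
print(conditional_get(b"version 2", {"If-None-Match": cached_etag})[0])
```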

Challenges and Modern Solutions

  • Managing caching through HTTP can become complex because servers must handle ETags correctly; failure to do so may lead to clients using outdated resources.
  • Modern solutions like React Query offer enhanced client-side caching capabilities, giving developers more control over when to use cached resources versus refetching them.

Content Negotiation and HTTP Compression

Understanding Content Negotiation

  • Content negotiation is a mechanism that allows clients and servers to agree on the best format for data exchange, such as JSON, XML, or HTML.
  • Clients can specify their preferred formats using headers like Accept, which informs the server of the desired response format.
  • There are three main types of content negotiation:
  • Media Type: Specified through the Accept header (e.g., application/json).
  • Language Negotiation: Specified via the Accept-Language header (e.g., English or Spanish).
  • Encoding Negotiation: Specified with the Accept-Encoding header (e.g., gzip or deflate).
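All three negotiation headers share the same comma-separated format with optional q weights, so one small parser can serve them. This is a simplified sketch: `parse_accept` and `choose` are hypothetical names, and partial wildcards such as `application/*` are not handled.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-style header into (value, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((fields[0].lower(), q))
    # sorted() is stable, so equal weights keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose(header: str, supported: list[str], default=None):
    """Pick the client's most preferred option that the server supports."""
    for value, q in parse_accept(header):
        if q <= 0:
            continue
        if value in ("*/*", "*"):
            return supported[0]
        if value in supported:
            return value
    return default


print(choose("application/xml;q=0.8, application/json", ["application/json", "application/xml"]))
print(choose("es, en;q=0.5", ["en", "es"]))
print(choose("br, gzip;q=0.8", ["gzip", "deflate"]))
```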

Demonstrating Content Negotiation

  • A demo illustrates how different requests affect responses based on specified headers. The client communicates preferences to the server.
  • An initial request is made with English as the language, JSON as the format, and gzip as encoding. The server responds accordingly.
  • Changing the language preference to Spanish results in a response in Spanish while maintaining JSON format.
  • When switching from JSON to XML while keeping Spanish as the language, the server adapts its response format accordingly.

Benefits of Content Negotiation

  • Content negotiation simplifies client-server interactions by allowing clients to specify their preferences for data formats and languages.
  • This flexibility enhances user experience by ensuring that responses align with client expectations.

Introduction to HTTP Compression

  • HTTP compression reduces file sizes during transmission. Common methods include gzip and deflate.
  • A demonstration shows a large file's size reduced from 26 MB to 3.8 MB when compression is enabled, highlighting bandwidth efficiency.
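A small sketch of response compression using Python's standard `gzip` module. The helper `maybe_compress` is a hypothetical name, the payload is made-up repetitive JSON, and q weights in Accept-Encoding are ignored for brevity. The exact size reduction depends on the data.

```python
import gzip
import json


def maybe_compress(body: bytes, accept_encoding: str):
    """Gzip the body if the client listed gzip in Accept-Encoding."""
    encodings = [e.split(";")[0].strip().lower() for e in accept_encoding.split(",")]
    if "gzip" in encodings:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    return body, {}


# Repetitive JSON compresses well, which is typical of API responses.
payload = json.dumps([{"id": i, "status": "active"} for i in range(1000)]).encode()
compressed, headers = maybe_compress(payload, "gzip, deflate")
print(len(payload), "bytes ->", len(compressed), "bytes", headers)
```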

Importance of Compression

  • Disabling compression significantly increases file size during transfer, emphasizing its necessity for efficient data handling across networks.

Persistent Connections in HTTP/1.1

  • In early HTTP versions (HTTP/1.0), each request-response cycle required separate connections, leading to inefficiencies.
  • HTTP/1.1 introduced persistent connections that allow multiple requests/responses over a single TCP connection using a "keep-alive" mechanism.

Key Points on Persistent Connections

  • In HTTP/1.1, connections remain open by default for further requests unless explicitly closed by either party, improving resource utilization and speed.

Understanding HTTP Connection Management

Connection Persistence and Keep-Alive Headers

  • Multiple HTTP requests and responses can be sent over a single connection, reducing latency and conserving resources by minimizing the number of connections established.
  • The Connection: keep-alive header asks the other side to keep the connection open; the companion Keep-Alive header can specify a timeout and a maximum number of requests before closure (e.g., Keep-Alive: timeout=5, max=100).
  • When a connection is set to close, it will terminate after sending the response; this behavior is default in HTTP 1.0 but can also be enforced in HTTP 1.1.
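The defaults described above can be sketched as a small decision function. `connection_persists` and `parse_keep_alive` are hypothetical names, and the sketch assumes the headers dictionary uses the exact key "Connection".

```python
def connection_persists(http_version: str, headers: dict[str, str]) -> bool:
    """Decide whether the connection stays open after the response."""
    tokens = {
        t.strip().lower()
        for t in headers.get("Connection", "").split(",")
        if t.strip()
    }
    if "close" in tokens:
        return False
    if http_version == "HTTP/1.0":
        # HTTP/1.0 closes by default unless keep-alive is requested.
        return "keep-alive" in tokens
    # HTTP/1.1 keeps the connection open by default.
    return True


def parse_keep_alive(value: str) -> dict[str, int]:
    """Parse a Keep-Alive header value such as 'timeout=5, max=100'."""
    params = {}
    for part in value.split(","):
        name, _, raw = part.strip().partition("=")
        if raw.isdigit():
            params[name.lower()] = int(raw)
    return params


print(connection_persists("HTTP/1.1", {}))
print(connection_persists("HTTP/1.0", {}))
print(parse_keep_alive("timeout=5, max=100"))
```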

Handling Large Requests and Responses

  • Large files (e.g., images, videos) are handled differently than typical JSON data; multipart requests are utilized for sending large files from clients to servers.
  • Multipart requests transfer file data in parts, requiring a boundary parameter to delineate these segments within the request body.

Demonstration of File Uploading

  • An example shows how a client uploads a file using a POST request with specified content length and type as multipart/form-data, including boundaries for separating file parts.
  • The server responds with details confirming successful upload once it reads the binary data sent in parts.
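To make the boundary mechanism concrete, here is a sketch that builds a single-file multipart/form-data body by hand. `build_multipart` is a hypothetical helper and the field and file names are made up; in practice an HTTP client library usually produces this body for you.

```python
import uuid


def build_multipart(field_name, filename, content: bytes,
                    content_type="application/octet-stream", boundary=None):
    """Build the headers and body for a single-file multipart/form-data upload."""
    boundary = boundary or uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),  # opening delimiter for the part
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        content,  # raw file bytes
        f"\r\n--{boundary}--\r\n".encode(),  # closing delimiter ends the body
    ])
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_multipart("file", "notes.txt", b"hello", "text/plain", boundary="XBOUNDARY")
print(headers)
print(body.decode())
```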

Streaming Large Responses

  • For receiving large responses, servers can stream data to clients in chunks using GET requests; this allows continuous reception until all data is transferred.
  • The response includes headers indicating content type as text/event-stream and keeps the connection alive during transmission.
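A minimal sketch of how a server might format a text/event-stream response as a sequence of chunks. `sse_events` and `STREAM_HEADERS` are hypothetical names; a real server would write each yielded chunk to the open connection as it becomes available.

```python
# Response headers typically sent with an event stream.
STREAM_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
}


def sse_events(messages):
    """Yield one encoded event per message; a blank line ends each event."""
    for message in messages:
        lines = str(message).split("\n")
        # Multi-line messages become several data: lines within one event.
        yield ("".join(f"data: {line}\n" for line in lines) + "\n").encode("utf-8")


for chunk in sse_events(["first", "second\nwith two lines"]):
    print(chunk)
```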

Understanding SSL/TLS/HTTPS

  • SSL was originally used for securing communications between clients (like web browsers) and servers but has been replaced by TLS due to security vulnerabilities.
  • TLS encrypts data during transit, ensuring protection against interception and tampering through certificates that authenticate servers.
  • HTTPS utilizes TLS for secure communication between browsers and servers, safeguarding sensitive information like login credentials from potential attackers.
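From application code, TLS is mostly a matter of configuration. The sketch below uses Python's standard `ssl` module to build a client context that verifies the server's certificate; `make_client_context` is a hypothetical helper name, and requiring TLS 1.2 or newer is an illustrative choice.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client-side TLS context with certificate verification enabled."""
    context = ssl.create_default_context()
    # Refuse legacy SSL and early TLS versions.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Such a context can then be passed to a client, for example `http.client.HTTPSConnection(host, context=context)`.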

Understanding TLS and HTTP for Application-Level Development

Key Takeaways on TLS and HTTP

  • The discussion emphasizes the essential knowledge of TLS (Transport Layer Security) and HTTP needed to work effectively at the application level, particularly in backend systems.
  • A comprehensive understanding of HTTP is crucial; while there are additional resources available for deeper learning, grasping the fundamental components discussed is sufficient for backend development.
  • Internalizing the flow of various components related to HTTP will enable developers to navigate backend systems more effectively.
  • The speaker encourages revisiting sections of the material to ensure a thorough understanding, highlighting that mastery of these concepts is vital for success in application-level work.
  • Overall, a solid foundation in these topics equips developers with the necessary skills to engage with complex backend systems confidently.

Video description

In this video we do a deep dive into all the components and responsibilities of HTTP in a typical backend application.

Join the Discord community: https://discord.gg/NXuybNcvVH

00:00 - Intro
00:15 - HTTP intro
05:46 - Evolution of HTTP
07:29 - HTTP messages
09:09 - Why do we need HTTP headers
11:22 - Types of HTTP headers
16:23 - HTTP methods
18:03 - Idempotent vs non-idempotent
20:07 - OPTIONS method and CORS workflow
29:44 - CORS demo with Burp Suite
38:51 - Response status codes
52:20 - Response status codes demo
55:20 - HTTP caching
01:02:29 - HTTP content negotiation
01:06:53 - HTTP compression
01:08:51 - Persistent connections and keep-alive
01:11:04 - Multipart data and chunked transfer
01:15:18 - SSL, TLS and HTTPS

#backend #nodejs #golang #softwareengineering

Nerd out about the history of technologies here: https://www.fascinatingtechhistory.xyz/