22. Concurrency & Parallelism: IO Bound vs CPU Bound

Understanding Backend Systems and Concurrency

The Need for Handling Multiple Requests

  • Every backend system must handle multiple requests simultaneously to avoid bottlenecks, especially in web servers that serve numerous users.
  • A server processing only one request at a time would lead to delays or errors for other users, making it impractical for production environments with thousands of concurrent users.

Importance of Mental Models in Backend Development

  • Understanding how servers and operating systems manage concurrency is crucial for forming effective mental models while building applications.
  • This video aims to provide insights into backend processes rather than actionable techniques, helping developers structure their applications better and make informed architectural decisions.

Concurrency Keywords and Their Implications

  • Many developers learn about keywords like async and await, which facilitate concurrency, but often lack understanding of their underlying mechanics.
  • Grasping the mechanics behind these keywords can enhance debugging skills and overall comprehension of system performance.

Request Processing Scenario

  • The discussion begins with a typical scenario where a user makes a request from a browser to the backend server, involving various layers such as routing and database interaction.
  • During database queries, the server must wait for responses, which can vary significantly based on network conditions (1–100 milliseconds).

Idle CPU Time During Waiting Periods

  • The video poses an important question: what does the server do while waiting for database responses?
  • If the server processes requests synchronously (one at a time), it remains idle during wait times, leading to wasted CPU resources.

Quantifying Wasted Resources

  • Modern CPUs can execute approximately 3 billion instructions per second; thus, during idle periods (e.g., 100 milliseconds), they could have executed hundreds of millions of instructions instead.

Understanding Concurrency and Parallelism in Programming

The Importance of Concurrency

  • Discusses the waste caused by not implementing concurrency in programs, particularly in API calls that typically involve multiple database queries and external service interactions.
  • Highlights that a typical API call may involve five different network operations, including database queries and email services.

Input/Output Operations

  • Explains that if each network response takes an average of 50 milliseconds, the total wait time for responses can reach around 250 milliseconds.
  • Clarifies that while waiting for these responses (I/O operations), the CPU only processes for about 10 milliseconds, leading to idle resources 95% of the time.
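The waiting math above can be made concrete with a small sketch. This is a hypothetical simulation, not the video's code: `asyncio.sleep` stands in for the five 50 ms network round trips, so the only thing being measured is how much of the total time is spent waiting.

```python
import asyncio
import time

# Hypothetical stand-in for one network round trip: each "query" just
# sleeps 50 ms, the way a real database call would leave the CPU idle.
async def fake_io_call() -> None:
    await asyncio.sleep(0.05)

async def sequential() -> float:
    # Five waits, one after another: ~250 ms total.
    start = time.perf_counter()
    for _ in range(5):
        await fake_io_call()
    return time.perf_counter() - start

async def concurrent() -> float:
    # The same five waits overlapped: close to a single 50 ms wait.
    start = time.perf_counter()
    await asyncio.gather(*(fake_io_call() for _ in range(5)))
    return time.perf_counter() - start

seq = asyncio.run(sequential())
conc = asyncio.run(concurrent())
print(f"sequential: {seq:.3f}s, concurrent: {conc:.3f}s")
```

The sequential version pays all five waits back to back; the concurrent version overlaps them, which is exactly the resource recovery the section describes.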

The Role of Concurrency

  • Emphasizes that more than half of a backend application's time is spent waiting on I/O operations like database responses or file system interactions.
  • States that concurrency allows better utilization of CPU and memory by performing other tasks while waiting for I/O operations to complete.

Defining Parallelism vs. Concurrency

  • Introduces parallelism as executing multiple instructions simultaneously, requiring hardware support such as multiple CPU cores.
  • Contrasts this with concurrency, which can be achieved even with a single core by structuring programs to start, pause, and resume tasks effectively.

Visualizing Concurrency and Parallelism

  • Defines parallelism as doing multiple things at once versus concurrency as managing multiple tasks at once without necessarily executing them simultaneously.
  • Uses a timeline analogy to illustrate how requests can be handled concurrently even when they arrive at the same time.

Understanding Concurrency and Parallelism in CPU Processing

The Role of CPU in Request Processing

  • The initial processing of requests involves validation, routing logic, and JSON deserialization, all requiring CPU work for the first 5 milliseconds.
  • After 5 milliseconds, while request A is processed, request B waits for a database query response which takes around 40 milliseconds.
  • Once request A no longer needs the CPU (it is waiting on I/O), the operating system scheduler hands the CPU to request B at the 5-millisecond mark.

Handling Multiple Requests

  • Request B utilizes the CPU for about 15 milliseconds before needing another database interaction, leading to further waiting.
  • At 40 milliseconds, the database responds to request A; however, it does not immediately get CPU access due to scheduling logic.
  • Request A resumes processing at exactly 50 milliseconds after being paused for I/O wait time.

Understanding Concurrency

  • While request A uses the CPU, request B remains idle until A requires I/O. This illustrates how concurrency allows multiple requests to progress without wasting CPU cycles.
  • Despite having only one core, both requests appear active from a high-level perspective; however, only one is processed at any given moment.

Distinguishing Between Concurrency and Parallelism

  • With parallelism and two cores available, both requests can be processed at the same instant, each running on its own core.
  • This simultaneous processing differentiates parallelism from concurrency where only one task runs at a time despite multiple tasks being in progress.

Importance of Understanding I/O vs. CPU Bound Processes

  • Recognizing whether processes are I/O bound or CPU bound is crucial since most API call activities involve waiting on external resources rather than performing computations.
  • Actual computation occurs when executing instructions on the CPU core; other interactions like database queries or logging are typically I/O bound.

Understanding IO Bound vs CPU Bound Tasks

Key Differences Between IO Bound and CPU Bound

  • Definition of IO Bound: Refers to tasks that require input/output operations, such as sending requests and waiting for responses. These tasks are limited by the speed of data transfer rather than processing power.
  • Definition of CPU Bound: Involves tasks that require significant computation, like validating JSON data. These tasks utilize CPU cycles heavily and are constrained by processing power rather than data transfer speeds.

Characteristics of Backend Applications

  • Predominance of IO Bound Tasks: Most backend applications are primarily IO bound, necessitating concurrency to handle multiple requests efficiently.
  • Examples of CPU Bound Workloads: While some operations like validations are CPU bound, they typically complete quickly (1-2 milliseconds). Heavy computations like image processing or encryption significantly tax the CPU.

Concurrency vs Parallelism

  • Concurrency in IO Bound Tasks: Essential for managing multiple database connections, API calls, and other external interactions without wasting resources.
  • Parallelism in CPU Bound Tasks: More effective for executing heavy computational tasks quickly by utilizing multiple CPU cores simultaneously.

Mechanisms for Achieving Concurrency

Threads and Event Loops

  • Two Main Mechanisms: Computers achieve concurrency through threads or event loops. Programming languages implement these concepts differently but fundamentally rely on them.

Understanding Threads

  • Thread Definition: A thread is an independent execution unit managed by the operating system that allows concurrent execution within a program.
  • Thread Components: Each thread has its own stack for function calls and local variables, along with an instruction pointer to track execution progress.
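A minimal sketch of those components in Python (the function names and workload are illustrative, not from the video): each thread gets its own stack and local variables, while the OS decides when each one actually runs.

```python
import threading

results = {}  # shared memory: all threads in a process can see this dict

def worker(name: str, n: int) -> None:
    # Each thread executes this function independently, with its own
    # stack: name, n, and total are per-thread local variables.
    total = sum(range(n))
    results[name] = total

threads = [threading.Thread(target=worker, args=(f"t{i}", 1000)) for i in range(3)]
for t in threads:
    t.start()   # the OS scheduler decides when each thread gets CPU time
for t in threads:
    t.join()    # block until every thread has finished

print(results)  # {'t0': 499500, 't1': 499500, 't2': 499500}
```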

Thread Scheduling

  • Role of the Scheduler: The operating system's scheduler determines which threads run at any given time based on their assigned time slices for processing.
  • Importance of Scheduling Control: The OS controls scheduling to manage resource allocation effectively among competing threads.


Understanding Thread Scheduling and Blocking Operations

Basics of Thread Scheduling

  • Threads are assigned a specific amount of CPU processing time, typically measured in milliseconds. For example, if a thread is allocated 2 milliseconds, the scheduler will pause it after that duration.
  • The basic scheduling algorithm involves switching between threads after their allocated time expires. This process is complex and has been extensively documented in literature on operating systems.
  • A recommended resource for further understanding thread management and operating system scheduling is "Operating Systems: Three Easy Pieces," which provides foundational insights into how operating systems function.

Preemptive Scheduling Explained

  • The type of scheduling where threads are paused before completion is known as preemptive scheduling. This means that the execution of threads can be interrupted by the scheduler at any time.

Understanding Blocking Operations

  • A blocking operation occurs when a thread performs an action that cannot proceed until an external event completes, such as input/output operations or interactions with devices (e.g., logging or network calls).
  • When a thread encounters a blocking operation, it informs the operating system to mark it as blocked, allowing the scheduler to switch to another runnable thread.

Thread States and Process Isolation

  • Once an I/O operation completes for a blocked thread (e.g., after receiving data from a database), it does not immediately regain CPU access but enters a state where it can be scheduled again when resources allow.
  • Each program runs as a separate process within the operating system; thus, threads from different processes cannot share memory due to security and stability concerns.

Memory Sharing Among Threads

  • Threads within the same process can share memory resources like heap allocations or global variables. This allows them to communicate effectively through shared data structures.
  • Accessing shared memory via pointers enables fast communication between threads without needing serialization or copying data.

Parallelism vs Concurrency

  • With multiple CPU cores available, different threads can execute simultaneously across these cores—this represents parallelism rather than mere concurrency.

Understanding Thread Overhead in Parallel Computing

The Cost of Using Threads

  • When utilizing multiple cores, such as four, it allows for four threads to run in parallel, significantly speeding up CPU-bound tasks. However, this introduces overhead costs that need consideration.
  • One major overhead is memory usage; each thread requires a new stack (approximately 8 MB on Linux), which can lead to substantial memory consumption during high traffic scenarios.
  • For instance, if a server receives 10,000 requests and creates a thread for each one, then even at a conservative 500 KB of actual stack usage per thread, that amounts to roughly 5 GB of memory just for handling these threads.
  • This heavy memory requirement can quickly exhaust server resources, leading to potential crashes or performance degradation due to insufficient available memory.

Creation Overhead and Context Switching

  • Another significant overhead is the creation time of threads; every new thread necessitates a system call to the operating system kernel, which involves setting up stacks and managing data structures.
  • This process can take microseconds to milliseconds but accumulates latency when many function calls are involved in program execution.
  • The most critical source of latency arises from context switching between threads. Each switch requires saving CPU registers and updating bookkeeping information about the current state of threads.
  • With numerous threads (e.g., 100), frequent context switches become unproductive maintenance work that consumes valuable CPU time without contributing directly to processing tasks.

Alternative: Event Loop Model

  • An alternative approach is the event loop model, which typically operates with a single thread instead of multiple ones. It utilizes callbacks and asynchronous programming techniques to manage concurrent operations without blocking the main execution flow.
  • In this model, tasks like database queries do not block the event loop; instead, they signal completion through callbacks while allowing other operations to continue executing concurrently.


Understanding Event Loop and Concurrency

Overview of Event Loop Functionality

  • The event loop manages database queries by pausing functionality until a response is received, allowing the CPU to handle other tasks in the meantime.
  • When an I/O operation occurs, control is handed back to the event loop, which tracks tasks waiting for CPU processing while they are blocked by I/O operations.

Operating System Support for Event Loops

  • Programming languages rely on operating system support for implementing event loops; Linux provides epoll, while macOS provides kqueue.
  • The event loop can monitor numerous connections simultaneously, checking each iteration for completed I/O operations and resuming tasks based on priority.

Efficiency of Event Loops vs. Threading Models

  • For I/O-bound problems, the event loop is more efficient than threading models due to reduced context switching and memory overhead.
  • In contrast, CPU-bound problems benefit from multiple threads, since execution speeds up with thread count up to the number of available cores; beyond that, excessive threads slow performance due to context switching.

Trade-offs in Event Loop Architecture

  • A significant trade-off of using an event loop is that it cannot be blocked; long-running CPU-bound tasks will halt all other processes within the loop.
  • To maintain efficiency in an event-driven architecture, callbacks executed after I/O or CPU processing must be quick to avoid blocking the event loop.
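This trade-off is easy to demonstrate. The sketch below is a hypothetical illustration using Python's asyncio: a `ticker` task tries to run every 10 ms, but a CPU-bound loop that never awaits stalls the whole single-threaded event loop for 200 ms.

```python
import asyncio
import time

async def ticker(ticks: list) -> None:
    # Should append a timestamp roughly every 10 ms if the loop is free.
    for _ in range(5):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

async def cpu_hog() -> None:
    # A CPU-bound busy loop that never awaits: the single-threaded event
    # loop cannot run anything else until it returns.
    end = time.perf_counter() + 0.2
    while time.perf_counter() < end:
        pass

async def main() -> list:
    ticks = []
    await asyncio.gather(ticker(ticks), cpu_hog())
    return ticks

ticks = asyncio.run(main())
gaps = [b - a for a, b in zip(ticks, ticks[1:])]
print(f"largest gap between ticks: {max(gaps):.3f}s")
```

The largest gap between ticks is roughly the hog's full 200 ms run, showing why callbacks on an event loop must stay short.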

Handling Tasks in Backend Applications

  • Most backend applications do not typically involve lengthy CPU-bound tasks; latency often arises from I/O-bound operations instead.
  • Languages like JavaScript utilize asynchronous primitives (callbacks, promises, async/await) as syntactic sugar to prevent blocking operations that could hinder the event loop's performance.

Practical Application of Concurrency Models

  • Asynchronous programming allows developers to yield control during I/O operations (e.g., database queries or API calls), ensuring smooth execution without blocking.
  • In a threading model setup, multiple threads handle concurrent requests (e.g., parsing JSON bodies), demonstrating how different concurrency models operate under various workloads.

Database Query Handling and IO Operations

Overview of Database Interaction

  • The process begins with a straightforward database query: SELECT * FROM users, which retrieves all users from the database to send back to the client.
  • Establishing a TCP connection is crucial; if using a connection pool, a connection is obtained from there to execute the query.

Communication with Database

  • After sending the query, communication occurs via a socket, where a blocking read operation waits for the database's response. This is an IO-bound operation: the thread can make no progress until data arrives.
  • In a purely synchronous model, the CPU sits idle during this wait because the thread depends on an external response, highlighting the nature of IO-bound operations.

Thread Management in Concurrency

  • In threading-based concurrency models, when one thread blocks due to IO operations, it signals the operating system scheduler to switch to another available thread.
  • While handling request A, if another request (request B) comes in, processing can continue by routing and validating this new request while waiting for IO completion on request A.

Handling Multiple Requests

  • For each new request (e.g., fetching orders), similar steps are followed: establishing connections and executing queries while managing potential blocking states.
  • If no threads are available during blocking periods, all processes wait until an IO operation completes before resuming execution.

Processing Responses and Serialization

  • Once a response is received from the database, processing resumes. The data returned is parsed into native structures based on programming language conventions (e.g., objects in JavaScript or dictionaries in Python).
  • After parsing data into appropriate structures, serialization into JSON format occurs for network transmission back to clients.

Example Implementation in Python

  • An example function illustrates how requests are handled: querying for a single user involves executing SELECT * FROM users WHERE ID = ?, marking it as an IO operation that may block execution.
  • In this threading model scenario without async/await patterns, blocking leads to switching threads or waiting for responses from the database before continuing execution.
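A minimal sketch of the handler described above, with an in-memory sqlite3 database standing in for the real one (the table, data, and helper names are assumptions for illustration). The `execute()` call is the blocking I/O step; in a threading model the OS can schedule another request's thread while this one waits.

```python
import sqlite3
import threading

# Hypothetical in-memory database playing the role of the real one.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "alan")])
db_lock = threading.Lock()  # sqlite3 connections aren't thread-safe; serialize access

results = {}

def handle_request(user_id: int) -> None:
    # Blocking query: this thread cannot proceed until the row comes back.
    with db_lock:
        row = db.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()
    results[user_id] = row

# One thread per incoming request, as in the threading model described.
threads = [threading.Thread(target=handle_request, args=(uid,)) for uid in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # {1: (1, 'ada'), 2: (2, 'alan')}
```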

Understanding Threading vs. Event Loop in Request Handling

The Nature of Blocking Operations

  • When a request is processed, the CPU handles the response from the database and executes it sequentially, which may appear as a simple line-by-line execution to humans.
  • However, blocking operations are not visible; thread switching occurs behind the scenes during this process.

Transition to Event Loop Architecture

  • In an event loop model, while the sequence of operations remains similar to threading, the efficiency of pausing and resuming tasks differs significantly.
  • The process begins with parsing requests and establishing TCP connections for database queries, similar to threading setups.

Callbacks in Event Loop Mechanism

  • When waiting for an I/O response, control is handed back to the event loop, and a callback is registered that defines what logic should execute once data is received.
  • This allows other tasks to be executed while waiting for I/O operations without blocking the entire thread.

Iterative Checking in Event Loops

  • As requests are processed (e.g., Request A), if another request (Request B) arrives during I/O wait times, it can be picked up by the event loop.
  • The event loop continuously checks whether I/O operations for any registered requests are complete through iterations.

Efficiency of Monitoring I/O Responses

  • Each iteration of the event loop checks if responses from previous requests have been completed using OS-level functions like epoll on Linux or kqueue on macOS.
  • Once a response is ready (e.g., socket B becomes readable), registered callbacks execute their associated logic to handle incoming data.
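One iteration of that readiness check can be sketched with Python's `selectors` module, which wraps epoll on Linux and kqueue on macOS. This is a toy illustration, not the video's code: a socket pair stands in for a database connection, and the registered callback runs once the "response" arrives.

```python
import selectors
import socket

sel = selectors.DefaultSelector()          # epoll/kqueue under the hood
server_side, client_side = socket.socketpair()  # stand-in for a DB connection

def on_readable(sock: socket.socket) -> bytes:
    # The callback registered for this connection: runs once data arrives.
    return sock.recv(1024)

sel.register(client_side, selectors.EVENT_READ, on_readable)

server_side.sendall(b"rows...")            # the "database" responds

# One event-loop iteration: block until a registered socket is readable,
# then invoke its callback with the ready socket.
for key, _ in sel.select(timeout=1):
    data = key.data(key.fileobj)
print(data)  # b'rows...'

sel.close()
server_side.close()
client_side.close()
```

A real event loop simply runs this select-and-dispatch step forever, which is why it can watch thousands of connections from a single thread.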

Conclusion: Differences Between Models

  • Both models ultimately return results back to clients after processing; however, they differ fundamentally in how they manage task pauses—event loops use callbacks while threading relies on OS-level structures.
  • This distinction highlights how event-driven architectures can efficiently handle multiple concurrent requests without traditional blocking mechanisms.

Concurrency in JavaScript and Go

Understanding Concurrency in JavaScript

  • The concept of concurrency in JavaScript is illustrated through the use of the event loop, which allows handling multiple requests simultaneously. Callbacks were the standard approach before promises (ES2015) and async/await (ES2017) were introduced.
  • A database driver function is called to handle JSON requests, where a query selects users based on an ID received from the request.
  • The callback function receives the response from the database; errors are passed as the first parameter while results are passed as the second.
  • Callbacks are essential for executing logic after receiving responses, allowing data to be sent back to clients using a send response function.
  • Modern JavaScript (ES2017 and later) provides async and await, simplifying asynchronous code by eliminating nested callbacks.

Transitioning to Async/Await Syntax

  • The async/await syntax serves as syntactic sugar over callbacks, making it easier to manage multiple asynchronous operations without deep nesting.
  • In callback-based setups, additional queries require more nested callbacks, leading to complex and unreadable code structures.
  • Async/await improves readability by allowing developers to write asynchronous code that looks synchronous, reducing complexity significantly.
  • When using await, control is handed over to the event loop until a response is received, enabling other tasks to run concurrently during this wait time.
  • Event loops operate efficiently with single-threaded models for I/O-bound operations but may not be suitable for CPU-bound tasks where threading performs better.
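The flat, synchronous-looking shape that async/await gives can be sketched in Python, which shares the same model (the query functions here are hypothetical stand-ins that just sleep):

```python
import asyncio

async def get_user(uid: int) -> dict:
    # Stand-in for a database query: awaiting it yields to the event loop.
    await asyncio.sleep(0.01)
    return {"id": uid, "name": "ada"}

async def get_orders(uid: int) -> list:
    # A second, dependent "query".
    await asyncio.sleep(0.01)
    return [{"user": uid, "item": "book"}]

async def handler(uid: int) -> dict:
    # Reads top-to-bottom like synchronous code, but each await hands
    # control back to the event loop until the response arrives.
    user = await get_user(uid)
    orders = await get_orders(user["id"])
    return {"user": user, "orders": orders}

result = asyncio.run(handler(7))
print(result)
```

In the callback style, `get_orders` would have to be nested inside `get_user`'s callback; with await the dependency reads as two plain lines.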

Exploring Go's Concurrency Model

  • Go employs a different concurrency model that utilizes virtual threads known as goroutines instead of traditional operating system-managed threads.
  • Goroutines are lightweight compared to OS threads; they reduce memory overhead associated with thread creation and management.
  • Each request in Go creates a new goroutine, allowing concurrent processing without significant resource consumption or risk of running out of memory due to heavy thread usage.
  • The main goroutine can spawn additional goroutines while executing, enhancing parallelism within applications built using Go.


Understanding Go's Concurrency Model and Goroutines

The Cost of Threads in Programming

  • Threads are resource-intensive, leading to potential memory exhaustion when creating a large number (e.g., 10,000 or 20,000) of them.

Goroutines and HTTP Requests

  • Go's serve function handles incoming connections by spawning a new goroutine for each HTTP request, assigning it a pre-configured handler.

Creating Goroutines in Go

  • The go keyword is used to initiate a new goroutine, allowing functions to run concurrently as virtual threads. This is particularly relevant in backend programming contexts.

Handling I/O Operations with Goroutines

  • When an I/O blocking operation occurs (like database queries), the Go runtime scheduler pauses the current goroutine and switches context to another one, optimizing resource use.

The Role of the Go Runtime Scheduler

  • Unlike OS-level thread scheduling, the Go runtime scheduler manages goroutines efficiently within its own framework, allowing for lightweight context switching without heavy overhead associated with traditional threads.

Comparison with Operating System Threads

  • Traditional threading models involve significant overhead due to creation and context-switching costs; however, goroutines minimize these issues by operating independently from OS threads while still being able to utilize multiple CPU cores effectively.

Efficient Management of Goroutines

  • Each operating system thread can manage multiple goroutines through queues managed by the Go runtime scheduler, enabling high concurrency without overwhelming system resources. This allows thousands or even millions of goroutines to be active simultaneously depending on available resources.

Understanding Go Routines and Asynchronous Programming

The Efficiency of Go Routines

  • Go routines allow for the creation of significantly more concurrent operations compared to operating system threads, enabling up to 100 times more due to their lightweight nature.
  • Each HTTP request in Go spawns a new go routine, which is feasible because they are lightweight; blocking one go routine allows others to continue executing on the same OS thread.
  • When a go routine (G1) blocks during an I/O operation, the Go runtime pauses it and switches execution to another go routine (G2), optimizing resource usage.

Handling Database Queries with Go Routines

  • A typical handler function in Go executes a database query using db.Query, pausing its current goroutine while waiting for the response.
  • Once the I/O operation completes, the paused go routine resumes execution and sends back a response to the client.

Distinction Between Threading Models

  • The virtual thread model in Go operates at a higher abstraction level than traditional threading models, utilizing a scheduler instead of relying solely on OS-level threads.

Understanding Async/Await Mechanism

  • In various programming languages like Python and JavaScript, async/await syntax indicates that an operation is blocking; control is yielded back to the event loop until completion.
  • After receiving a response from an awaited function call, subsequent code execution resumes within a callback structure.

Conceptualizing Async Functions as State Machines

  • An async function can be visualized as a state machine interacting with an event loop; it maintains states that dictate its flow based on asynchronous events.
  • The initial state begins at zero; upon entering an I/O operation, it transitions states based on responses received from database queries or other asynchronous tasks.

Execution Flow in State Machine Model

  • Inside this state machine setup, functions can return other functions allowing for complex flows based on current states defined by switch statements.
  • During state transitions caused by awaiting operations, control returns to the event loop while waiting for results from asynchronous calls.
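A Python generator makes the state-machine idea concrete: each `yield` marks a point where control returns to the caller (a toy "event loop" here), and `.send()` resumes execution from exactly that saved state, which is the same transformation async/await compiles down to. The function and values below are illustrative assumptions.

```python
def withdraw_machine(balance: int, amount: int):
    # State 0: run until the first "I/O" pause point.
    if amount <= balance:
        db_reply = yield "waiting_for_db"  # suspend; resumed with the response
        # State 1: continue here once the "event loop" sends the reply.
        balance -= amount
    return balance

machine = withdraw_machine(100, 40)
state = next(machine)             # advance to the first pause point
print(state)                       # waiting_for_db
try:
    machine.send("db says ok")    # the "event loop" delivers the response
except StopIteration as done:
    final_balance = done.value
print(final_balance)               # 60
```

Between `next()` and `.send()`, the machine holds its local state (balance, amount, and the current position) without occupying the CPU, just as an awaited async function does.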


Understanding Async Functions and Concurrency Issues in Programming

The State Machine Concept of Async Functions

  • The execution of async functions can be visualized as a state machine, transitioning between states each time the await keyword is used.
  • The requirement to use await only within async functions stems from the need for these functions to be transformed into state machines upon declaration.
  • Blocking the event loop is detrimental because it prevents the state machine from progressing through its states, hindering asynchronous operations.
  • While deep understanding of how runtimes convert async functions into state machines isn't mandatory, it aids in grasping concurrency issues.
  • Concurrency introduces problems primarily due to shared state or memory, which can lead to various complications.

Race Conditions and Shared Variables

  • A counter example illustrates race conditions where two threads attempt to update a shared variable simultaneously, leading to unexpected results.
  • Incrementing a variable involves three steps: reading the current value, modifying it in a register, and writing it back. This sequence is crucial for understanding race conditions.
  • In a scenario with two threads incrementing a counter at overlapping times, one thread's update may overwrite another's due to timing discrepancies.
  • The final value of the counter may not reflect both increments (expected value 2), resulting instead in an incorrect value (1), illustrating the lost update problem caused by race conditions.
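The lost update can be reproduced deterministically. In this sketch a `threading.Barrier` forces both threads to pause between the read and the write, i.e., exactly the interleaving described above (the barrier is an illustrative device; real races hit this window by chance):

```python
import threading

counter = 0
barrier = threading.Barrier(2)  # forces both threads to meet mid-increment

def racy_increment() -> None:
    global counter
    tmp = counter       # 1) read the current value (both threads read 0)
    barrier.wait()      # pause here until both threads have done the read
    tmp += 1            # 2) modify the copy in a "register"
    counter = tmp       # 3) write back — the later write overwrites the earlier one

t1 = threading.Thread(target=racy_increment)
t2 = threading.Thread(target=racy_increment)
t1.start(); t2.start()
t1.join(); t2.join()
print(counter)  # 1, not 2: one increment was lost
```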

Implications of Async/Await in Single-threaded Environments

  • Despite JavaScript being single-threaded, using async/await does not eliminate concerns about race conditions; they can still occur under certain circumstances.
  • An example involving a withdrawal function demonstrates potential issues when multiple calls are made concurrently on shared variables like balance.
  • If two withdraw calls are executed simultaneously without proper handling of shared state, inconsistencies may arise similar to those seen in multi-threaded environments.

Understanding Race Conditions in Asynchronous Programming

The Withdraw Function and Balance Check

  • The withdraw function is called with an amount of 100, initiating a concurrent execution process where the balance is checked against this amount.
  • The first check confirms that the balance (100) is sufficient for the withdrawal, allowing entry into an asynchronous function while yielding control back to the event loop.
  • A second concurrent execution begins, repeating the balance check. It also confirms that 100 is less than or equal to 100, leading to another async function call.

Control Yielding and Balance Update

  • Each time await is called, control of CPU resources is relinquished, enabling other tasks to execute while waiting for operations like database calls to complete.
  • After both async functions have executed, the balance has been updated to zero due to two withdrawals being processed concurrently without proper checks.

Consequences of Race Conditions

  • By the time the second withdrawal attempts its operation, it finds that the balance has already been reduced to zero. This leads to an invalid operation where instead of deducting from a positive balance, it tries subtracting from zero.
  • The resulting value becomes -100, which represents an inappropriate state for account balances and highlights how race conditions can occur even in single-threaded async setups.
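The whole scenario fits in a few lines of Python, where `asyncio.sleep(0)` stands in for the awaited database call. There is only one thread, yet the interleaving of awaits still produces the invalid balance:

```python
import asyncio

balance = 100

async def withdraw(amount: int) -> None:
    global balance
    if balance >= amount:          # check passes for BOTH tasks first...
        await asyncio.sleep(0)     # ...because each yields to the event loop here
        balance -= amount          # ...and only then do both subtract

async def main() -> int:
    # Both withdrawals pass the balance check before either one subtracts.
    await asyncio.gather(withdraw(100), withdraw(100))
    return balance

final = asyncio.run(main())
print(final)  # -100: a race condition with no threads involved at all
```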

Preventing Race Conditions

Lock Mechanisms

  • To prevent race conditions, various mechanisms exist; one common method involves using locks or mutexes in programming languages like Python.
  • Before executing critical sections of code (like updating balances), a lock must be acquired so that only one thread can access this section at any given time.
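Applied to the withdrawal example, the fix looks like this sketch using `asyncio.Lock`: the check and the update are wrapped in one critical section, so the second task cannot read the balance until the first has finished with it.

```python
import asyncio

balance = 100

async def withdraw(amount: int, lock: asyncio.Lock) -> bool:
    global balance
    async with lock:               # critical section: check and update together
        if balance >= amount:
            await asyncio.sleep(0)  # the "DB call", now inside the lock
            balance -= amount
            return True
        return False

async def main() -> list:
    lock = asyncio.Lock()
    return await asyncio.gather(withdraw(100, lock), withdraw(100, lock))

outcomes = asyncio.run(main())
print(outcomes, balance)  # [True, False] 0 — the second withdrawal is refused
```

The same pattern with `threading.Lock` protects the multi-threaded counter example earlier in the section.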

Alternative Solutions

  • Other techniques include message-passing systems such as channels in Go programming language. These allow goroutines to communicate without directly sharing variables.
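A rough Python analogue of that channel idea uses `queue.Queue` (the account-owner design below is an illustrative assumption, not the video's code): instead of sharing the balance variable, a single owner thread holds it, and other threads communicate only by sending messages.

```python
import queue
import threading

requests: queue.Queue = queue.Queue()  # messages in: withdrawal amounts
replies: queue.Queue = queue.Queue()   # messages out: results

def account_owner() -> None:
    # Only this thread ever touches balance, so no lock is needed.
    balance = 100
    while True:
        amount = requests.get()
        if amount is None:             # sentinel: report final balance and stop
            replies.put(balance)
            return
        if balance >= amount:
            balance -= amount
            replies.put(True)
        else:
            replies.put(False)

owner = threading.Thread(target=account_owner)
owner.start()

requests.put(100)
first = replies.get()    # True: the first withdrawal succeeds
requests.put(100)
second = replies.get()   # False: refused — no shared-variable race is possible
requests.put(None)
final = replies.get()
owner.join()
print(first, second, final)  # True False 0
```

This mirrors Go's "share memory by communicating" approach: correctness comes from ownership plus message passing rather than from locking.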

Key Concepts: IO Bound vs. CPU Bound Workloads

Understanding Workload Types

  • The primary focus of this discussion revolves around distinguishing between IO bound workloads (e.g., web servers and API gateways requiring high concurrency) versus CPU bound workloads (e.g., number crunching and data processing).

Efficiency Considerations

  • For IO bound tasks, asynchronous models are more efficient compared to traditional threading models due to their ability to handle multiple connections simultaneously without blocking.

Conclusion on Concurrency Models

  • Conversely, CPU bound tasks benefit from parallelism through threads since they require intensive computation rather than waiting on IO operations.

Understanding IO and Parallelism in Backend Development

Key Concepts of IO and Parallelism

  • The discussion highlights the importance of Input/Output (IO) operations, which include making external API calls and interacting with databases.
  • It emphasizes that these IO operations are crucial for backend development, as they often dictate how efficiently a system can perform tasks.
  • The concept of parallelism is introduced, explaining that it allows multiple CPU cores to execute different tasks simultaneously.
  • However, the speaker notes that most backend work tends to focus more on concurrency rather than pure parallelism.
  • Concurrency involves managing multiple tasks at once but does not necessarily mean they are executed simultaneously; it's about structuring the program to handle many tasks effectively.

Video description

A comprehensive guide to concurrency and parallelism for backend engineers. We cover:

  • Why Concurrency Matters for Backend Systems
  • IO Bound vs CPU Bound Operations
  • Concurrency vs Parallelism
  • Threading Model
  • Event Loop Model (Async/Await)
  • Virtual Threads (Goroutines)
  • Race Conditions & Solutions

Timestamps:
0:00 Introduction: Why Concurrency Matters
2:51 The Cost of Not Using Concurrency
5:55 IO Bound vs CPU Bound Operations
9:44 Concurrency vs Parallelism Explained
15:16 Visualizing Concurrent Request Handling
19:14 Threading Model Deep Dive
23:22 Thread Overhead & Context Switching
37:15 Event Loop Model Deep Dive
43:03 Async/Await Explained
50:28 Virtual Threads & Goroutines (Go Runtime)
1:03:46 How Async/Await Works Under the Hood (State Machines)
1:15:32 Race Conditions & Shared State Problems
1:23:28 Solutions: Locks, Mutexes & Channels
1:26:02 Summary & Key Takeaways

Join the Discord community: https://discord.gg/NXuybNcvVH

#backend #nodejs #golang #softwareengineering

Nerd out about the history of technologies here https://www.fascinatingtechhistory.xyz/