Concorrência e Paralelismo (Parte 1) | Entendendo Back-End para Iniciantes (Parte 3)

Introduction and Recap of Previous Topics

In this section, Fabio Akita introduces the topic of Back-end Part 3, providing a recap of basic computer concepts, operating systems, processes, threads, compilation, static and dynamic libraries, virtual machines, interpreters, language characteristics (Java and .NET), mobile development history, and software licenses.

Exploring Back-end Concepts

  • Fabio delves into the complex topics of Concurrency and Parallelism in programming.
  • Emphasizes the challenge for beginners to grasp these concepts without practical experience.
  • Discusses the evolution of computing power from single-core processors to multi-core CPUs with hyperthreading technology.
  • Highlights the parallel processing capabilities of modern devices including smartphones and GPUs.
  • Traces the historical progression towards massive parallelism in computing over the past decade.

Evolution of Computing Power

This section explores the evolution of computing power from early mainframes to modern multi-core processors.

Computing Milestones

  • Details the IBM 701 era in the 1950s with limited computational power compared to contemporary devices.
  • Contrasts early machine specifications like memory capacity and data transfer rates with current standards.
  • Describes programming challenges in batch processing during the 1950s transitioning into more efficient job handling methods by the 1960s.

Advancements in Programming Efficiency

The discussion shifts towards advancements in programming efficiency through time-sharing concepts.

Programming Efficiency Enhancements

  • Introduces key figures like John Backus and John McCarthy who pioneered concepts for improving programmer efficiency.
  • Explores context switching as a method to allow multiple programmers to work concurrently on a single computer system.

Understanding the Evolution of Multi-Tasking in Computing

In this section, the speaker delves into the historical progression of multi-tasking capabilities in computing, starting from single-core CPUs to modern multi-core processors.

Single-Core CPU Limitations

  • Single-core CPUs execute only one instruction stream at a time, much like early 8-bit and 16-bit machines.
  • Early multi-tasking relied on rudimentary techniques like TSRs (Terminate and Stay Resident programs), which stayed loaded in memory and took turns rather than truly running together.

Cooperative Multi-Tasking

  • Windows 3.1 introduced cooperative multi-tasking where tasks had to yield control voluntarily.
  • Issues with cooperative multi-tasking included system blockages if a program monopolized resources.
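
The yielding mechanic can be sketched in a few lines of Python (an illustration of the idea, not code from the video): each task is a generator that does a slice of work and then voluntarily hands control back to a round-robin scheduler. A task that never yielded would freeze everything else, which is exactly the failure mode described above.

```python
def task(name, steps):
    # A cooperative "task": does one slice of work, then yields control.
    for i in range(steps):
        yield f"{name} step {i}"

def run_cooperative(tasks):
    # Round-robin scheduler: runs each task until it voluntarily yields,
    # then moves on to the next. It has no power to interrupt anyone.
    log = []
    while tasks:
        current = tasks.pop(0)
        try:
            log.append(next(current))
            tasks.append(current)   # re-queue the task for another turn
        except StopIteration:
            pass                    # task finished; drop it
    return log

print(run_cooperative([task("A", 2), task("B", 2)]))
# ['A step 0', 'B step 0', 'A step 1', 'B step 1']
```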

Preemptive Multi-Tasking

  • Modern systems employ preemptive multi-tasking, where an OS scheduler manages task switching at predefined intervals.

Multi-Threading and Parallel Processing

  • The introduction of threads allowed concurrent execution paths within a single process, simulating simultaneous tasks over a shared memory space.
  • Multi-threading enables multiple threads to access shared resources concurrently but requires careful synchronization to avoid conflicts.
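
That synchronization requirement can be shown with a minimal Python sketch (illustrative, not from the video): two threads increment a shared counter, and a mutex makes the read-modify-write step exclusive so no update is lost.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # Without the lock, the read-modify-write of `counter` from two
        # threads could interleave and silently lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000, guaranteed only because of the lock
```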

Evolution to Multi-Core Processors

  • With the advent of multi-core processors like the Intel Core i3, true parallel processing became feasible, with each core handling a separate task simultaneously.
  • Servers initially led in utilizing multi-processing with two or more cores for enhanced performance.

Challenges of Concurrent Execution

  • Concurrent processes or threads can run independently but face contention when accessing shared resources, leading to potential race conditions.

Understanding Deadlocks in Programming

The speaker explains the concept of deadlocks in programming using a metaphor involving two people sharing a piece of paper and how it relates to multi-threading and resource sharing.

The "Occupied" Signboard Analogy

  • In the analogy, when one person wants to write on a shared piece of paper, they put up an "occupied" sign.
  • If both forget to remove the signs after finishing their task, a deadlock occurs where neither can proceed.
  • The "occupied" sign symbolizes a lock or mutex in programming, ensuring exclusive access to resources.

Impact of Threads on CPU Performance

The discussion delves into how threads operate within CPUs, emphasizing context switching and resource sharing implications for performance.

Thread Execution and Context Switching

  • Threads are the smallest unit of execution the CPU schedules; a running thread's context lives in the CPU registers.
  • Context switching between threads incurs overhead, since each switch must save and restore that register state, which degrades overall performance.

Optimizing Thread Usage for Efficiency

The speaker elaborates on optimizing thread usage concerning CPU cores and resource sharing for efficient program execution.

Balancing Threads with CPU Cores

  • For CPU-bound work, the ideal thread count matches the number of CPU cores, preventing resource contention and maximizing efficiency.
  • Creating processes incurs system costs; Linux is more efficient than Windows due to faster process creation.
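
The sizing rule can be expressed directly in code. This hedged Python sketch derives a thread pool's size from `os.cpu_count()` rather than a hard-coded number (note that for CPU-bound pure-Python work the interpreter's GIL limits the benefit; the sizing principle is the point here).

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Derive the pool size from the machine instead of hard-coding it.
cores = os.cpu_count() or 1

with ThreadPoolExecutor(max_workers=cores) as pool:
    squares = list(pool.map(lambda n: n * n, range(10)))

print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```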

Threads vs. Processes: Trade-offs in Programming

Discusses the trade-offs between using threads and processes in programming, highlighting advantages and disadvantages based on system requirements.

Managing Threads vs. Processes

  • Threads share memory but require manual mutex handling; processes offer isolation but are costlier to create.
  • Programming involves trade-offs; threads may be faster but prone to bugs, while processes offer stability at higher costs.

Choosing Between Multi-threading and Process-based Solutions

Explores practical applications of multi-threading versus process-based solutions in different operating systems like Linux, UNIX (MacOS), and Windows.

Stability Challenges in Browsers

The discussion delves into the challenges related to browser stability, particularly focusing on how a bug in one thread can destabilize the entire browser due to threads having access to shared resources within the browser process.

Bug Impact on Browser Stability

  • A bug in a single thread can lead to the collapse of the entire browser, risking data loss across multiple tabs.
  • Fixing bugs may inadvertently introduce new ones, akin to trying to cover up issues without addressing root causes.
  • Chrome's approach of separating each tab into its own process enhances stability but increases memory consumption significantly.

Process Forking and Memory Management

This segment explores the concept of process forking and memory management in browsers, highlighting how Linux's copy-on-write feature optimizes memory usage during forking processes.

Process Forking and Memory Optimization

  • Process forking involves creating isolated copies of processes, with Linux's copy-on-write feature minimizing additional memory usage.
  • Copy-on-write allows processes to share memory until modifications occur, optimizing resource utilization.
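
Forking can be sketched in a few lines (Python on a Unix-like system; `os.fork` does not exist on Windows): the child rebinds a variable and exits, and the parent's copy is untouched. Copy-on-write is what makes this cheap, since the kernel only duplicates the memory page the child actually writes to.

```python
import os

value = "parent data"

pid = os.fork()                  # child starts as a logical copy
if pid == 0:
    # Child: writing triggers copy-on-write, so this page is duplicated
    # privately; the parent's memory is never touched.
    value = "child data"
    os._exit(0)
else:
    os.waitpid(pid, 0)           # wait for the child to finish
    print(value)                 # still "parent data"
```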

Evolution of Threads in Linux Systems

The evolution of threads in Linux systems is discussed, emphasizing the transition from flawed implementations to more efficient models like NPTL.

Thread Evolution and Simplified Programming

  • Unlike threads, which share all resources within a process, forked processes have independent memory spaces and need no access-control mechanisms like mutexes.

Scheduler Strategies and Multitasking Challenges

The conversation shifts towards scheduler strategies and multitasking challenges faced by supervisors managing multiple tasks or "tables."

Linux Scheduler Evolution

The discussion delves into the evolution of Linux schedulers, highlighting the challenges faced in earlier versions and the improvements brought about by the Completely Fair Scheduler (CFS) in 2007.

Linux Scheduler Challenges

  • In the early 2000s, Linux had scheduler problems, with video and audio playback stuttering due to inefficient thread management.
  • The Completely Fair Scheduler (CFS), authored by Ingo Molnár and merged into kernel 2.6.23 in 2007, marked a significant improvement, balancing overall CPU utilization with the responsiveness of interactive programs.
  • Different operating systems implement different scheduler strategies; historically, MacOS excelled at multimedia tasks thanks to thread management superior to that of Windows and Linux.

Concurrency Challenges and Solutions

The conversation shifts towards threading strategies across different operating systems, emphasizing advancements made by Mac OS X in the early 2000s and Linux's adoption of CFS post-2007.

Thread Management Across Operating Systems

  • Mac OS X led in threading strategies during the early 2000s, followed by Windows and later overtaken by Linux with CFS after 2007.
  • A pivotal 1999 write-up by Dan Kegel on handling ten thousand concurrent connections (the C10K problem) reshaped discussions of network concurrency, setting the stage for modern scalability challenges like those faced by WhatsApp's servers today.

I/O Handling Strategies

Exploring Input/Output (I/O) operations within computing systems reveals how system bottlenecks can arise despite parallel processing capabilities due to limited I/O bandwidth.

I/O Bottlenecks and Parallelism

  • I/O encompasses various system interactions such as file operations, networking, USB connectivity, etc., crucial for understanding system performance limitations.
  • While CPUs may support parallelism, a single bottlenecked I/O channel can hinder overall system efficiency akin to multiple individuals writing a book simultaneously but sharing a single pen.

Asynchronous I/O Implementation

Delving into asynchronous I/O mechanisms sheds light on optimizing resource utilization without blocking threads during data operations.

Enhancing System Efficiency with Asynchronous I/O

  • Asynchronous I/O allows threads to initiate data operations without halting processing flow; this event-driven approach enhances system responsiveness and efficiency.

Server Connection-Handling Strategies

In this section, the speaker discusses different approaches to handling new connections in a server process.

Handling New Connections

  • One option is to serve each new connection with a new thread inside the server process, instead of forking a whole process per connection. Threads are lighter than processes but bring issues such as shared-memory management and race conditions.
  • Managing thousands of connections with threads requires significant resources and efficient context switching mechanisms.
  • Apache and IIS initially used threads to handle multiple connections but faced limitations due to resource consumption.

Asynchronous I/O Across Operating Systems

This part introduces the concept of asynchronous I/O for handling multiple operations efficiently.

Asynchronous I/O Implementation

  • Asynchronous I/O allows a single thread to manage multiple I/O operations without blocking, improving system efficiency.
  • Different operating systems implement asynchronous I/O differently, such as kqueue in BSD-based systems, IOCP in Windows, and epoll in Linux.
  • Each implementation of asynchronous I/O behaves uniquely, showcasing the diversity among operating systems.
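
Python's standard library exposes these mechanisms behind one interface: `selectors.DefaultSelector` uses epoll on Linux, kqueue on BSD and macOS, and a fallback elsewhere. A short sketch, using a local socket pair to stand in for real network clients:

```python
import selectors
import socket

sel = selectors.DefaultSelector()      # epoll / kqueue / etc. under the hood

a, b = socket.socketpair()             # two connected sockets
a.setblocking(False)
b.setblocking(False)
sel.register(b, selectors.EVENT_READ)  # "tell me when b has data to read"

a.sendall(b"hello")                    # makes b readable

events = sel.select(timeout=1)         # ask the OS which sockets are ready
msg = b.recv(1024)

print(type(sel).__name__, msg)         # e.g. EpollSelector b'hello' on Linux
sel.close(); a.close(); b.close()
```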

The Rise of NGINX

The discussion shifts towards the development of NGINX as a highly scalable web server solution.

NGINX Development

  • NGINX was created to handle tens of thousands of simultaneous connections efficiently by combining processes and asynchronous I/O calls.
  • NGINX's architecture includes a master process that spawns worker processes capable of managing numerous connection sockets concurrently.
  • By utilizing event loops and non-blocking I/O calls, NGINX can serve thousands of connections per worker process simultaneously.

An Analogy for Asynchronous I/O

The importance and efficiency gains of asynchronous I/O are further elaborated through an analogy related to restaurant service.

Analogy: Restaurant Service

  • Traditional synchronous I/O processing leads to sequential delays akin to customers waiting for dishes one at a time from a chef.
  • Asynchronous I/O is likened to having a waiter take orders from all customers simultaneously, reducing wait times significantly.
  • The analogy highlights how non-blocking I/O prevents bottlenecks and improves overall system performance.
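
The analogy maps directly onto asynchronous code. In this illustrative Python sketch, three "dishes" each take 0.2 seconds, yet the total is about 0.2 seconds rather than 0.6, because a single event loop (the waiter) overlaps all three waits.

```python
import asyncio
import time

async def cook(dish, seconds):
    # The await releases the event loop (the waiter) to serve others
    # while this dish is "in the kitchen".
    await asyncio.sleep(seconds)
    return dish

async def main():
    start = time.monotonic()
    dishes = await asyncio.gather(      # take all the orders at once
        cook("pasta", 0.2), cook("steak", 0.2), cook("salad", 0.2)
    )
    return dishes, time.monotonic() - start

dishes, elapsed = asyncio.run(main())
print(dishes, round(elapsed, 1))        # the waits overlap: ~0.2s, not 0.6s
```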

Inside NGINX's Worker Model

Details about NGINX's operational efficiency through multi-threading strategies are discussed here.

Operational Efficiency of the Worker Model

  • NGINX optimizes resource usage by spawning one worker process per CPU core, enabling it to handle millions of connections efficiently.
  • Each worker runs an event loop that waits for completion events on outstanding requests, avoiding context-switching overhead.

Concurrent Programming Concepts

In this section, the speaker discusses the concepts of concurrency and parallelism in computing, highlighting their distinctions and importance in system operations.

Understanding Concurrency and Parallelism

  • Concurrency refers to tasks taking turns: one pauses so another can execute in its place. Until the 1990s, this was the only mode typical single-core computers offered.
  • Parallelism occurs when multiple cores in a CPU enable true simultaneous execution of concurrent tasks like threads.
  • Systems operate differently concerning concurrency, with varied thread implementations, schedulers, and asynchronous I/O methods.
  • Different problems require distinct solutions; process forking, multi-threading, and asynchronous I/O are all options in the toolkit for addressing concurrency challenges.
Video description

Back to the Começando aos 40 (Starting at 40) series: we are already on the eighth episode! This is Part 3 of the Back-end topic, but this time I will need to spend longer explaining concepts before returning to the tools. Concurrency and Parallelism is something every beginner runs into early on these days. We live in a world that is naturally parallel and concurrent. We have reached the point of defining ourselves as "multi-taskers". Even so, there is still more superstition and "mysticism" around this concept than real understanding of what it actually means. And contrary to how it may seem, the basics are really not that complicated. Pay close attention to today's explanation, because it will be the foundation for everything else I will explain through the end of this series. If you already know the details of what I am explaining: obviously, to fit a video aimed at beginners, I am simplifying MANY concepts A LOT for the sake of illustration. Unfortunately not everything fits in a single episode, but be sure to comment if you think something very important was missing! Links: * The Linux Scheduler: a Decade of Wasted Cores (https://blog.acolyer.org/2016/04/26/the-linux-scheduler-a-decade-of-wasted-cores/) * The C10K problem (http://www.kegel.com/c10k.html) * Transcript: https://www.akitaonrails.com/2019/03/13/akitando-43-concorrencia-e-paralelismo-parte-1-entendendo-back-end-para-iniciantes-parte-3 * Audio: https://anchor.fm/dashboard/episode/eava6l