What is Hadoop?

Understanding Hadoop and its Origins

In this section, we will explore the origins of Hadoop, who created it, and why it became a cornerstone of modern data systems. We will also discuss how Hadoop coexists with other technologies in today's world.

The Creation of Hadoop

  • Hadoop was created by Doug Cutting and Mike Cafarella in 2002 while they were working on an open-source web search engine called Nutch (built on top of Cutting's search library, Lucene).
  • They needed to download and process a large number of web pages to index them for their search engine.
  • Initially, they considered downloading all the pages and storing them on multiple hard drives, but this approach proved to be impractical due to the enormous volume of data involved.

Challenges Faced

  • Processing all the downloaded web pages on a single machine would have taken an estimated 12 years.
  • They needed a solution that allowed for distributed processing across multiple machines.

Inspiration from Google

  • The creators of Hadoop drew inspiration from how Google had solved similar challenges while building its famous search engine.
  • Google had already developed their own distributed file system called the Google File System (GFS) and a distributed processing framework called MapReduce.
  • These two ideas served as the foundation for solving the challenges faced by Cutting and Cafarella.

Distributed Processing with Multiple Machines

In this section, we will explore how using multiple machines for distributed processing can significantly reduce the time required compared to using a single machine.

Architecture for Distributed Processing

  • By distributing the workload across multiple machines, tasks that would take over 12 years on a single machine could be completed in weeks with hundreds of machines.
  • A typical architecture consists of one master node coordinating the distribution of work among worker (historically called "slave") nodes that perform the actual tasks.
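The "12 years to weeks" claim above can be sanity-checked with a back-of-the-envelope calculation. This is a minimal sketch assuming perfectly linear scaling and zero coordination overhead, which real clusters never achieve; the machine count is illustrative, not from the source:

```python
SINGLE_MACHINE_YEARS = 12   # estimate quoted in the section above
DAYS_PER_YEAR = 365

def distributed_days(machines: int) -> float:
    """Idealized runtime if the work splits evenly across machines."""
    return SINGLE_MACHINE_YEARS * DAYS_PER_YEAR / machines

# With a few hundred machines, 12 years shrinks to roughly two weeks.
print(round(distributed_days(300), 1))  # 14.6 (days)
```

Real jobs pay extra for scheduling, data movement, and stragglers, so the actual runtime lands somewhere above this ideal figure, but the order of magnitude matches the "weeks" claim.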

Data and Processing Distribution

  • Two key problems needed to be solved: distributing the data and distributing the processing.
  • The creators of Hadoop realized that Google had already encountered and solved these same problems with their distributed file system (GFS) and MapReduce framework.

Birth of Hadoop

In this section, we will discuss the birth of Hadoop as a result of combining the ideas from Google's GFS and MapReduce with the challenges faced by Doug Cutting and Mike Cafarella.

Key Components of Hadoop

  • HDFS (Hadoop Distributed File System): A distributed file system that allows for data distribution across multiple machines.
  • MapReduce: A framework for distributed processing that enables work distribution among machines.
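To make the two components concrete, here is a minimal single-process sketch of the MapReduce programming model in Python, using word count as the canonical example. The `splits` list is a stand-in for the data blocks that HDFS would store on different machines; in a real cluster, each map call would run on the machine holding its block:

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: each string stands in for one HDFS block.
splits = ["big data big ideas", "data moves to code", "big clusters"]

def map_phase(split: str):
    """Map: emit a (word, 1) pair for every word in the split."""
    return [(word, 1) for word in split.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

result = reduce_phase(chain.from_iterable(map_phase(s) for s in splits))
print(result["big"])  # 3
```

The key design idea is that map calls are independent of one another, so the framework can run them in parallel wherever the data already lives, moving code to the data rather than data to the code.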

Conclusion

  • In 2006, Hadoop was born as a solution to the challenges faced by Cutting and Cafarella in processing large volumes of web pages for their search engine.
  • With its distributed file system and processing framework, Hadoop revolutionized big data processing.

History and Popularity of Hadoop

This section discusses the history and popularity of Hadoop, an open-source software framework.

History of Hadoop

  • In 2002, the initial efforts for Hadoop began within the Nutch project.
  • In 2003 and 2004, Google published details about its distributed processing systems.
  • In 2006, Hadoop was born as a project derived from Nutch.
  • Yahoo! began using Hadoop (hiring Doug Cutting in 2006) to power its search infrastructure.
  • In 2008, Hadoop became a top-level project of the Apache Software Foundation and started gaining popularity.
  • Facebook also started using Hadoop in 2008.


Video description

Hadoop is the foundational technology of the Big Data world. To get started in Big Data, you need to understand what Hadoop is. In this video we explain what it is, how it emerged, and why many companies are using it to build the new Big Data platforms that let them process data at a volume and speed previously out of reach for traditional technologies.