The Strange Math That Predicts (Almost) Anything

How a Math Feud in Russia Changed Our Understanding of Probability

The Context of the Feud

  • In 1905, socialist groups in Russia revolted against the Tsar, demanding political reform or his abdication. The divide cut through every part of society, including mathematics.
  • Mathematicians took sides: Pavel Nekrasov supported the Tsar and believed math could explain free will, while Andrey Markov, an atheist, criticized this view as unrigorous.

Key Figures and Their Beliefs

  • Pavel Nekrasov: Known as the "Tsar of Probability," he argued that mathematical independence was essential for understanding free will and used statistics to support his claims about social behaviors.
  • Andrey Markov: Opposed Nekrasov's views by asserting that probability could be applied to dependent events, challenging the long-held belief that independence was necessary for the law of large numbers.

The Law of Large Numbers

  • The law states that as the number of trials grows (flipping a coin, say), the observed outcomes converge toward their expected values (e.g., 50/50 heads and tails). The principle had been foundational to probability theory since Jacob Bernoulli proved it in 1713; a quick simulation follows this list.
  • Nekrasov believed the converse: if data obeyed this law, the underlying events had to be independent, and he read the statistical regularity of marriage and crime rates as evidence that individual choices are free acts of will.
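
A quick way to see the law in action is to simulate coin flips. This is a minimal sketch, assuming a fair coin; the running fraction of heads drifts toward 0.5 as trials accumulate:

```python
import random

# Flip a fair coin n times and report the fraction of heads.
# The law of large numbers says this fraction converges to 0.5.
for n in [10, 100, 10_000, 1_000_000]:
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>9} flips: {heads / n:.4f} heads")
```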

Markov's Counterargument

  • Markov set out to show that dependent events can also obey the law of large numbers. His test case was text, where each letter depends on the one before it (whether a vowel follows a consonant, for instance); he counted letter pairs in Pushkin's Eugene Onegin.

Understanding Markov Chains and Their Implications

Transition Probabilities in Markov Chains

  • In the text Markov analyzed, about 43% of the letters are vowels, so a letter drawn at random has a 43% chance of being a vowel.
  • Vowel-vowel pairs, however, make up only about 6% of letter pairs; dividing the 6% by the 43% gives a transition probability of roughly 13% for moving from one vowel to another.
  • The remaining 87% is the probability of moving from a vowel to a consonant, so the transitions out of the vowel state sum to 100%.
  • By drawing random numbers against these probabilities, letter after letter, one can simulate the chain, and the running ratio converges to the expected 43% vowels and 57% consonants (see the sketch after this list).
  • This model illustrated that even dependent events can follow statistical laws like the law of large numbers, challenging notions of free will in decision-making.
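
A minimal simulation of Markov's two-state chain. The 13% vowel-to-vowel probability comes from the figures above; the roughly 66% consonant-to-vowel probability is an assumption forced by those figures, since it is the value that makes the long-run vowel share come out to 43%:

```python
import random

P_VOWEL_AFTER_VOWEL = 0.13      # from Markov's pair counts above
P_VOWEL_AFTER_CONSONANT = 0.66  # assumed: the value implied by a 43% vowel share

def vowel_share(steps=1_000_000):
    """Walk the two-state chain and return the fraction of vowels seen."""
    is_vowel = True
    vowels = 0
    for _ in range(steps):
        p = P_VOWEL_AFTER_VOWEL if is_vowel else P_VOWEL_AFTER_CONSONANT
        is_vowel = random.random() < p
        vowels += is_vowel
    return vowels / steps

print(f"vowel share: {vowel_share():.3f}")  # ~0.43, though each step depends on the last
```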

Impact on Probability Theory

  • Markov's findings showed that independence isn't necessary for the law of large numbers to hold; his chains made it possible to model dependent events rigorously.
  • Despite its significance, Markov's breakthrough went largely unnoticed at the time; he himself was more focused on pure analysis than practical applications.

Historical Context and Applications

  • The power of Markov chains became evident decades later, during the Manhattan Project's development of nuclear weapons.
  • On July 16, 1945, the U.S. detonated "The Gadget," the first nuclear bomb, in the Trinity test; designing such weapons demanded ways to reason about systems far too complex to calculate directly.

Ulam's Insight into Randomness

  • Stanislaw Ulam wanted to understand how neutrons behave inside a nuclear bomb, but their interactions were too complex to work out analytically.
  • While recovering from encephalitis in January 1946, Ulam played solitaire and began wondering what fraction of randomly shuffled games can actually be won.

Simulation Techniques

  • Realizing an exact analysis was hopeless (a deck has 52! ≈ 8 × 10^67 possible arrangements), Ulam proposed simply playing many games and taking the observed win rate as a statistical approximation, as sketched below.
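
The essence of Ulam's idea in toy form: estimate a probability by playing many random games and counting wins. A full solitaire simulator is long, so this sketch substitutes a trivial "game" (is an ace on top of a shuffled deck?) whose exact answer, 4/52, lets the estimate be checked:

```python
import random

def play_once():
    """Stand-in for one game: shuffle a deck, 'win' if an ace is on top."""
    deck = ["ace"] * 4 + ["other"] * 48
    random.shuffle(deck)
    return deck[0] == "ace"

def estimate_win_rate(trials=100_000):
    return sum(play_once() for _ in range(trials)) / trials

print(estimate_win_rate())  # ~0.0769, matching the exact 4/52
```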

Understanding the Monte Carlo Method in Nuclear Physics

The Basics of Neutron Behavior

  • John von Neumann saw that neutron behavior in a nuclear reaction could be modeled as a Markov chain, starting with a single neutron traveling through the core.
  • Three potential outcomes for the neutron: it can scatter off an atom, leave the system, or trigger fission by striking another uranium-235 atom.

Transition Probabilities and Their Variability

  • Transition probabilities are not fixed; they depend on factors like neutron position, velocity, energy, and uranium configuration.
  • For example, fast-moving neutrons have different scattering and absorption probabilities compared to slower ones.

Running Simulations on ENIAC

  • ENIAC, the first general-purpose electronic computer, ran the simulations: initial conditions for each neutron were generated at random, and its chain of events was followed step by step.
  • The average number of new neutrons produced per neutron is the multiplication factor k, which indicates whether the reaction will die down (k < 1) or grow exponentially (k > 1); see the sketch below.
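
A sketch of the neutron chain described above, with made-up transition probabilities (real values vary with position, speed, energy, and core geometry). Each neutron scatters, escapes, or triggers fission; averaging the neutrons produced estimates k:

```python
import random

# Illustrative probabilities only; the real ones depend on the core.
P_SCATTER, P_ESCAPE, P_FISSION = 0.50, 0.28, 0.22

def neutrons_from_one():
    """Follow one neutron until it escapes or causes fission."""
    while True:
        r = random.random()
        if r < P_SCATTER:
            continue                  # scatters and keeps traveling
        if r < P_SCATTER + P_ESCAPE:
            return 0                  # leaves the system
        return random.choice([2, 3])  # fission releases ~2.5 neutrons on average

def estimate_k(trials=200_000):
    return sum(neutrons_from_one() for _ in range(trials)) / trials

k = estimate_k()
print(f"k ~ {k:.2f}: the reaction {'grows' if k > 1 else 'dies down'}")
```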

Birth of the Monte Carlo Method

  • Ulam and von Neumann developed a statistical method that approximated complex differential equations without exact calculations.
  • Named after the Monte Carlo casino, a nod to its reliance on random sampling, the method quickly spread among scientists studying nuclear reactor designs.

The Rise of Yahoo Amidst Internet Expansion

Early Days of Search Engines

  • In 1993, as the internet opened to the public, search engines struggled to keep up with the overwhelming amount of information coming online.
  • Jerry Yang and David Filo founded Yahoo in 1994 but needed funding for their startup.

Investment from Masayoshi Son

  • Japanese billionaire Masayoshi Son offered $100 million instead of the requested $5 million during their meeting.
  • Son recognized that the leading search engines had no real technological edge over one another; success would come down to attracting users, and therefore to marketing budgets.

Yahoo's Growth Trajectory

  • Yahoo accepted the investment out of necessity, and within four years it was one of the most popular websites in the world.

Challenges Faced by Yahoo

  • Despite its popularity, Yahoo struggled with keyword manipulation: site owners could artificially inflate their pages' rankings by stuffing them with popular search terms.

The Evolution of Search Engines and PageRank

The Concept of Endorsements in Web Links

  • Just as the stamps on a library book's checkout card signal how many readers found it worthwhile, links on the web act as endorsements: the more links point to a page, the more valuable the page is presumed to be.
  • Sergey Brin and Larry Page recognized that each link to a webpage acts as an endorsement, leading to the development of a model for web ranking based on these endorsements.

Understanding Markov Chains in Web Navigation

  • A simplified model with four webpages (Amy, Ben, Chris, Dan) illustrates how transitions between pages can be represented through probabilities.
  • By simulating a surfer navigating this toy internet, one can determine the relative importance of pages based on time spent on each.

Quality vs. Quantity in Link Building

  • Creating masses of low-quality links can boost a ranking briefly, but it fails in the long run because reputable sites never link back.
  • To keep the surfer from getting trapped on pages without outgoing links, or cut off from disconnected ones, a damping factor is introduced: 85% of the time the surfer follows a link, and 15% of the time it jumps to a random page, so the whole network gets explored (see the sketch after this list).
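
A random-surfer sketch of the damping idea, using the four pages named above with an invented link structure. The fraction of time the surfer spends on each page approximates its rank:

```python
import random

# Hypothetical links among the four example pages.
LINKS = {
    "Amy":   ["Ben", "Chris"],
    "Ben":   ["Chris"],
    "Chris": ["Amy", "Dan"],
    "Dan":   [],               # a dead end: only the random jump escapes it
}
DAMPING = 0.85  # follow a link 85% of the time, jump randomly 15%

def surf(steps=500_000):
    pages = list(LINKS)
    visits = dict.fromkeys(pages, 0)
    page = random.choice(pages)
    for _ in range(steps):
        if random.random() < DAMPING and LINKS[page]:
            page = random.choice(LINKS[page])   # follow an outgoing link
        else:
            page = random.choice(pages)         # random jump (damping)
        visits[page] += 1
    return {p: round(v / steps, 3) for p, v in visits.items()}

print(surf())  # share of time per page ~ its importance
```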

Introduction to PageRank Algorithm

  • Brin and Page built PageRank into a search engine whose result quality far surpassed its rivals'.
  • Some skeptics argued against returning good results immediately, fearing users would leave before seeing enough ads; Brin and Page bet that better search would attract users regardless.

Launching Google: From BackRub to Google

  • Brin and Page initially called their search engine BackRub before rebranding it; Google launched in 1998.
  • The name "Google" came from the mathematical term "googol" (10^100), reflecting their ambition to index the vast amount of information online.

Google's Dominance and Algorithmic Impact

  • Within about four years, Google had surpassed Yahoo as the leading search engine, helped by its single-minded focus on search; later entrants such as Microsoft's Bing never closed the gap.
  • Changes to Google's algorithms now ripple across the entire internet landscape, and the company's name has become synonymous with searching online.

Claude Shannon's Contribution to Information Theory

  • Claude Shannon extended Markov's chains beyond vowels and consonants, using them to predict text letter by letter and word by word.

Understanding Markov Chains and Their Applications in Language Models

The Basics of Prediction with Markov Chains

  • Shannon observed that the more preceding words a model conditions on, the more sensible its output becomes: with enough context, generated runs of four or so words often read naturally.
  • The same idea underlies features like Gmail's typing suggestions, which build on Markov-chain-style prediction over tokens (letters, words, punctuation); a minimal word-level version is sketched below.
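
A minimal word-level Markov chain in the spirit of Shannon's experiments (not how Gmail's feature is actually implemented): record which words follow each two-word context in a training text, then sample from those observations:

```python
import random
from collections import defaultdict

def train(text, order=2):
    """Map each `order`-word context to the words observed after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, seed, length=10):
    out = list(seed)
    for _ in range(length):
        followers = chain.get(tuple(out[-2:]))  # condition on the last two words
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran off the mat again"
print(generate(train(corpus), ("the", "cat")))
```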

Advanced Features of Language Models

  • Unlike a simple Markov chain, a modern language model uses an attention mechanism to decide which parts of the context matter for the next prediction. The phrase "the structure of the cell," for instance, is read differently depending on whether the earlier context mentions "blood" or "mitochondria." A bare-bones version of attention follows.
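
A bare-bones scaled dot-product attention in NumPy, for flavor only; real language models wrap this core in learned projections, multiple heads, and many layers:

```python
import numpy as np

def attention(Q, K, V):
    """Each query scores every key; softmax turns the scores into
    weights that say how much of each value (context position) to mix in."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)      # softmax over the context
    return w @ V

# Three context positions with 4-dimensional embeddings (toy numbers).
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(attention(Q, K, V).shape)  # (3, 4): one mixed vector per position
```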

Concerns About Feedback Loops in Language Models

  • A significant concern: as text generated by language models becomes training data for future models, outputs can grow more repetitive, potentially collapsing into a dull state in which the model produces the same content indefinitely.
  • Systems with feedback loops are hard to model with a plain Markov chain; global warming is one example, where more CO2 raises temperatures, warmer air holds more water vapor, and the added vapor traps still more heat.

Memoryless Property of Markov Chains

  • Many systems carry long histories, yet the memoryless property of Markov chains lets us predict from the current state alone: if the state captures everything relevant, the rest of the history can be ignored. That simplification is what makes complex systems tractable (stated formally below).
  • As has been observed in the literature, effective problem-solving often comes down to constructing the right Markov chain, an insight that grew out of the unlikeliest of circumstances: a feud over free will.
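
Stated in its standard form, the memoryless property says the next state depends on the past only through the present:

    P(X_{n+1} = x | X_n = x_n, ..., X_0 = x_0) = P(X_{n+1} = x | X_n = x_n)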

Card Shuffling and Randomness

  • Card shuffling is itself a Markov chain, with each shuffle one step of the chain: by a classic result, about seven riffle shuffles are enough to randomize a 52-card deck.
  • Other shuffling methods need far more repetitions (upwards of 2,000) to reach randomness, a reminder of how hard genuine randomization is in practice; a riffle-shuffle sketch follows.
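
A sketch of a single riffle shuffle under the Gilbert-Shannon-Reeds model, the model behind the seven-shuffle result: cut the deck roughly in half, then drop cards from each packet with probability proportional to the packet's remaining size:

```python
import random

def riffle(deck):
    """One Gilbert-Shannon-Reeds riffle shuffle."""
    cut = sum(random.random() < 0.5 for _ in deck)   # binomial cut point
    left, right = deck[:cut], deck[cut:]
    out = []
    while left or right:
        # Drop from a packet with probability proportional to its size.
        if random.random() < len(left) / (len(left) + len(right)):
            out.append(left.pop(0))
        else:
            out.append(right.pop(0))
    return out

deck = list(range(52))
for _ in range(7):   # ~7 riffles are enough for a 52-card deck
    deck = riffle(deck)
print(deck)
```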

Conclusion and Learning Opportunities

  • Understanding these ideas not only clarifies how language models work, but also shows how a seemingly simple question can open onto deep mathematics.