How Binary Search Makes Computers Much, Much Faster

How Binary Search Makes Computers Much, Much Faster

How to Organize Information: Lessons from Libraries and Computers

The Evolution of Library Organization

  • There are various methods for organizing books, including by author, title, subject, or even spine color. Each method serves different needs based on context.
  • Melvil Dewey introduced a systematic approach in 1876 with his pamphlet on library organization, moving away from the chaotic system of sorting by purchase date.
  • Dewey's Decimal System assigned index numbers to subjects, allowing for easier access to information without needing librarian assistance and promoting browsing.
  • Despite its utility, Dewey's legacy is marred by his personal misconduct and biases that influenced the early classification system used in libraries today.
  • The key takeaway is that effective indexing allows readers to find books quickly—a principle that also applies to computer data management.

Searching Techniques: Linear vs. Binary

  • Computers utilize two primary search methods: linear search (checking each item sequentially) and binary search (dividing the list).
  • In a shuffled stack of cards numbered 1 to 50, finding a specific number like 13 requires a linear search through each card until found.
  • If the cards are sorted numerically, binary search can be employed—this method significantly reduces time complexity as it eliminates half of the remaining options with each step.
  • While binary search shows minimal advantage with small datasets (like eleven cards), it becomes exponentially faster with larger datasets (e.g., one million cards).
  • However, databases often contain multiple attributes (numbers, letters, colors), complicating searches since only one property can be optimized at a time.

Indexing in Databases

  • Just as librarians sort books by various criteria, databases can optimize searches using indexes but must choose which properties to index carefully.
  • A "clustered index" physically rearranges data for efficient access while a "non-clustered index" acts like an external reference guide for quick lookups without altering original data placement.
  • Using an index allows for rapid searches; if looking for all items matching certain criteria (like color), you can quickly retrieve relevant identifiers before accessing actual data.
  • This efficiency is crucial when dealing with large datasets where traditional searching would result in significant delays—transforming hours into seconds during queries.
  • Maintaining indices incurs overhead; every change in the database necessitates updates to these indices. Thus, careful consideration is needed regarding which fields warrant indexing.

Trade-offs in Data Management

  • There exists a trade-off between space and time when creating indices; not every attribute may justify the cost of indexing based on usage frequency.
  • Google’s extensive search index exemplifies this balance—boasting 100 petabytes while delivering customized results swiftly under one second.

Google's Commitment to User Experience

The Evolution of Google Search

  • Despite numerous changes and optimizations over the years, Google continues to display the time taken to answer queries on desktop search results, indicating their pride in this feature.
  • The introduction of knowledge boxes and advertisements has not diminished Google's focus on providing users with quick answers.
  • Google's ongoing commitment to user satisfaction is reflected in their efforts to optimize search results for relevance and speed.
  • The emphasis on response time showcases Google's dedication to enhancing user experience amidst evolving digital landscapes.
  • This persistence highlights a core value within Google: delivering timely information remains a priority even as they adapt to new technologies and user needs.
Video description

Featuring binary versus linear search, and non-clustered indexes. Uh, indices. However you want to say it. • MORE BASICS: https://www.youtube.com/playlist?list=PL96C35uN7xGLLeET0dOWaKHkAlPsrkcha Written with Sean Elliott https://twitter.com/SeanMElliott/ • Camera by Tomek • Graphics by William Marler https://wmad.co.uk 🟥 MORE FROM TOM: https://www.tomscott.com/ (you can find contact details and social links there too) 📰 WEEKLY NEWSLETTER with good stuff from the rest of the internet: https://www.tomscott.com/newsletter/ ❓ LATERAL, free weekly podcast: https://lateralcast.com/ https://youtube.com/lateralcast/ ➕ TOM SCOTT PLUS: https://youtube.com/tomscottplus 👥 THE TECHNICAL DIFFICULTIES: https://youtube.com/techdif