Gemini Deep Think

Google DeepMind Achieves Gold Standard at the International Math Olympiad

Overview of the Achievement

  • Google DeepMind recently achieved a gold-medal standard at the International Mathematical Olympiad (IMO), a significant milestone in AI development.
  • The announcement follows OpenAI's similar claim, although OpenAI's outputs were not as strong.

What is the International Mathematical Olympiad?

  • The IMO is a prestigious competition for high school students, established in 1959, focusing on challenging mathematical problems across various fields like algebra and geometry.
  • Google's previous attempt reached silver-medal standard last year, but comparison with human contestants was complicated because the AI systems were given more processing time than human participants have.

Last Year's Results and Methodology

  • In last year's competition, Google scored 28 out of 42 points, just shy of the gold medal threshold of 29 points. Each contestant answers six questions worth seven points each.
  • Last year's approach relied on the specialized systems AlphaProof and AlphaGeometry; this year's method was notably different. Instead of working in specialized formal mathematics languages, the team used an advanced version of Gemini with Deep Think.
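
The scoring arithmetic above can be checked with a short sketch. The constant and function names here are illustrative, and the cutoff of 29 is last year's value as given in the text; the gold cutoff is set separately each year.

```python
# IMO scoring as described above: six problems, seven points each.
PROBLEMS = 6
POINTS_PER_PROBLEM = 7
MAX_SCORE = PROBLEMS * POINTS_PER_PROBLEM  # 42

# Last year's gold cutoff, taken from the text (it differs from year to year).
GOLD_CUTOFF_LAST_YEAR = 29


def is_gold(score: int, cutoff: int) -> bool:
    """Return True when a total score meets or exceeds the gold cutoff."""
    if not 0 <= score <= MAX_SCORE:
        raise ValueError(f"score must be between 0 and {MAX_SCORE}")
    return score >= cutoff


google_last_year = 28
print(MAX_SCORE)                                         # 42
print(is_gold(google_last_year, GOLD_CUTOFF_LAST_YEAR))  # False
print(GOLD_CUTOFF_LAST_YEAR - google_last_year)          # 1 point short
```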

Introduction to Gemini and Deep Think

  • Deep Think was announced at Google I/O earlier this year and has shown improved performance on specific problem types related to mathematics and logic through testing over several months.
  • Those strengths in complex mathematical reasoning are what make the model well suited to IMO problems.

Competition Context and Controversy

  • Both OpenAI and Google announced their IMO results during the ICML conference in Vancouver, amid some controversy over whether announcements should wait until after the human competitors had been recognized.
  • OpenAI claimed gold-level performance with an experimental reasoning LLM that scored 35 out of 42 points, and released proofs for the five questions it answered correctly. Rumors suggested that Google had also reached gold but was withholding its results temporarily so that the human achievements could be acknowledged first.

Public Release of Solutions

  • Both companies made their solutions public, but each released proofs only for the five questions it answered correctly, prompting speculation that the undisclosed responses to the remaining question might be embarrassing.
  • Observers noted differences in how each model approached problem-solving; some believed DeepMind's solutions appeared more human-like compared to OpenAI's methods.

Exploring Gemini Deep Think Model

Deep Think: Understanding Parallel Thinking in AI

How Deep Think Operates

  • Deep Think generates multiple chains of thought simultaneously, an approach referred to as "parallel thinking," which lets it explore several possibilities at once.
  • The model evaluates these thoughts to determine which are most useful and which should be discarded, although the exact mechanisms remain unclear.
  • This process contributes to its high performance on benchmarks, particularly in reasoning and knowledge tasks.
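
Since the exact mechanism is not public, the following is only a toy sketch of the general idea: run several candidate chains concurrently, score them, and keep the best. It is a generic best-of-N selection, not Deep Think's actual method. The `explore` function, the scoring rule, and the square-root task are all hypothetical stand-ins for real model calls.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Chain:
    """One candidate chain of thought and the answer it reached."""
    seed: int
    answer: int
    score: float


def explore(seed: int) -> Chain:
    """Hypothetical stand-in for one reasoning chain.

    A real system would call a language model here. This toy version
    guesses an integer square root of 1369 and scores the guess by how
    close its square lands to the target.
    """
    rng = random.Random(seed)
    guess = rng.randint(30, 40)
    score = -float(abs(guess * guess - 1369))  # 0.0 is a perfect score
    return Chain(seed=seed, answer=guess, score=score)


def parallel_think(n_chains: int, keep: int) -> list[Chain]:
    """Run n_chains candidates concurrently and keep the best `keep`."""
    with ThreadPoolExecutor(max_workers=n_chains) as pool:
        chains = list(pool.map(explore, range(n_chains)))
    # Discard weaker chains; ties are broken by seed for determinism.
    chains.sort(key=lambda c: (-c.score, c.seed))
    return chains[:keep]


best = parallel_think(n_chains=16, keep=3)
for chain in best:
    print(chain)
```

The trade-off is visible even in the toy: more chains raise the chance that one of them is good, but every chain costs compute, and nothing can be returned until the candidates have been compared.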

Performance Insights

  • Initial generation times can be lengthy; for instance, it may take 10 minutes or more before any output appears.
  • Users often experience delays without visibility into the model's progress during this parallel processing phase.
  • While Deep Think is capable of intelligent outputs, practical applications may be limited due to these long wait times.

Example Outputs and Timings

  • After approximately 6 minutes and 15 seconds, initial "thinking tokens" begin appearing, but a final answer takes much longer.
  • A complete response arrived after about 16 minutes, giving the possible values of k as 0, 1, or 3.

Variability in Response Times

  • Although some problems yield answers in under 10 minutes, complex queries typically require 10 minutes or more.

Case Study: Math Problem from AIME Dataset

  • An example problem took about 13 minutes of processing time before arriving at the correct answer of 204 through summarization techniques.

User Experience with Long Wait Times

  • Users are advised to manage expectations regarding wait times by submitting prompts and engaging in other activities while waiting for results.
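
In code, the same "submit and do something else" habit corresponds to starting the request as a background task rather than blocking on it. The sketch below uses Python's `asyncio`; `slow_model_call` is a hypothetical placeholder that only sleeps, not a real Gemini API call, and the delays are shortened from minutes to fractions of a second.

```python
import asyncio


async def slow_model_call(prompt: str, delay_s: float) -> str:
    """Hypothetical stand-in for a long-running model request."""
    await asyncio.sleep(delay_s)  # simulates minutes of thinking time
    return f"answer to: {prompt}"


async def other_work(steps: int) -> int:
    """Something useful to do while the request is pending."""
    done = 0
    for _ in range(steps):
        await asyncio.sleep(0.01)
        done += 1
    return done


async def main() -> tuple[str, int]:
    # Submit the prompt first, then carry on instead of blocking on it.
    pending = asyncio.create_task(slow_model_call("hard math problem", 0.1))
    done = await other_work(steps=3)
    answer = await pending  # collect the result once it is ready
    return answer, done


answer, steps_done = asyncio.run(main())
print(answer, steps_done)
```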

Exploring Non-Math Related Tasks with Deep Think

Prompting Creative Outputs

  • A prompt asking for a design of a Sala Thai (a traditional Thai open pavilion) produced a simplified response, with less depth than the model's internal reasoning may have contained.

Successful Implementation of Ideas

  • Despite initial simplifications, the model eventually produced usable code using Three.js that successfully rendered a basic version of a Sala Thai structure.

Game Development Insights with AI

Initial Game Generation

  • The AI generated a game that lacked obstacles, making it difficult for the ball to interact effectively within the environment.
  • After identifying issues with the game's mechanics, such as insufficient ball travel distance and lack of structures, the AI was prompted to enhance its design.

Improvements and Gameplay Mechanics

  • The updated version of the game introduced better physics for the ball, allowing for scoring when hitting targets like pigs, although some elements like stone and wood interactions were still underdeveloped.
  • The process took multiple prompts, each with a longer wait than previous models such as Gemini 2.5 Pro require.

Model Evaluation and Recommendations

  • The model's slow responses make it a questionable choice for everyday coding; it may be better suited to initial architecture setup than to full development.
  • Users who want to try Deep Think can access it in the Gemini app with a Google AI Ultra subscription; API availability is expected soon.

Challenges in AI Development

  • While the gains on intelligence-heavy tasks are notable, using these models means balancing speed, cost, and intelligence.

Video description

In this video, we look at the latest Gemini release, Gemini DeepThink, and see what it can be used for and how it was able to reach gold medal standard in the International Math Olympiad.

Blog: https://blog.google/products/gemini/gemini-2-5-deep-think/

For more tutorials on using LLMs and building agents, check out my Patreon.
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://x.com/Sam_Witteveen

Interested in building LLM Agents? Fill out the form below.
Building LLM Agents Form: https://drp.li/dIMes
Github: https://github.com/samwit/llm-tutorials

Time Stamps:
00:00 Intro
00:02 Gemini with Deep Think Blog
05:45 Demo: Math Olympiad Question
10:35 Demo: AIME 2025 Dataset Math Problem
11:38 Demo: 3D Voxels
12:51 Demo: Game Programming