Google's AI Boss Reveals What AI In 2026 Looks Like
Future of AI: Insights from Demis' Interview
Overview of AI Developments by 2026
- Demis Hassabis, CEO of Google DeepMind, discusses his vision for the future of AI, particularly the emergence of full omnimodels by 2026.
- He emphasizes the convergence of modalities, highlighting Google's Gemini as a foundational multimodal model capable of processing images, video, text, and audio.
- The advancements in multimodality are exemplified by Google's latest image model, Nano Banana Pro, which demonstrates impressive visual understanding and infographic creation capabilities.
Key Components of Full Omnimodel Stack
- The full omnimodel stack consists of six components: robotics, images, video, audio, 3D, and text.
- Google is advancing rapidly across these areas, though it is noted to be slightly behind other frontier AI companies in robotics.
Innovations in Robotics with Gemini Robotics 1.5
- Introduction of Gemini Robotics 1.5 aims to enhance physical agents' capabilities to solve complex multi-step challenges effectively.
- Demonstrations show robots can perform tasks like sorting fruits or laundry through step-by-step reasoning and environmental perception.
- The new model generalizes across different robot types, without fine-tuning for each form factor.
Advanced Capabilities and Applications
- Gemini Robotics 1.5 can utilize internet resources to answer questions and complete tasks based on specific guidelines (e.g., waste sorting).
- This advancement signifies a move towards creating genuinely useful AI agents that can assist in everyday physical tasks.
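The step-by-step "perceive, reason, act" pattern described above can be sketched as a simple agent loop. This is a minimal illustration of the control flow only; the `perceive`, `plan`, and `act` functions below are hypothetical stand-ins, not Gemini Robotics' actual API:

```python
# Minimal sketch of a multi-step embodied-agent loop: perceive the scene,
# reason about the next sub-step, act, and repeat until the task is done.
# All functions here are hypothetical stand-ins for illustration.

def perceive(scene):
    """Return the items still needing attention (stand-in for vision)."""
    return [item for item, done in scene.items() if not done]

def plan(task, remaining):
    """Pick the next sub-step (stand-in for the model's reasoning)."""
    if not remaining:
        return None  # task complete
    return f"{task}: move '{remaining[0]}' to its bin"

def act(scene, remaining):
    """Execute the chosen sub-step (stand-in for motor control)."""
    scene[remaining[0]] = True

def run_agent(task, scene):
    steps = []
    while True:
        remaining = perceive(scene)
        step = plan(task, remaining)
        if step is None:
            break
        steps.append(step)
        act(scene, remaining)
    return steps

laundry = {"white shirt": False, "dark sock": False}
print(run_agent("sort laundry", laundry))
```

The point of the loop is that each action is preceded by a fresh perception and reasoning step, which is what lets the agent handle multi-step tasks rather than executing a fixed script.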
Multimodal Integration: Images and Videos
- The integration between image generation (Nano Banana Pro) and potential applications in video showcases how models reason like agents during content creation.
- Future developments may extend this reasoning capability into video production as well.
Veo 3 Video Technology
- Veo 3 is highlighted as a leading image-to-video model, with significant improvements expected by 2026.
Underappreciated Features: Gemini Live
- Discussion includes Gemini Live as an underrated feature combining live speech recognition with real-time reasoning capabilities to assist users effectively.
AI-Assisted Oil Change Demonstration
Overview of AI in Practical Applications
- The demonstration showcases an individual using Gemini Live to perform an oil change on a 2009 BMW 335i, highlighting the practical application of AI in everyday tasks.
- The speaker speculates about advancements by 2026, predicting improvements in latency and reasoning capabilities of AI systems, enhancing their utility for complex tasks.
Step-by-Step Oil Change Process with AI Guidance
- Gemini assists the user by asking about the type of oil and tools available, demonstrating interactive communication between AI and user.
- The AI provides specific instructions on locating the oil filter and drain plug, emphasizing its role as a guide through technical processes.
- Instructions include removing a plastic panel to access the drain plug, showcasing how AI can simplify mechanical tasks for users.
Technical Specifications and User Interaction
- The conversation includes torque specifications for reassembling parts (18 ft-lb), illustrating how detailed guidance is provided by the AI during each step.
- Users are prompted to replace O-rings on the oil filter cap before installation, indicating thoroughness in maintenance procedures facilitated by AI.
Completion and Future Implications
- After completing the oil change process, Gemini confirms that everything is set, reflecting successful interaction between human and machine.
- The speaker expresses optimism about future developments in AI technology by 2026, suggesting significant enhancements in user experience and task execution.
Exploring World Models: A New Frontier
Introduction to World Models Concept
- Discussion shifts towards "world models," which are anticipated to be a major theme in technological advancements by 2026.
Genie 3: Interactive Video Model
- Introduction of Genie 3 as an innovative system allowing users to generate interactive video environments that respond dynamically to user actions.
Real-Time Interactivity Features
- Genie 3 enables real-time interactivity where environments react live based on user movements rather than relying on pre-built simulations.
- The concept of "world memory" is introduced; actions taken within these environments persist over time, enhancing immersion.
Dynamic Content Creation
- Users can create new events within their generated worlds spontaneously (promptable events), showcasing flexibility and creativity afforded by advanced world models.
Genie 3: Revolutionizing World Simulation
Exploring Real-World Physics and Unique Environments
- Genie 3 allows users to explore real-world physics, movement, and various unique environments, including distinct geographies and fictional settings.
- The application of Genie 3 extends to next-generation gaming and entertainment, as well as research in embodied training for robotic agents.
Advancements in World Models
- The transition from Genie 2 to Genie 3 showcases significant advancements in world models that could lead to enhanced simulations powered by cross-modality.
- These world models are characterized by their ability to simulate virtual worlds with memory and reasoning capabilities, which were previously unimaginable.
Agent-Based Systems and Their Potential
- Google is leading the development of agent-based systems that aim to complete full tasks end to end, though reliability remains their key challenge.
- Gemini 2.0 powers a multi-agent system designed as a virtual scientific collaborator that assists researchers in generating novel, testable hypotheses.
Future of AI Agents in Scientific Research
- The potential for AI agents to propose new ideas and conduct experiments marks a transformative shift in scientific research methodologies.
- Google's CodeMender agent focuses on detecting and fixing security vulnerabilities within codebases, showcasing the versatility of AI applications across different fields.
Comprehensive Data Science Automation
- Google's data science agent automates end-to-end data science workflows within Google Colab, highlighting the extensive range of tools being developed under their roadmap.
- AlphaEvolve represents another innovative step forward as an AI scientist focused on algorithmic discovery, indicating the future trajectory of AI's role in scientific exploration.