AI is Already Building AI — Google DeepMind’s Mostafa Dehghani
The Future of AI: Recursive Self-Improvement
Introduction to AI Advancements
- The rapid evolution in AI models is largely driven by leveraging previous generations, with a focus on achieving full automation and long-term self-improvement.
- Matt Turck introduces Mostafa Dehghani, an influential AI researcher at Google DeepMind, highlighting his contributions to significant architectural advancements like universal transformers and the Gemini family.
Exploring Loops in AI
- The concept of "loops" is emerging as a critical area of research, where models improve not just by size but through recursive thinking.
- Dehghani explains that looping occurs at both micro (architecture/inference time) and macro (model development) levels, emphasizing self-improvement as a key focus.
Self-Improvement in Machine Learning
- Self-improvement represents the continuation of trends in machine learning, moving from manual feature engineering to data-driven approaches that minimize human bias.
- By removing human intervention from model improvement processes, researchers aim to eliminate bottlenecks and enhance model performance autonomously.
Detailed Mechanisms of Looping
- Dehghani discusses various methods for increasing test-time compute within models, allowing them to refine their problem-solving processes iteratively.
- Techniques such as chain-of-thought reasoning and utilizing dummy tokens for verification are highlighted as effective strategies for enhancing model performance.
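The idea of spending extra computation at inference time can be sketched with a toy example. Here the repeated "model step" is a Newton update for a square root, standing in for another round of chain-of-thought refinement; the names and the toy task are illustrative, not how Gemini works internally.

```python
# Toy illustration of test-time compute: the same fixed "model step" is
# applied repeatedly, and harder problems simply get more iterations.
# Here the step is a Newton update for sqrt(x); in an LLM the analogue
# is another round of iterative refinement with the same parameters.

def refine_step(x, guess):
    """One fixed unit of computation, reused at inference time."""
    return 0.5 * (guess + x / guess)  # Newton's method for sqrt

def solve_with_budget(x, budget):
    """Spend `budget` refinement steps on the problem."""
    guess = x  # crude initial draft
    for _ in range(budget):
        guess = refine_step(x, guess)
    return guess

# More test-time compute -> better answer, same parameters.
cheap = solve_with_budget(2.0, budget=2)
careful = solve_with_budget(2.0, budget=10)
print(abs(cheap - 2**0.5), abs(careful - 2**0.5))
```

The point of the sketch is that accuracy improves with the iteration budget alone, with no change to the "model" itself.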
Recursive Self-Improvement: A New Frontier
- Recursive self-improvement (RSI), once considered science fiction, is becoming a reality; it involves models autonomously enhancing themselves over time.
- The discussion emphasizes that many advancements in RSI are already occurring without widespread recognition among the public or even within the research community.
AI Models and Self-Improvement
The Evolution of AI Models
- New generation models are heavily built on the foundations of previous generations, indicating a clear trajectory towards full automation in AI development.
- Concepts like continual learning are still evolving, but there is potential for models to autonomously improve themselves by calculating gradients and updating weights dynamically.
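The "calculating gradients and updating weights dynamically" idea reduces to an autonomous update loop. A minimal sketch, using a toy 1-D linear model and ordinary SGD (all names and the task are illustrative):

```python
# Minimal sketch of a model updating its own weights from a stream of
# examples: predict, compute a gradient, apply it, repeat. The point is
# that no human sits inside the update loop.

def self_update(w, x, y, lr=0.1):
    """One autonomous update on a 1-D linear model (pred = w * x)."""
    pred = w * x
    grad = 2 * (pred - y) * x  # d/dw of squared error (w*x - y)^2
    return w - lr * grad

# Stream of fresh (x, y) observations (true relation: y = 3x).
stream = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)] * 20
w = 0.0
for x, y in stream:
    w = self_update(w, x, y)
print(w)  # approaches 3.0 as the loop runs
```

Closing the loop, in this framing, just means keeping such a process running on live data instead of freezing `w` after training.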
Future of Automation in AI
- Full automation could close the loop on self-improvement, shifting focus to providing sufficient computational resources for these models.
- The removal of human bottlenecks in model improvement is expected to lead to significant advancements in AI capabilities.
Examples of Recursive Learning
- Karpathy's auto-research project serves as an early example of models effectively engaging in self-research and improving engineering processes.
- There’s evidence that intuition traditionally associated with expert researchers may be integrated into development loops through advanced models.
Implications for Research and Development
- While it raises questions about replacing genius researchers, the rapid progress seen suggests that such developments were previously underestimated.
- Current discussions emphasize AI's ability to update itself recursively, which could dramatically accelerate progress compared to traditional methods.
Challenges Ahead
- Despite optimism about future automation, several challenges remain unsolved; achieving full automation is complex but feasible.
- Evaluation remains a critical roadblock; without effective measures, it's difficult to assess improvements or define success criteria for self-improvement loops.
Philosophical Considerations in Evaluation
- Effective evaluation is essential for progress; teams can make significant strides when concrete evaluations exist but struggle without them.
- Defining appropriate evaluations poses philosophical challenges beyond technical hurdles, complicating the path toward measurable self-improvement.
Infrastructure Needs for Safe Operations
- Building reliable evaluation frameworks requires sophisticated infrastructure capable of safely running complex models within controlled environments.
- Ensuring that models operate correctly while performing tasks typically handled by human engineers presents additional safety concerns.
Understanding Model Efficiency and Verification
Challenges in Measuring Model Performance
- Measuring how far a model can push on a task, and for how long it can sustain execution, is complex; integrating these various operational pieces into an efficient environment remains a significant challenge.
- Diversity in environments for models is identified as a bottleneck that hinders progress in developing more effective AI systems.
The Role of Formal Verification
- A discussion on formal verification highlights its potential as a powerful tool for ensuring continuous improvement loops within AI systems.
- While formal verification is beneficial, it faces limitations when applied to messier domains where clear proofs are difficult to establish, such as medical recommendations.
Feedback Loops and Real-world Applications
- There’s an emphasis on creating tight feedback loops using formal verification methods to address complexities in real-world applications.
- The conversation draws parallels between challenges faced in reinforcement learning and the difficulties encountered when moving away from mathematical frameworks.
Risks of Model Collapse
- Model collapse is recognized as a risk, particularly when models operate within closed loops without external signals or diverse inputs.
- A strong verifier or real reward signal can mitigate risks associated with model collapse by grounding the model's operations in reality.
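The grounding role of a verifier can be sketched concretely: model-generated candidates only enter the training pool if an external ground-truth check accepts them, so errors cannot compound unchecked. The toy "generator" and task below are illustrative.

```python
import random

# Sketch of a verifier-gated self-improvement loop: unverified model
# outputs are rejected before they can feed back into training.

def generate_candidate(n, rng):
    """Toy 'model': propose a (possibly wrong) factor pair for n."""
    a = rng.randint(1, n)
    return (a, n // a)

def verifier(n, pair):
    """External ground-truth check: does the pair multiply to n?"""
    a, b = pair
    return a * b == n

rng = random.Random(0)
accepted = []
for _ in range(200):
    cand = generate_candidate(12, rng)
    if verifier(12, cand):        # the grounding signal
        accepted.append(cand)

# Everything that reaches the pool is correct by construction.
print(all(a * b == 12 for a, b in accepted), len(accepted))
```

Without the `verifier` gate, wrong pairs would accumulate in the pool and any model retrained on it would drift, which is the collapse risk described above.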
Defining Model Collapse
- Model collapse occurs when models become overly specialized due to training on data generated by other models, leading to loss of generalization beyond specific tasks.
- This phenomenon raises concerns about self-reinforcing loops that may limit broader applicability and adaptability of AI systems.
Generalization vs. Specialization Debate
- The discussion explores the trade-off between generalization (broad knowledge across domains) and specialization (deep expertise in specific areas).
- Long-term goals involve developing models capable of knowing when to apply deep versus broad knowledge effectively, akin to an agentic actor making informed decisions based on context.
Pathways Toward AGI
- Achieving Artificial General Intelligence (AGI) requires balancing specialization for rapid learning with generalization for solving diverse problems.
- Specialized models serve as stepping stones toward more generalized capabilities, allowing focused development before expanding into broader applications.
Understanding Specialized Models Today
- The concept of specialized models is discussed; they may be distinct entities or variations of broader generalist models trained through specific methodologies like reinforcement learning.
Trade-offs in Model Training and Future of AI
Balancing Compute Constraints and Model Performance
- The speaker discusses the historical constraints in model training, emphasizing the need to allocate compute resources effectively to enhance model performance in specific dimensions.
- As compute becomes more accessible and affordable, other limitations such as data quality emerge, complicating the trade-offs between model specialization and generalization.
- Achieving a well-rounded model is challenging; focusing on multimodality can lead to regressions in other areas like coding or mathematical reasoning.
- Post-training often results in overfitting, where models are optimized for local optima that may not be universally applicable across all tasks.
- Decisions about model focus depend on organizational context and competition; some companies prioritize specialized models for specific tasks over generalist approaches.
Strategic Decision-Making in AI Development
- Short-term strategies favor specialization, allowing teams to concentrate efforts on maximizing performance within chosen parameters rather than spreading resources too thinly.
- This approach can simplify development processes by reducing complexity for researchers and engineers during initial phases of model creation.
- Organizational positioning influences these decisions; companies must adapt their strategies based on competitive landscapes.
Philosophical Considerations of AI Advancement
- A philosophical question arises regarding the automation of intellectual capabilities: what happens if top minds become automated?
- The speaker reflects on personal experiences with unpredictability in technological advancements, noting how timelines for breakthroughs often defy expectations.
- Concerns about future career guidance arise; predicting valuable fields of study becomes increasingly difficult amidst rapid changes in technology.
- Key skills for future relevance include strategic thinking and adaptability rather than deep expertise in narrow subjects due to evolving demands.
- The ability to synthesize information from various sources is highlighted as crucial for impactful decision-making moving forward.
AI and the Future of Data
The Role of Data in AI Development
- Discussion on whether data is still necessary if AI continues to evolve independently, raising questions about the future value of data versus compute power.
- Emphasizes that data encompasses more than just tokens; it includes any signal that models can utilize for learning, from simple predictions to complex interactions.
- Asserts that the importance of quality data will persist, shifting focus towards creating environments where models can effectively interact with real-world scenarios.
- Highlights the challenge of providing sensory information (e.g., smell) to models, suggesting a need for innovative ways to enhance model grounding in reality.
- Notes a trend towards "sensors as a service," indicating emerging startups focused on providing sensory data for AI applications.
Advances in Training Techniques
- Questions about future gains in AI training methods—whether they will come from post-training or pre-training techniques—and acknowledges an ongoing balance between both approaches.
- States that while pre-training remains foundational, current returns from post-training are significant and can lead to substantial improvements at lower costs.
- Shares personal experience with post-training initiatives like Gemini, noting how small innovations can dramatically enhance model performance.
- Discusses ongoing exciting developments in pre-training at GDM, emphasizing new ideas that could unlock further potential for downstream applications.
- Contrasts current views on pre-training against narratives suggesting its decline; asserts that fresh ideas continue to invigorate this area despite perceived diminishing returns.
The Evolution of Pre-training Strategies
- Reflects on how perceptions of pre-training have changed over time; acknowledges past methods may seem outdated but recognizes new strategies are emerging to revitalize this approach.
- Concludes by expressing optimism about upcoming advancements in Gemini 4 and the potential impact these changes could have on base model capabilities.
Understanding Continual Learning vs. Self-Improvement
Defining Continual Learning
- Continual learning is a hot topic in AI, focusing on how models can stay current with new information rather than just improving their capabilities over time.
- Self-improvement involves a model enhancing its own performance and intelligence, while continual learning emphasizes keeping knowledge up-to-date, akin to a doctor staying informed through ongoing research.
Key Differences Between Concepts
- The main challenge for both self-improvement and continual learning is the issue of "frozen weights," where a model's knowledge becomes outdated as new data emerges.
- Continual learning aims to ensure that models are updated with fresh knowledge without relying on external sources, making them more responsive to real-time changes.
Challenges in Implementation
- A significant hurdle in continual learning is "catastrophic forgetting," where acquiring new information leads to regression in previously learned knowledge.
- Current research indicates that effective continual learning methods have not yet been fully integrated into existing systems, highlighting the need for further exploration and development.
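Catastrophic forgetting, and one common mitigation (replaying stored old examples), can be shown on a toy model. This is a minimal sketch with a 1-D model and two synthetic tasks, not a production continual-learning method.

```python
# Sketch of catastrophic forgetting: training only on new data (task B)
# pulls a model away from what it learned on old data (task A). Mixing
# replayed task-A examples back into the stream limits the damage.

def sgd_step(w, x, y, lr=0.05):
    return w - lr * 2 * (w * x - y) * x

task_a = [(1.0, 2.0)] * 50          # task A: y = 2x
task_b = [(1.0, 5.0)] * 50          # task B: y = 5x

# Learn task A first.
w0 = 0.0
for x, y in task_a:
    w0 = sgd_step(w0, x, y)

# Naive continual learning: train on B only -> forgets A.
w_naive = w0
for x, y in task_b:
    w_naive = sgd_step(w_naive, x, y)

# Replay: interleave stored task-A examples with the new B data.
w_replay = w0
for (xa, ya), (xb, yb) in zip(task_a, task_b):
    w_replay = sgd_step(w_replay, xb, yb)
    w_replay = sgd_step(w_replay, xa, ya)   # replayed old example

err_a = lambda w: abs(w * 1.0 - 2.0)
print(err_a(w_naive), err_a(w_replay))  # replay keeps task-A error lower
```

The replay variant ends at a compromise between the two tasks rather than abandoning task A entirely, which is the basic trade-off continual-learning research tries to improve on.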
Research Landscape
- The field of continual learning is still evolving; researchers are experimenting with various ideas before settling on effective solutions for production use.
- There exists a dual phase in research: exploration (testing different concepts) followed by exploitation (refining successful ideas for practical application).
The Journey into AI and DeepMind
Background of the Speaker
- The speaker completed their PhD at the University of Amsterdam, specializing in machine learning focused on language models and search retrieval.
Early Career Experiences
- Internships at Google Brain during 2016 and 2017 sparked the speaker's passion for AI, particularly working on LSTMs for summarization tasks which were highly relevant at that time.
Transitioning to Advanced Projects
- An opportunity arose to work with a team developing transformer-based architectures, marking an exciting shift towards cutting-edge technology within AI.
Career Journey and Innovations in AI
Initial Hesitations and Life-Changing Decisions
- The speaker initially hesitated to join a team focused on transformer architecture, perceiving it as random and unpromising.
- Joining the team as an intern proved transformative, exposing the speaker to brilliant minds with a shared vision that inspired excitement and innovation.
Development of the Universal Transformer
- The Universal Transformer paper was co-authored in 2018, focusing on loops and recursion in model architecture; it faced initial rejection before acceptance in 2019.
- The core idea revolved around reusing parameters by allowing models to process outputs multiple times for improved performance.
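The parameter-reuse idea can be sketched in a few lines: instead of a stack of distinct layers, one set of parameters is applied repeatedly, feeding the output back in as input. The "block" below is a stand-in toy map, not a real transformer layer; all shapes and names are illustrative.

```python
# Sketch of the Universal Transformer's core recursion: ONE shared set
# of parameters (w, b) is applied for `depth` steps, so compute can grow
# in depth without growing the parameter count.

def shared_block(state, w, b):
    """One application of the shared parameters (toy ReLU map)."""
    return [max(0.0, w * s + b) for s in state]

def universal_transform(state, w, b, depth):
    """Reuse the same (w, b) for `depth` recurrent steps."""
    for _ in range(depth):
        state = shared_block(state, w, b)
    return state

x = [1.0, -2.0, 0.5]
shallow = universal_transform(x, w=0.5, b=0.1, depth=2)
deep = universal_transform(x, w=0.5, b=0.1, depth=8)
print(shallow, deep)  # same parameter count, more computation in `deep`
```

Varying `depth` per input is exactly the knob that adaptive computation (discussed below in the talk) turns automatically.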
Algorithmic Insights and Adaptive Computation
- The project utilized an algorithmic dataset from Łukasz Kaiser, revealing challenges traditional transformers face when handling long inputs.
- A significant breakthrough involved introducing adaptive computation mechanisms, inspired by previous works, which allowed for increased computational resources during testing.
Evolving Perspectives on Computational Resources
- Initially focused on reducing computational costs for simpler problems, the perspective shifted towards leveraging adaptive computation to tackle more complex issues effectively.
- This shift highlighted a new approach: increasing computational resources for challenging tasks rather than merely minimizing them.
Connections to Sparsity Concepts
- The discussion connected this to sparsity concepts such as mixture of experts, which add parameters without incurring additional computational cost per token.
Exploring Vision with Transformers
Introduction to Visual Transformers
- The speaker transitioned into vision research through collaboration with colleagues working in this area, leading to interest in multimodal applications.
Addressing Disparities Between Language and Vision Models
- A key observation was the disparity between large language models (400 billion parameters) compared to significantly smaller vision models at that time.
Exploring the Transition from Convolutional Neural Networks to Transformers
The Shift in Architecture
- The discussion begins with the realization that traditional convolutional neural networks (CNNs) may not scale effectively, prompting exploration into transformer architectures for scalability.
- There is a suggestion that if sufficient time and effort were invested in CNNs, they could also achieve scalability comparable to transformers, highlighting an ongoing debate within the machine learning community.
- Initial ideas included treating each pixel as a token; however, this approach led to high costs and lengthy contexts, necessitating further refinement of strategies.
Simplifying Complexity
- A breakthrough occurred when colleagues proposed a simpler method: dividing images into non-overlapping patches (e.g., 16x16 pixels), which could then be processed by transformers without complex overlapping windows.
- This straightforward "patchify" approach allowed for effective scaling and representation learning, surprising the team who initially sought more complicated solutions.
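The "patchify" step itself is simple enough to sketch directly: split an image into non-overlapping square patches and flatten each patch into a token vector. This pure-Python version is for clarity only; real pipelines implement it as a strided convolution or reshape, and the sizes here are illustrative.

```python
# Minimal sketch of ViT-style patchification: an H x W image becomes a
# sequence of (H/p) * (W/p) tokens, each a flattened p x p patch.

def patchify(image, patch):
    """image: H x W list of lists; returns a list of flattened patches."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "dims must divide evenly"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[top + i][left + j]
                           for i in range(patch)
                           for j in range(patch)])
    return tokens

# An 8x8 "image" split into 2x2 patches -> 16 tokens of length 4.
img = [[r * 8 + c for c in range(8)] for r in range(8)]
tokens = patchify(img, patch=2)
print(len(tokens), len(tokens[0]))
```

Once the image is a token sequence, a standard transformer can consume it exactly as it consumes text, which is what made the approach scale.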
Bridging Modalities
- The successful application of transformer architecture to images marked a significant shift from previous models that separated CNNs for image processing and transformers for text processing.
- This innovation paved the way for multimodal models like Gemini 3, which can handle various data types such as images and videos using a unified architecture.
Advancements in Image AI
- The conversation transitions to Nano Banana's development, emphasizing its viral success and subsequent iterations like Nano Banana Pro and Gemini 3.1.
- Unlike traditional image generation methods that translate text prompts into visual instructions, Gemini operates natively across modalities—processing text and pixels simultaneously.
Insights on Multimodal Learning
- The speaker reflects on their initial unfamiliarity with image generation but expresses excitement about working with talented individuals in the field.
- A key interest lies in positive transfer across modalities—whether training on one type of data (like images) can enhance performance on another (like text), suggesting potential synergies between different forms of media.
Understanding Reporting Biases and Multimodal Learning
The Concept of Reporting Biases
- The speaker illustrates reporting biases using the example of visiting a friend's home with a uniquely shaped sofa, emphasizing that unusual experiences are more likely to be shared in conversation than mundane ones.
- This bias highlights how language tends to overlook average or typical occurrences, focusing instead on extremes or anomalies in experiences.
- The inefficiency of learning through language is discussed; visual inputs provide more direct knowledge acquisition compared to textual descriptions, particularly for complex concepts like gravity.
Multimodality in Learning Models
- The discussion transitions to the importance of incorporating multimodality into models, suggesting that combining different types of data (text and images) can enhance learning efficiency.
- The speaker mentions Gemini's development as a multimodal model from its inception, noting challenges faced during image generation improvements without compromising other capabilities.
Challenges and Insights in Model Training
- Acknowledgment is made regarding the difficulty in observing positive transfer effects when training models across modalities; achieving proficiency across various tasks remains challenging yet impressive.
- Collaboration with experts reveals nuances in visual quality assessment, underscoring the subjective nature of evaluating image quality and its impact on product success.
Advancements Beyond Traditional Image Generation
- The potential for evolving from simple text-to-image translation to creating models capable of "thinking" about images is introduced. This includes generating interleaved text and images for richer storytelling.
- Incremental generation is proposed as a solution for improving detail capture in generated images by allowing models to build complexity gradually rather than aiming for perfection on the first attempt.
Practical Applications and Future Directions
- Examples illustrate how incremental generation could enhance storytelling applications, such as children's books where text prompts guide image creation step-by-step.
- Emphasis is placed on planning within model generation processes, advocating for starting with larger elements before refining details to achieve better overall results.
Image Generation Efficiency and Insights
The Role of Object Size in Image Generation
- Discusses the strategy of using varying object sizes (small, medium, large) to optimize image generation, ensuring that the model's capabilities are effectively utilized.
- Introduces the concept of "interleaved generation," which offers a new perspective on image creation beyond simple text-to-image translation.
Speed and Efficiency in Image Creation
- Highlights the rapid image generation capabilities of models like Nano Banana, emphasizing their efficiency in producing high-quality images quickly.
- Mentions involvement in developing Nano Banana Pro and its latest version, focusing on improvements that enhance speed and efficiency through model size adjustments.
Model Optimization Techniques
- Explains efforts made to refine distillation recipes for knowledge transfer within models, making them lighter while maintaining performance.
- Acknowledges contributions from serving engineers who have significantly improved model serving speeds through innovative approaches.
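The distillation objective behind making models lighter can be sketched briefly: the student is trained to match the teacher's softened output distribution rather than hard labels (Hinton-style distillation, with a temperature-scaled softmax). The logits and temperature below are illustrative, not the actual Nano Banana recipe.

```python
import math

# Sketch of a knowledge-distillation loss: cross-entropy of the student
# against the teacher's temperature-softened output distribution.

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of student predictions vs. teacher soft targets."""
    p = softmax(teacher_logits, temperature)   # teacher's soft labels
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.2]
good_student = [3.8, 1.1, 0.3]    # closely matches the teacher
bad_student = [0.1, 3.0, 2.0]     # disagrees with the teacher
print(distill_loss(teacher, good_student),
      distill_loss(teacher, bad_student))
```

Minimizing this loss pushes a smaller student toward the teacher's full output distribution, which is how capability is transferred into a lighter, faster model.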
Challenges in AI Understanding
- Asks about misconceptions within the AI field; emphasizes underestimating challenges related to jagged intelligence and how it complicates system understanding.
- Expresses concern over how current models struggle with basic tasks despite advanced capabilities, indicating deeper unresolved issues within AI systems.
Underrated Concepts in AI Research
- Identifies continual learning as an underrated area in AI research; suggests that current models are too static post-training and need more dynamic adaptation mechanisms.
- Argues for a shift away from treating foundation models as frozen entities towards more active development strategies that embrace continual learning.
Understanding the Role of RAG in AI
The Nature of RAG and In-Context Learning
- The speaker expresses uncertainty about whether RAG (Retrieval-Augmented Generation) will completely disappear, emphasizing its role in providing fresh information to models.
- Distinction is made between in-context learning (information within the model's context) and continuous learning, highlighting that both serve different purposes for integrating new information.
Overconfidence in Technical Solutions
- A critique is presented regarding people's overconfidence in technical advancements alone, suggesting that a smarter model does not guarantee overall progress.
- The speaker argues that significant issues such as governance, regulation, social trust, and equitable access must be addressed alongside technical improvements.
Challenges of Technological Progress
- It is noted that the pace of technological advancement often outstrips society's ability to adapt and manage these changes effectively.
- The need for a balanced approach to technology development is emphasized; both technical capabilities and societal mechanisms must evolve together.
Future Directions in AI Development
Starting from Scratch: Insights for Newcomers
- The speaker reflects on the challenges of starting anew in the field but identifies two areas worth exploring further.
Exciting Areas for Research
- Full automation of long-horizon tasks is highlighted as an exciting area; current agents are impressive but face reliability issues over extended operations.
Reliability Concerns with Long-Horizon Tasks
- A mathematical example illustrates how even high success rates per step can lead to low overall task completion probabilities due to compounding errors.
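The compounding arithmetic is worth seeing in numbers: because per-step success probabilities multiply, even a 99%-reliable step yields under a 37% chance of completing a 100-step task.

```python
# Compounding errors over a long-horizon task: the probability that all
# n independent steps succeed is per_step ** n.

def task_success(per_step, n_steps):
    """Probability of completing an n-step task without any failure."""
    return per_step ** n_steps

for p in (0.99, 0.999):
    print(p, round(task_success(p, 100), 3))
# 0.99 -> ~0.366 ; 0.999 -> ~0.905
```

Note the leverage: pushing per-step reliability from 99% to 99.9% takes 100-step task completion from roughly one in three to over 90%.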
Building Trust through Reliability
- Emphasizes that users experience failures more acutely than successes; thus, ensuring reliability is crucial for building social trust in AI systems.
Philosophical Considerations in AI Development
Grounding AI Systems
- Discusses the importance of grounding AI systems in reality rather than relying solely on statistical patterns from data inputs.
Redefining Intelligence
- Suggestion made to reconsider definitions of intelligence as we develop increasingly complex systems without clear parameters or understanding.
Understanding Intelligence in AI Models
The Complexity of Defining Intelligence
- The discussion highlights the challenges in defining "intelligence," noting that it is a vague and complex concept, making it difficult to measure progress in AI development.
- There is an emphasis on the need for a systematic approach to define intelligence, suggesting that current benchmarks and scores are useful but insufficient for long-term goals.
- The speaker advocates for identifying clear targets and goals in AI research to facilitate focused progress towards achieving true intelligence.
- Acknowledgment of the engaging conversation, indicating a positive exchange of ideas about the future of intelligent models.
- The host expresses gratitude for the guest's participation, reinforcing the collaborative nature of discussions around AI advancements.