Real-Time AI Video is Finally Here (And It’s Insane!)

PixVerse's Real-Time Narrative-Controlled World Model

Introduction to the New Model

  • PixVerse has released a real-time narrative-controlled world model, which allows users to interact with video generation in near real-time. The experience is described as somewhat janky but fun, hinting at significant advances to come.

Overview of Current Developments

  • Black Forest Labs has also introduced Flux 2 Klein, alongside updates from Runway and Google Veo and an open-source AI super-zoom tool. The focus remains on the evolving landscape of world models.

Exploration of World Models

  • Various approaches to world models are being explored, including Google's unreleased Genie 3 and Tencent's recent model. These models generate video frame by frame while maintaining some consistency within the virtual environment.

Features of the PixVerse R1 Model

  • The PixVerse R1 model falls under real-time video generation, continuously adapting to user prompts as the video streams. This capability hints at a future where viewers can alter narratives as they watch.

Technical Specifications

  • The model generates videos at 1080p resolution using an omni-native multimodal foundation that integrates text, image, video, and audio inputs. Currently, it only accepts text prompts but supports infinite streaming through an autoregressive mechanism.
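The autoregressive, frame-by-frame design is what makes mid-stream prompting possible: each new frame is conditioned on what came before, so a prompt injected partway through redirects the stream without restarting it. A minimal Python sketch of that control flow (all names are illustrative; PixVerse has not published an API):

```python
from collections import deque

def stream_frames(initial_context, prompt_queue, max_frames=None):
    """Toy autoregressive streaming loop: each new 'frame' is
    conditioned on a sliding window of recent frames plus any user
    prompt injected mid-stream. Illustrative only, not PixVerse's API."""
    context = deque(initial_context, maxlen=8)  # sliding window of recent frames
    frame_id = 0
    while max_frames is None or frame_id < max_frames:
        prompt = prompt_queue.popleft() if prompt_queue else None
        # Stand-in for the model call: a real system would run a
        # generative step conditioned on `context` and `prompt`.
        frame = {"id": frame_id, "prompt": prompt, "based_on": list(context)}
        context.append(frame_id)
        frame_id += 1
        yield frame

# Inject an "avalanche" prompt while the stream is already running.
prompts = deque([None, "avalanche"])
frames = [f for _, f in zip(range(3), stream_frames(["seed"], prompts))]
```

Because the loop never needs the full video in advance, the same structure supports the "infinite streaming" the model advertises: generation continues as long as the consumer keeps pulling frames.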

Demonstration of Real-Time Video Generation

Initial Demonstration: Snow Peak Vlog

  • A demonstration begins with a character walking along a mountain path while music plays in the background. Users can prompt events like an avalanche occurring during this scene.

Interaction with the Model

  • When prompted for an avalanche effect, the generated result resembles a giant snowball rather than a realistic avalanche. This highlights both limitations and creative opportunities within the current technology.

Cinematic Realism Techniques

  • To maintain cinematic realism, users can prompt actions such as dancing; however, characters may revert to their original animations after performing these actions. This showcases how scripted interactions can be manipulated within the model.

Modes of Interaction: POV, Ambient & Dramatic

Different Modes Explained

  • The model features three interaction modes: POV (Point of View), Ambient, and Dramatic. Each mode offers unique experiences based on user prompts and scenarios presented in real-time video generation.

Examples from Each Mode

  • In ambient mode, users see a character sleeping on a couch until prompted for action (e.g., a cat jumping onto him). In dramatic mode, unexpected scenarios unfold rapidly—like characters interacting unexpectedly or engaging in unrelated activities—demonstrating unpredictability in narrative control.

A Hilarious Journey Through AI-Generated Scenarios

The Hot Dog Adventure

  • The narrative begins with characters consuming a hot dog, leading to the purchase of the world's largest hot dog, which adds a comedic element.
  • A tornado is introduced as a dramatic backdrop, but the model seems to forget about it while characters transition to a car scene.
  • Unexpected elements like masked hitmen and Godzilla appear, showcasing the unpredictable nature of AI-generated content.

Absurdity in AI Generation

  • A humorous scene unfolds where a man drinks coffee next to a fireplace while an unfazed bear appears; chaos ensues as the cabin catches fire.
  • Despite prompts for action, the character refuses to enter a bear cave, opting instead for exploration that leads to further absurd scenarios involving knights and castles.
  • The unpredictability of AI models is highlighted; they often do not follow directions accurately yet produce entertaining results reminiscent of early AI videos.

Evolution of AI Video Content

  • The speaker reflects on past notable AI videos like "Avalanche" by Zash Manson, emphasizing their humor and uniqueness.
  • Questions arise about future advancements in AI video generation (R3 or R4), prompting curiosity about ongoing projects in this field.

New Developments: Flux 2 Klein

  • Black Forest Labs introduces Flux 2 Klein, a smaller version of their model available in four configurations with varying VRAM requirements.
  • The model boasts impressive speed—30% faster than competitors—while remaining open-source for user accessibility.

Performance Insights

  • Initial tests reveal decent image generation capabilities despite some quirks like odd eye appearances in generated images.
  • While capable of producing interesting visuals, text generation remains problematic; users may need to make adjustments for client work.
  • Artistic styles emerge from fantastical prompts; however, results can lean towards painterly aesthetics rather than realistic depictions.

Image Editing and AI Innovations

Exploring New Image Transformations

  • The speaker discusses a delicious-looking toast image while mentioning their current low-carb diet, highlighting the appeal of the image.
  • A comparison is made between an animated character transformed into a real person by Brent Lynch and the Flux 2 Max, noting that the transformation looks better than expected despite some flaws in facial features.

Hardware Accessibility and Software Features

  • Klein 4B is noted for its affordability on consumer hardware, making it more accessible than the larger Flux 2 dev model.
  • Runway's new feature called "story panels" allows users to generate cinematic stacks from images by describing a story, enhancing creative opportunities.

Creative Output Techniques

  • The technique used in Runway's story panels resembles methods showcased in previous videos, utilizing Nano Banana Pro for output generation.
  • The vertical three-up approach for outputs is praised as smart; however, manual cropping remains necessary, indicating room for improvement in automation.
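Until that cropping is automated, splitting a vertical three-up into individual panels is simple geometry. A minimal sketch (the equal-height panel assumption is mine, not a documented Runway spec):

```python
def three_up_crop_boxes(width, height):
    """Compute (left, top, right, bottom) crop boxes for a vertical
    three-panel stack. Purely illustrative geometry; assumes the
    three panels are equal in height."""
    panel_h = height // 3
    return [(0, i * panel_h, width, (i + 1) * panel_h) for i in range(3)]

boxes = three_up_crop_boxes(1024, 1536)
# Each box tuple can be passed to e.g. Pillow's Image.crop to split the stack.
```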

Video Generation Insights

  • Heather Cooper demonstrates generating three scenes from one output using Runway’s new feature, showcasing efficiency in video generation.
  • Anticipation builds around version 4.5 of image-to-video technology; the speaker promises a comprehensive review upon release.

Updates on Google VO3.1

  • Google has improved VO3.1 with enhancements to ingredients-to-video features including native vertical outputs and upscaling options for 1080p and 4K resolutions.
  • Vertical video capabilities are highlighted as significant for audiences creating content like shorts or reels.

Open-Source Project: Wonder Zoom

  • An open-source project named Wonder Zoom allows users to zoom into images effectively while adding new elements to enhance visual storytelling.
  • The speaker reflects on how this tool reminds them of an older AI-generated piece by Chikai Ohama that utilized similar techniques but was initially created with different software.
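Conceptually, an infinite-zoom tool can be thought of as a stack of nested viewports that a generative model refills with new detail at each step. A toy sketch of that geometry (my own simplification; Wonder Zoom's actual pipeline may differ):

```python
def zoom_viewports(width, height, factor=2.0, steps=4):
    """Toy sketch of the nested viewports behind an 'infinite zoom':
    each step keeps a centred window 1/factor the size of the last,
    which a generative model would then re-render at full resolution.
    Illustrative only; not Wonder Zoom's actual implementation."""
    views, w, h = [], float(width), float(height)
    for _ in range(steps):
        w, h = w / factor, h / factor
        left = (width - w) / 2
        top = (height - h) / 2
        views.append((left, top, left + w, top + h))
    return views

views = zoom_viewports(1024, 1024)
```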


Video description

Real-time video generation has arrived. Today, we are diving deep into PixVerse's new R1 model—a narrative-controlled "world model" that lets you influence the video as it streams. From peaceful hikes to sudden avalanches and chaotic tornadoes, we push this real-time model to its absolute limits (and yes, it gets a little janky, but it is incredibly fun). We also break down the release of Black Forest Labs' Flux 2 Klein, a smaller, faster version of their massive image model designed for consumer hardware. Plus, we look at Runway's new "Story Panels" feature, updates to Google Veo, and a fascinating open-source project called Wonder Zoom. If you are interested in the future of generative media, world models, and open-source AI, you won't want to miss this one.

My Newsletter: https://theoreticallymedia.beehiiv.com/

🔗 Links Mentioned:
PixVerse R1 Beta: https://x.com/PixVerse_
Black Forest Labs (Flux 2 Klein): https://bfl.ai/
Runway Story Panels: https://runwayml.com/
Google Veo Updates: https://blog.google/innovation-and-ai/technology/ai/veo-3-1-ingredients-to-video/?utm_source=futuretools.io&utm_medium=newspage
Wonder Zoom Project: https://wonderzoom.github.io/

Timestamps:
00:00 - Intro: Real-Time World Models Are Here
00:36 - The Current State of World Models
01:20 - Explaining PixVerse R1: Interactive Video Generation
02:09 - R1 Technical Specs: 1080p & Infinite Streaming
02:44 - Testing the Snow Peak Vlog Preset
03:35 - Triggering an Avalanche (Or Giant Snowball?)
04:17 - Ambient Mode: Waking Up the Sleeping Guy
05:00 - Dramatic Mode: Business Suits & Hot Dogs
05:50 - Chaos Unfolds: Tornadoes, Hitmen & Godzilla
06:32 - Cozy Cabin Test: Fireplaces & Bears
07:14 - Janky and Weird is Just for Now
08:07 - Is This the Future of AI Entertainment?
08:36 - Flux 2 Klein Released: Smaller & Faster
08:53 - It's Fast!
09:23 - Why Flux Matters
09:33 - Testing Image Gen
10:12 - Handling Text & Fantasy Art in Flux 2
10:56 - The Toast Test & Realism Checks
11:07 - Image Editing Test
11:30 - Comparison: Flux 2 Dev vs. Klein
12:05 - Runway Update: Story Panels Feature
12:26 - Testing the Detective Prompt (Cinematic Stacks)
12:43 - Comparing Stacks to Previous Method
13:06 - Cropping
13:25 - Creative Examples: 3-Up Video Generation
13:36 - Other Runway Updates
13:55 - Google Veo 3.1: Vertical Video Update
14:18 - "Ingredients" for Social Media Shorts
14:37 - Character Consistency
15:08 - Is the Flow Platform Worth the Price?
15:37 - Wonder Zoom: The Infinite Zoom Tool
16:00 - Parallax Effects & Legacy AI Comparisons
16:27 - Code Availability & Outro