PaperBanana - Is this the BEST Agentic Framework for Generating COMPLEX Images?

Uploaded: 2026-02-08T00:30:00.000Z
Duration: 20 min 50 s

Introduction to Paper Banana

Overview of the New Tool

A new paper titled "Paper Banana," a collaboration between Google and Peking University, introduces a framework that generates publication-quality diagrams from plain-text descriptions.

Unlike single-step tools such as Nano Banana Pro, which generate an image in one pass, Paper Banana uses a multi-agent system to improve output quality.

The Multi-Agent Approach

Paper Banana utilizes five AI agents: one for retrieving reference examples, another for planning layout, a third for applying design guidelines, a fourth for generating images, and the last for critiquing results.

This process creates a "generate, critique, refine" loop that improves the final output significantly compared to single-step models.
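The five roles and the "generate, critique, refine" loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's or the community repository's code: every name here (`DiagramState`, `retrieve`, `plan`, `style`, `visualize`, `critique`, `run_pipeline`) is hypothetical, and each agent is a stub that manipulates strings instead of calling a model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiagramState:
    """Shared state handed from one agent to the next (hypothetical structure)."""
    description: str
    references: List[str] = field(default_factory=list)
    plan: str = ""
    style_notes: str = ""
    image: str = ""  # stand-in for real image data
    feedback: List[str] = field(default_factory=list)


def retrieve(state: DiagramState) -> DiagramState:
    # Retriever: look up reference examples (stubbed with a single string).
    state.references = [f"reference similar to: {state.description}"]
    return state


def plan(state: DiagramState) -> DiagramState:
    # Planner: decide on a layout using the retrieved references.
    state.plan = f"layout for '{state.description}' from {len(state.references)} reference(s)"
    return state


def style(state: DiagramState) -> DiagramState:
    # Stylist: attach design guidelines.
    state.style_notes = "consistent colors, labeled arrows"
    return state


def visualize(state: DiagramState, round_no: int) -> DiagramState:
    # Visualizer: produce a draft, taking the latest critique into account.
    state.image = f"draft {round_no}: {state.plan} [{state.style_notes}]"
    if state.feedback:
        state.image += f" (addressing: {state.feedback[-1]})"
    return state


def critique(state: DiagramState, round_no: int, max_rounds: int) -> Optional[str]:
    # Critic: return a revision note, or None when satisfied.
    # Stub behavior: request a revision on every round except the last.
    if round_no < max_rounds:
        return f"round {round_no}: tighten labels"
    return None


def run_pipeline(description: str, max_rounds: int = 3) -> DiagramState:
    state = style(plan(retrieve(DiagramState(description))))
    for round_no in range(1, max_rounds + 1):
        state = visualize(state, round_no)
        note = critique(state, round_no, max_rounds)
        if note is None:
            break  # critic accepted the draft
        state.feedback.append(note)
    return state


result = run_pipeline("transformer encoder")
print(result.image)
```

The point of the sketch is the control flow: retrieval, planning, and styling run once, while generation and critique repeat until the critic stops asking for changes or the round limit is reached.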

Demonstration of Output Quality

Visual Output Analysis

The speaker showcases an example generated by Paper Banana, highlighting its detailed architecture representation through color coding and clear directional arrows.

The diagram effectively explains complex connections within the architecture while maintaining clarity in annotations and data flow.

Disclaimer on Implementation

The demonstration used an unofficial community-built open-source implementation rather than the official code from Google and Peking University.

The official code is expected to be released soon; viewers are encouraged to express interest in further videos once it becomes available.

Setting Up Paper Banana

Cloning and Configuration

Users can clone the community-based GitHub repository of Paper Banana and set up their API keys to start using it.
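Before running anything, it can help to fail early if the API key is missing. The snippet below is a small sketch of that check; the environment variable name `GOOGLE_API_KEY` and the helper `load_api_key` are assumptions for illustration, so check the repository's own README for the name it actually reads.

```python
import os
from typing import Mapping, Optional

# Assumption: illustrative variable name, not confirmed by the repository.
API_KEY_VAR = "GOOGLE_API_KEY"


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 var_name: str = API_KEY_VAR) -> str:
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before running the pipeline.")
    return key


# Usage with an explicit mapping, so the example runs without a real key.
print(load_api_key({"GOOGLE_API_KEY": "  example-key  "}))
```

Passing the environment as a parameter keeps the check easy to test without touching real credentials.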

Key components include several agents (visualizer, stylist, retriever, planner, and critic, plus a shared base agent) that work together on top of the Nano Banana Pro model.

Running Commands

Users are guided on how to run commands with sample inputs to create diagrams using Paper Banana's transformer model.

After executing initial commands with sample input, users can provide detailed prompts to see how outputs vary based on input complexity.

Iterative Generation Process

Diagram Creation Workflow

The tool first retrieves relevant references, then produces a detailed plan, and finally refines the diagram iteratively over three generation rounds.

Each iteration produces a new version of the diagram, and the final output reflects corrections the system made based on the earlier iterations.

Conclusion on Effectiveness

This iterative approach demonstrates how multiple feedback loops enhance diagram generation quality beyond what traditional image generation models can achieve.

Creating a Sophisticated Agent Diagram with Google's ADK

Overview of the Task

The speaker starts a new test by prompting an AI to create a text file named ADK agent.ext that describes an agent system built on Google's ADK.

The task involves generating a sophisticated diagram for Google’s Agent Development Kit (ADK), indicating the speaker's familiarity with the framework.

Understanding Generated Outputs

After providing input, the AI generates outputs in multiple iterations, showcasing its ability to create complex structures based on user prompts.

The final output includes several agents: an orchestrator agent, a research agent, a data analysis agent that uses BigQuery, and a report generator.

Key Features of the Generated Diagram

The generated diagram illustrates a multi-agent architecture using Google ADK, highlighting components like persistent memory storage and structured business reports.

It emphasizes real-time information access and mentions tools such as Pandas for better data structuring.

Significance of Results

The speaker notes that the AI's output could be suitable for inclusion in academic papers, demonstrating high-quality results from detailed prompts.

Performance Metrics from Paper Banana

On the reported performance metrics, vanilla Nano Banana Pro scored 43.2 overall while Paper Banana achieved 60.2. The evaluation covers four dimensions: faithfulness, conciseness, readability, and aesthetics.

Notably, Paper Banana outperformed human-drawn diagrams in conciseness, readability, and aesthetics; human-drawn diagrams still scored higher on faithfulness, because humans understand the intended meaning more precisely.
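To put the two overall scores quoted above (43.2 and 60.2) side by side, the short calculation below works out the absolute and relative gap. Only those two numbers come from the source; the helper name `compare_scores` is hypothetical, and no per-dimension scores are assumed.

```python
from typing import Tuple


def compare_scores(baseline: float, candidate: float) -> Tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = candidate - baseline
    relative = absolute / baseline * 100
    return round(absolute, 1), round(relative, 1)


# Overall scores as quoted in the summary.
points, percent = compare_scores(baseline=43.2, candidate=60.2)
print(f"+{points} points ({percent}% relative improvement)")
```

On these figures, the gap is 17.0 points, which is roughly a 39.4% improvement relative to the single-step baseline.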

Implications for Future Workflows

This development is an example of advanced "agentic AI," in which specialized agents collaborate creatively, an approach expected to shape future AI systems beyond 2026.

The potential applications extend beyond researchers to professionals like solution architects and product managers who need effective visual representations of complex ideas.

Upcoming Developments

The official code release is anticipated soon following initial announcements; viewers are encouraged to share use cases for testing once available.