Google and Peking University unveil PaperBanana, a multi-agent AI system that automates publication-ready diagrams and accurate plots.
Creating publication-quality diagrams and statistical visualizations has long been one of the most time-consuming aspects of academic research. While AI systems have advanced in conducting literature reviews and generating code, they often fall short when it comes to visually communicating complex methodologies and findings.
To address this gap, researchers from Google and Peking University have introduced PaperBanana — an innovative agentic framework designed to automatically generate high-quality academic diagrams and statistically precise plots.
A Multi-Agent Architecture Built for Precision

Unlike traditional single-prompt systems, PaperBanana operates through a coordinated team of five specialized AI agents. Together, they convert raw technical text into polished, publication-ready visuals.
Phase 1: Linear Planning
The first phase establishes structure, style, and visual intent.
- Retriever Agent
Identifies the ten most relevant reference examples from a curated database to guide layout and stylistic consistency.
- Planner Agent
Converts dense technical descriptions into a structured textual blueprint of the intended figure.
- Stylist Agent
Ensures the visual output aligns with established academic aesthetics, particularly the widely recognized “NeurIPS look,” using curated color palettes and layout standards.
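The linear flow of Phase 1 can be sketched in a few lines of Python. This is a minimal illustration, not PaperBanana's implementation: the token-overlap score, the function names `retrieve` and `phase_one`, and the `planner` and `stylist` callables are all assumptions made for the example. Only the top-ten retrieval and the retrieve, plan, style ordering come from the description above.

```python
import heapq
from typing import Any, Callable, List


def overlap_score(query: str, reference: str) -> float:
    """Jaccard overlap of lowercase tokens (a stand-in for a real retriever)."""
    q, r = set(query.lower().split()), set(reference.lower().split())
    return len(q & r) / len(q | r) if (q | r) else 0.0


def retrieve(query: str, references: List[str], k: int = 10) -> List[str]:
    """Return the k references most similar to the query, best first."""
    return heapq.nlargest(k, references, key=lambda ref: overlap_score(query, ref))


def phase_one(
    description: str,
    references: List[str],
    planner: Callable[[str, List[str]], Any],  # hypothetical Planner Agent
    stylist: Callable[[Any], Any],             # hypothetical Stylist Agent
    k: int = 10,
) -> Any:
    """Run the three planning steps once, in order, with no feedback loop."""
    examples = retrieve(description, references, k)
    blueprint = planner(description, examples)
    return stylist(blueprint)
```

Passing the planner and stylist in as callables keeps the sketch independent of any particular model API; in a real system each would wrap a model call.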
Phase 2: Iterative Refinement
Once the plan is defined, PaperBanana enters a refinement loop.
- Visualizer Agent
Generates the visual output. For methodology diagrams, it leverages advanced image models such as Nano-Banana-Pro. For statistical plots, it writes executable Python code using Matplotlib instead of relying solely on image generation.
- Critic Agent
Reviews the generated visual against the source content to detect factual inconsistencies, visual distortions, or design flaws. The system performs up to three refinement cycles to improve clarity and accuracy.
This structured workflow ensures both aesthetic quality and technical correctness.
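The refinement loop can be pictured as a generate, critique, regenerate cycle with a hard cap. The sketch below is an assumption about how such a loop might be wired, not the authors' code: `refine`, `stub_visualize`, and `stub_critique` are hypothetical names, and the way feedback is appended to the plan is invented for illustration. The cap of three cycles is the only detail taken from the article.

```python
from typing import Callable, List, Tuple

MAX_CYCLES = 3  # the article reports "up to three refinement cycles"


def refine(
    source_text: str,
    plan: str,
    visualize: Callable[[str], str],            # hypothetical Visualizer Agent
    critique: Callable[[str, str], List[str]],  # hypothetical Critic Agent
    max_cycles: int = MAX_CYCLES,
) -> Tuple[str, int]:
    """Regenerate the figure until the critic is satisfied or the cap is hit."""
    figure = visualize(plan)
    cycles_used = 0
    while cycles_used < max_cycles:
        issues = critique(source_text, figure)
        if not issues:
            break  # the critic found nothing to fix
        # Feed the critic's findings back into the plan (assumed mechanism).
        plan = plan + "\nRevise: " + "; ".join(issues)
        figure = visualize(plan)
        cycles_used += 1
    return figure, cycles_used


# Stub agents so the loop can be run without any model behind it.
def stub_visualize(plan: str) -> str:
    return f"<figure after {plan.count('Revise:')} revisions>"


def stub_critique(source_text: str, figure: str) -> List[str]:
    return [] if "2 revisions" in figure else ["arrow label missing"]


if __name__ == "__main__":
    print(refine("method text", "initial plan", stub_visualize, stub_critique))
```

Returning the number of cycles used alongside the figure makes it easy to see whether the loop stopped because the critic was satisfied or because it ran out of attempts.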
Outperforming Baselines on a NeurIPS 2025 Benchmark
To rigorously evaluate the framework, researchers introduced PaperBananaBench, a dataset consisting of 292 real-world test cases drawn from NeurIPS 2025 publications. Using a Vision-Language Model (VLM) as an automated judge, PaperBanana was compared against leading baseline systems.
Performance Improvements Over Baseline:
- Overall Score: +17.0%
- Conciseness: +37.2%
- Readability: +12.9%
- Aesthetics: +6.6%
- Faithfulness: +2.8%
The framework demonstrated exceptional strength in “Agent & Reasoning” diagrams, achieving a 69.9% overall score. It also integrates automated aesthetic guidance that favors modern “Soft Tech Pastels” rather than harsh primary color schemes.
Statistical Plots: Image Generation vs. Code Execution
One of the most notable innovations in PaperBanana is its hybrid approach to statistical visualization.
Traditional AI image generators often produce visually attractive plots but suffer from numerical inaccuracies or repeated graphical elements — a phenomenon known as “numerical hallucination.”
PaperBanana addresses this by switching to executable code when precision is required.
Image-Based Plot Generation
- Strong visual appeal
- Susceptible to numerical distortions
- May struggle with complex datasets
Code-Based Plot Generation (Matplotlib)
- 100% data fidelity
- Accurate representation of raw inputs
- Handles dense and multi-series datasets reliably
- Produces a standard, professional academic look
By combining aesthetic flexibility with coding precision, the framework ensures both visual appeal and scientific integrity.
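To make the code-based route concrete, here is a small sketch of a generator that writes a Matplotlib script from raw data. It is an illustration under stated assumptions rather than PaperBanana's actual output: the function names `build_plot_script` and `embedded_series`, the output file name, and the sample numbers are all made up. The point it shows is that the data is embedded in the script verbatim, so the plotted values can be checked against the inputs.

```python
import ast
from typing import Dict, List, Tuple

Series = Dict[str, List[Tuple[float, float]]]


def build_plot_script(title: str, x_label: str, y_label: str, series: Series) -> str:
    """Return Matplotlib source code that plots `series` exactly as given."""
    lines = [
        "import matplotlib",
        "matplotlib.use('Agg')  # headless backend, no display needed",
        "import matplotlib.pyplot as plt",
        "",
        f"series = {series!r}",  # raw numbers are written out verbatim
        "",
        "fig, ax = plt.subplots(figsize=(4, 3))",
        "for label, points in series.items():",
        "    xs = [x for x, _ in points]",
        "    ys = [y for _, y in points]",
        "    ax.plot(xs, ys, marker='o', label=label)",
        f"ax.set_title({title!r})",
        f"ax.set_xlabel({x_label!r})",
        f"ax.set_ylabel({y_label!r})",
        "ax.legend()",
        "fig.tight_layout()",
        "fig.savefig('plot.pdf')  # hypothetical output path",
    ]
    return "\n".join(lines) + "\n"


def embedded_series(script: str) -> Series:
    """Parse a generated script and recover the data it will plot."""
    for node in ast.parse(script).body:
        if isinstance(node, ast.Assign):
            target = node.targets[0]
            if isinstance(target, ast.Name) and target.id == "series":
                return ast.literal_eval(node.value)
    raise ValueError("no 'series' assignment found in script")


if __name__ == "__main__":
    # Made-up values, used only to exercise the generator.
    example = {"baseline": [(1, 0.52), (2, 0.61)], "ours": [(1, 0.58), (2, 0.7)]}
    print(build_plot_script("Accuracy", "epoch", "accuracy", example))
```

Because the numbers live in the script as literals, a check like `embedded_series(script) == example` verifies fidelity without rendering anything. An image model offers no comparable hook.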
Domain-Specific Visual Intelligence
PaperBanana recognizes that visual design expectations differ across research domains. The framework adapts stylistic elements accordingly:
- Agent & Reasoning
Illustrative, narrative-driven visuals using 2D vector characters, UI-style elements, chat bubbles, and friendly avatars.
- Computer Vision & 3D
Spatially dense and geometric designs featuring camera frustums, ray lines, point clouds, and RGB axis encoding.
- Generative & Learning Systems
Modular and flow-oriented layouts using tensor cuboids, matrix grids, and pastel-filled logic zones.
- Theory & Optimization
Minimalist, textbook-inspired graphics with graph nodes, manifolds, grayscale palettes, and subtle highlight accents.
This contextual awareness allows PaperBanana to align outputs with community norms and conference standards.
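One simple way to picture this domain awareness is as a lookup from research area to style hints. The mapping below only restates the four descriptions above; the dictionary layout, the field names `look` and `elements`, and the `style_for` helper are assumptions for illustration and say nothing about how PaperBanana stores its style guidance.

```python
from typing import Dict, List, Union

StyleHints = Dict[str, Union[str, List[str]]]

DOMAIN_STYLES: Dict[str, StyleHints] = {
    "Agent & Reasoning": {
        "look": "illustrative, narrative-driven",
        "elements": ["2D vector characters", "UI-style elements",
                     "chat bubbles", "friendly avatars"],
    },
    "Computer Vision & 3D": {
        "look": "spatially dense, geometric",
        "elements": ["camera frustums", "ray lines",
                     "point clouds", "RGB axis encoding"],
    },
    "Generative & Learning Systems": {
        "look": "modular, flow-oriented",
        "elements": ["tensor cuboids", "matrix grids",
                     "pastel-filled logic zones"],
    },
    "Theory & Optimization": {
        "look": "minimalist, textbook-inspired",
        "elements": ["graph nodes", "manifolds",
                     "grayscale palettes", "subtle highlight accents"],
    },
}


def style_for(domain: str) -> StyleHints:
    """Return the style hints for a domain, failing loudly on unknown names."""
    try:
        return DOMAIN_STYLES[domain]
    except KeyError:
        raise ValueError(f"unknown domain: {domain!r}") from None


if __name__ == "__main__":
    print(style_for("Theory & Optimization")["elements"])
```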
Why PaperBanana Matters
PaperBanana represents a significant step forward in AI-assisted research workflows. Its impact lies in three core innovations:
- Collaborative Multi-Agent Design
Five specialized agents work together to translate technical content into refined academic visuals.
- Structured Planning and Refinement
A two-phase workflow targets stylistic consistency, factual accuracy, and iterative improvement.
- Precision-Driven Statistical Rendering
By switching from image generation to executable Python code for statistical plots, the framework avoids the numerical hallucinations of image-generated plots and enhances trustworthiness.
As AI continues to transform scientific publishing, PaperBanana demonstrates how agent-based systems can move beyond text generation — helping researchers communicate discoveries with clarity, accuracy, and visual sophistication.