Google DeepMind has unveiled two groundbreaking AI models, Gemini Robotics and Gemini Robotics-ER, marking a significant advancement in the field of robotics. These models are designed to enhance robotic intelligence and physical interaction, setting the stage for a future where robots can perform complex, real-world tasks with human-like dexterity.
A Leap Forward in Robotics
Building on the foundation of Gemini 2.0, Gemini Robotics-ER achieves a success rate two to three times higher than Gemini 2.0 in end-to-end settings. Google’s ultimate goal is to create AI that can operate seamlessly across different robotic platforms, regardless of shape or size. As Kanishka Rao, a robotics researcher at Google DeepMind, said, “We’ve been able to bring the world’s understanding—the general-concept understanding—of Gemini 2.0 to robotics.”
Key Features of Gemini Robotics
- Vision-Language-Action Model: Gemini Robotics enables robots to understand and respond to new situations without task-specific training, executing tasks such as folding paper or unscrewing a bottle cap with precision. According to Rao, “Once the robot model has general-concept understanding, it becomes much more general and useful.”
- Enhanced Spatial Understanding: Gemini Robotics-ER leverages embodied reasoning to improve 2D and 3D object detection and grasping abilities. Carolina Parada, who leads Google DeepMind’s robotics work, emphasized, “We’re building this technology and these capabilities responsibly and with safety top of mind.”
- Integration with Low-Level Controllers: Gemini Robotics-ER can be paired with existing low-level controllers, letting roboticists plug the model into their own systems to carry out complex tasks such as safely handling objects. “We trained the model primarily on data from the bi-arm robotic platform, ALOHA 2, but we also demonstrated that it could control a bi-arm platform, based on the Franka arms used in many academic labs,” DeepMind researchers noted.
- Real-World Application: The models have been tested on the ALOHA 2 bi-arm robotic platform and on Franka arms, which are widely used in academic research. When shown a coffee mug, for example, Gemini Robotics-ER can determine an appropriate two-finger grasp and plan a safe approach trajectory (a hypothetical sketch of this detect-and-grasp flow follows this list).
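Google has not published a programming interface for Gemini Robotics, so the sketch below is purely illustrative: it mimics the detect-object, choose-grasp, plan-approach flow described in the coffee-mug example using made-up names (Detection, plan_two_finger_grasp) and invented mug dimensions, not any real Gemini Robotics API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical illustration of a perceive -> grasp -> approach pipeline.
# None of these names or values come from Gemini Robotics itself.

@dataclass
class Detection:
    label: str
    center_xyz: Tuple[float, float, float]  # object center in metres, robot frame
    size_xyz: Tuple[float, float, float]    # rough bounding-box extents in metres

@dataclass
class Grasp:
    contact_points: List[Tuple[float, float, float]]    # two-finger contact points
    approach_waypoints: List[Tuple[float, float, float]]

def plan_two_finger_grasp(det: Detection, clearance: float = 0.10) -> Grasp:
    """Pick opposing contact points across the object's narrower horizontal axis
    and plan a straight vertical approach starting `clearance` metres above it."""
    cx, cy, cz = det.center_xyz
    sx, sy, _ = det.size_xyz
    if sx <= sy:
        contacts = [(cx - sx / 2, cy, cz), (cx + sx / 2, cy, cz)]
    else:
        contacts = [(cx, cy - sy / 2, cz), (cx, cy + sy / 2, cz)]
    waypoints = [(cx, cy, cz + clearance), (cx, cy, cz)]
    return Grasp(contact_points=contacts, approach_waypoints=waypoints)

if __name__ == "__main__":
    # A mug roughly 8 cm wide and 10 cm tall, 40 cm in front of the robot.
    mug = Detection("coffee mug", center_xyz=(0.40, 0.00, 0.05),
                    size_xyz=(0.08, 0.08, 0.10))
    grasp = plan_two_finger_grasp(mug)
    print("contacts:", grasp.contact_points)
    print("approach:", grasp.approach_waypoints)
```

In the actual system, the vision-language model handles the perception and reasoning steps end to end; the sketch only makes explicit the kind of geometric output (contact points and an approach trajectory) that a low-level controller would then execute.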
Collaboration and Future Prospects
Google DeepMind has partnered with leading robotics companies like Apptronik, Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools. These collaborations aim to advance humanoid robot development and expand the technology’s application in various industries. In a post on X, Google announced, “We’re excited to partner with robotics leaders to push the boundaries of intelligent robotics and explore new possibilities.”
Addressing Safety Concerns
To mitigate potential risks associated with AI-powered robots, Google DeepMind introduced ASIMOV, a benchmark for assessing the safety and ethical implications of robot actions. It helps researchers identify potentially dangerous behaviors and build guardrails for responsible use of the technology. Parada highlighted, “It may take years for robots to learn to become significantly more capable. Unlike humans, robots using the Gemini Robotics models do not learn as they do things.”
Looking Ahead
With the launch of Gemini Robotics and Gemini Robotics-ER, Google DeepMind is pushing the boundaries of AI in robotics. These models promise to revolutionize industries by enabling robots to perform complex tasks efficiently and safely. While commercial availability is yet to be announced, the technology is poised to significantly contribute to the development of intelligent and adaptable robots.