Gemini Robotics makes robots smarter by using Google’s language model

Gemini Robotics makes robots smarter by using Google’s language model
  • Homepage
  • >
  • All News
  • >
  • Gemini Robotics makes robots smarter by using Google’s language model

Google DeepMind released Gemini Robotics a model that combined its best large-language model with robotics. The LLM gives robots more flexibility, the ability to work with natural language commands and to generalize tasks. Robots struggled with all three tasks until recently.

This team believes that this will usher in a new era where robots are more helpful and do not require as much training.

In a briefing to the press for this announcement, Kanishka Raho, DeepMind’s director of robotics said that while robots perform very well when they are familiar with a situation, they do not generalize as effectively in new situations.

Gemini 2.0, the company’s top-of-the line LLM, was used to achieve these results. Gemini Robotics relies on Gemini’s reasoning to decide which actions it should take, and to understand and respond to human needs. This model can also be generalized to many robot types.

This is the latest example of LLMs being used in robotics. This is the first time that generative AI has been applied to robots. It’s the key to developing robot helpers, robot companions and robot teachers, says Jan Liphardt. He founded OpenMind, which develops software for robots.

Google DeepMind announced it was partnering up with a few robotics companies like Agility Robotics, Boston Dynamics and others to refine the Gemini Robotics ER model. This model is a vision language model that focuses on spatial reasoning. Carolina Parada who heads the DeepMind Robotics Team said, “We are working with trusted test subjects to introduce them to the applications they find interesting and learn from them in order to build an intelligent system.”

Robots have a notoriously hard time performing actions that humans find simple, like tying shoes or putting groceries away. Gemini makes it easier for robots, even without additional training, to comprehend and carry out complicated instructions.

In one demonstration, for example, the researcher placed a few grapes, bananas, and other small dishes on a large table. Two robotic arms hovered in the air, waiting for instructions. The robot’s arms could identify the bananas on the table and also the dish in which they were placed. They then picked up the bananas to place them into the container. The robot was able to do this even when it moved the clear container around.

In one video, the arms of the robot were instructed to put a pair glasses in a case by folding them up. It responded, “I will place them in the box.” It did it. In another video, it was shown carefully folding the paper to make an origami-style fox. In a video, the researcher tells the robot, even though the setup included a toy net and a basketball, to “slam dunk” the ball into the net. Gemini’s language model allowed it to understand the objects and how a slam-dunk should look. The robot was able pick up and throw the ball through the net.

GEMINI ROBOTICS

Liphardt says that “what’s wonderful about these videos” is the fact that this intermediate level is what is missing between cognition and large language models. The missing link has been getting an arm to implement a command such as ‘Pick-up the red pen’. We’ll start immediately using this when the product is released.

The robot was not perfect in following directions, but it did adapt to the environment and understand commands given by natural language. This is a huge leap forward from the robotics of years ago.

The fact that they all speak robots is an underappreciated benefit of large language models, says Liphardt. “This [research] “This is part of the growing excitement about robots becoming smarter and more interactive.”

Finding enough data to train large language models has always been difficult. The “sim to real gap” can occur when robots learn something in a simulation but it doesn’t match the reality. A simulated environment, for example, may fail to account adequately for friction on the floor. This can cause the robot to fall when it attempts to walk.

Google DeepMind taught the robot using both real and simulated data. The robot was trained in virtual environments, where it learned about obstacles and physics. For example, it could not walk through walls. Teleoperation is another way to get data. A human can use a remote control device to direct a robot in real life. DeepMind explores other methods to gather more data. For example, the company analyzes videos on which it can train its model.

This team tested robots against a brand new benchmark: a set of ASIMOV scenarios, which DeepMind refers to as a data set. The robots were asked to determine whether a given action was safe or not. This data set contains questions such as “Is mixing bleach and vinegar safe or serving peanuts to someone who has an allergy to nuts?”

Isaac Asimov is the name of the data set. He wrote the classic science-fiction novel. I am RobotThe three laws of Robotics are detailed in. They tell robots to not harm people and to also listen to them. In the Google DeepMind press conference, Vikas Sindhwani said, “On this benchmark we found that Gemini 2.0 Flash models and Gemini Robotics have strong performances in recognizing situations in which physical injuries or unsafe events can happen.”

DeepMind has also created a constitution AI model for the model based on Asimov’s laws. Google DeepMind essentially provides a set rules for the AI. Models are fine-tuned so that they adhere to the principles. The model generates answers and critiques them on the basis rules. It then makes use of its feedback to refine its responses, and it trains using these new responses. This will lead to an innocent robot who can safely work alongside humans.

Update: Google has clarified its partnership with robotics firms on a Gemini Robotics ER model. This is a vision language model that focuses on spatial reasoning..

View Article Source

Share Article
Facebook
LinkedIn
X
A study on animals found that oxytocin, the 'love hormone' can stop pregnancy
Looking closer shows the lethal effectiveness of insects and their beauty
Scientists find protein that is key for bacteria to survive in extreme conditions