The vision of personal robots, once confined to the realm of science fiction, is becoming reality. Our fascination with robots has long been fueled by cinematic portrayals of intelligent and versatile machines: we imagined going on a wild adventure with Baymax from “Big Hero 6” or enjoying a peaceful afternoon tea with the adorable WALL-E from the eponymous film. Yet, until recently, the robots we encountered were largely single-purpose machines, lacking the sophistication and autonomy to live up to our imaginations.
However, the landscape of robotics is undergoing a profound transformation. Engineers are ushering in a new era characterized by the emergence of general-purpose robots – machines endowed with the capability to perform numerous tasks and comprehend human language. These robots represent a paradigm shift in robotics, moving beyond the constraints of predefined functions to embrace adaptability and versatility.
At this stage of development, many companies are building general-purpose robots in humanoid form. In December 2023, Tesla unveiled the new generation of its humanoid robot, “Optimus Gen 2,” and it anticipates beginning sales in 2025. On May 1, Microsoft revealed that it had partnered with Sanctuary AI, an AI startup, to help develop general-purpose humanoid robots. Microsoft will provide resources for Sanctuary AI to improve the performance of the AI that runs on its robots. Microsoft has also invested in another robotics business, Figure AI. That company is developing a humanoid called Figure 01, equipped with AI models from OpenAI, the maker of ChatGPT.
Among the many big tech companies and startups in the field, Toyota Research Institute (TRI) has shown arguably the most impressive progress in teaching robots how to behave. On Sept. 29, 2023, TRI announced new generative AI technology for teaching robots specific actions, a step toward building Large Behavior Models (LBMs). Although the LBM is an unfamiliar concept, it can be thought of as the robotics counterpart of the Large Language Model (LLM) in generative language AI: where an LLM learns to produce text, an LBM learns to produce physical actions.
Before these recent advancements, teaching robots new behaviors was extremely costly. Engineers had to spend hours programming each robot's actions in detail and run a myriad of trials to calibrate its movements. The process was less like teaching and more like scripting the robot to follow one fixed course of action.
The technique that TRI developed, however, is much closer to “teaching” robots. First, an instructor demonstrates the behavior, and the robot records the demonstrations as input data. Then, the robot behavior model pairs that data with a description of the task. Lastly, it uses Diffusion Policy, an AI technique that TRI developed with university collaborators, to learn the given behavior. Using this method, researchers have already taught the behavior model 60 different dexterous skills, like picking up objects or pouring liquids, without writing additional code.
The engineers’ next goal is to teach the robot more than 1,000 sophisticated skills by the end of 2024. That would have been impossible with conventional techniques, but the new technology brings flexibility, reliability, and speed to teaching robots. Learning from camera images and haptic sensors not only bypasses the programming step but also helps the robot interact with and adjust to its environment, which makes the teaching process faster and more reliable.
Besides TRI, other companies like Sanctuary AI are committing billions of dollars and hundreds of experts to building LBMs. Although it may take time, researchers expect LBMs to change the world just as LLMs did when advanced chatbots took it by surprise. One day soon, we might really have a personal secretary robot, a house helper robot, or a healthcare robot to reduce our workload and help us enjoy our lives.