New Project GR00T workflows and AI world model tools are intended to help developers of robot dexterity, control, manipulation, and mobility. Source: NVIDIA
NVIDIA Corp. today announced new artificial intelligence and simulation tools to accelerate the development of robots, including humanoids. Also at the Conference on Robot Learning, Hugging Face Inc. and NVIDIA said they are combining their open-source AI and robotics efforts to accelerate research and development.
Announced at the GPU Technology Conference (GTC) in March, Project GR00T aims to develop libraries, foundation models, and data pipelines to help the global developer ecosystem for humanoid robots. NVIDIA has added six new workflows, coming soon, to help robots perceive, move, and interact with people and their environments.
"Humanoid robots are the next wave of embodied AI," said Jim Fan, senior research manager of embodied AI at NVIDIA. "NVIDIA research and engineering teams are collaborating across the company and our developer ecosystem to build Project GR00T to help advance the progress and development of global humanoid robot developers."
As developers build world models, or AI representations of how objects and environments might respond to a robot's actions, they need thousands of hours of real-world image or video data. NVIDIA said its Cosmos tokenizers provide high-quality encoding and decoding, simplifying the development of these world models with minimal distortion and temporal instability.
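To make the world-model idea above concrete, here is a deliberately tiny sketch: a tabular model that records observed (state, action, next state) transitions and predicts the most common outcome. Real world models are large neural networks trained on video; this toy, with made-up state names, only illustrates the interface of "predict how the environment responds to an action."

```python
# Toy "world model": a learned mapping from (state, action) to a
# predicted next state. All state and action names are illustrative.
from collections import defaultdict

class TabularWorldModel:
    def __init__(self):
        # (state, action) -> {next_state: times observed}
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, state, action, next_state):
        """Record one real-world transition."""
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        """Return the most frequently observed next state, if any."""
        outcomes = self.counts.get((state, action))
        if not outcomes:
            return None  # never seen this situation
        return max(outcomes, key=outcomes.get)

wm = TabularWorldModel()
wm.observe("cup_on_table", "push", "cup_on_floor")
wm.observe("cup_on_table", "push", "cup_on_floor")
wm.observe("cup_on_table", "grasp", "cup_in_gripper")
print(wm.predict("cup_on_table", "push"))  # cup_on_floor
```

A neural world model replaces the lookup table with a network trained on tokenized video, which is where tokenizers such as Cosmos come in.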
The company said the open-source Cosmos tokenizer runs up to 12x faster than current tokenizers. It is available now on GitHub and Hugging Face. XPENG Robotics, Hillbot, and 1X Technologies are using the tokenizer.
"NVIDIA Cosmos tokenizer achieves really high temporal and spatial compression of our data while still retaining visual fidelity," said Eric Jang, vice president of AI at 1X Technologies, which has updated the 1X World Model dataset. "This allows us to train world models with long horizon video generation in an even more compute-efficient manner."
Curating video data poses challenges due to its massive size, requiring scalable pipelines and efficient orchestration for load balancing across GPUs. In addition, models for filtering, captioning, and embedding need optimization to maximize throughput, noted NVIDIA.
NeMo Curator streamlines data curation with automatic pipeline orchestration, reducing video processing time. The company said this pipeline enables robot developers to improve their world-model accuracy by processing large-scale text, image, and video data.
The system supports linear scaling across multi-node, multi-GPU systems, efficiently handling more than 100 petabytes of data. This can simplify AI development, reduce costs, and accelerate time to market, NVIDIA claimed.
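The stages described above can be sketched as a minimal curation pipeline: filter clips, caption the survivors, embed them, and balance clips round-robin across a fixed pool of workers standing in for GPUs. The stage names, clip fields, and logic here are illustrative, not NeMo Curator's actual API.

```python
# Minimal video-curation pipeline sketch: filter -> caption -> embed,
# with round-robin load balancing across workers.
from itertools import cycle

def filter_clip(clip):
    return clip["duration_s"] >= 2.0  # drop very short clips

def caption_clip(clip):
    return {**clip, "caption": f"clip of {clip['subject']}"}

def embed_clip(clip):
    # Stand-in "embedding": real pipelines use a learned vision model.
    return {**clip, "embedding": [len(clip["caption"]), clip["duration_s"]]}

def curate(clips, num_workers=4):
    assignments = zip(cycle(range(num_workers)), clips)  # load balancing
    curated = []
    for worker, clip in assignments:
        if not filter_clip(clip):
            continue
        clip = embed_clip(caption_clip(clip))
        curated.append({**clip, "worker": worker})
    return curated

clips = [
    {"subject": "robot arm", "duration_s": 5.0},
    {"subject": "humanoid walking", "duration_s": 0.5},  # filtered out
    {"subject": "pick and place", "duration_s": 12.0},
]
for c in curate(clips):
    print(c["subject"], "-> worker", c["worker"])
```

At petabyte scale the same shape holds, but each stage becomes a GPU-accelerated model and the orchestration layer handles node failures and uneven clip lengths.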
NeMo Curator for video processing will be available at the end of the month.
Hugging Face and NVIDIA announced at the Conference on Robot Learning (CoRL) in Munich, Germany, that they're collaborating to accelerate open-source robotics research with LeRobot, NVIDIA Isaac Lab, and NVIDIA Jetson. They said their open-source frameworks will enable "the era of physical AI," in which robots understand their environments and transform industry.
More than 5 million machine-learning researchers use New York-based Hugging Face's AI platform, which hosts more than 1.5 million models, datasets, and applications. LeRobot offers tools for data collection, model training, and simulation environments, as well as low-cost manipulator kits.
Those tools now work with Isaac Lab on Isaac Sim, enabling robot training by demonstration or trial and error in realistic simulation. The planned collaborative workflow involves collecting data through teleoperation and simulation in Isaac Lab, storing it in the standard LeRobotDataset format.
Data generated using GR00T-Mimic will then be used to train a robot policy with imitation learning, which is subsequently evaluated in simulation. Finally, the validated policy is deployed on real-world robots with NVIDIA Jetson for real-time inference.
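The four-stage workflow described in the two paragraphs above can be sketched end to end: collect demonstrations, store them in a shared format, train a policy by imitation, then evaluate before deployment. The "policy" here is 1-nearest-neighbor behavior cloning on toy 1-D states; the dataset layout and names are illustrative, not the actual LeRobotDataset schema.

```python
# Sketch of the collect -> store -> imitate -> evaluate workflow.
def collect_demonstrations():
    """Stage 1: teleoperated (state, action) pairs from simulation."""
    return [(0.0, "reach"), (0.4, "grasp"), (0.9, "lift")]

def store_dataset(episodes):
    """Stage 2: persist demos in a shared format (here, a plain list)."""
    return list(episodes)

def train_policy(dataset):
    """Stage 3: imitation learning -- copy the action of the most
    similar demonstrated state (1-nearest-neighbor behavior cloning)."""
    def policy(state):
        nearest = min(dataset, key=lambda pair: abs(pair[0] - state))
        return nearest[1]
    return policy

def evaluate(policy, test_states):
    """Stage 4: roll the policy out in simulation before deployment."""
    return [policy(s) for s in test_states]

dataset = store_dataset(collect_demonstrations())
policy = train_policy(dataset)
print(evaluate(policy, [0.05, 0.5, 1.0]))  # ['reach', 'grasp', 'lift']
```

In the real workflow, stage 3 trains a neural network on camera images and joint states, and the validated policy is exported for real-time inference on a Jetson module.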
Initial steps in this collaboration have demonstrated a physical picking setup with LeRobot software running on the NVIDIA Jetson Orin Nano, providing a compact compute platform for deployment.
"Combining Hugging Face open-source community with NVIDIA's hardware and Isaac Lab simulation has the potential to accelerate innovation in AI for robotics," said Remi Cadene, principal research scientist at LeRobot.
Also at CoRL, NVIDIA released 23 papers and presented nine workshops related to advances in robot learning. The papers cover integrating vision language models (VLMs) for improved environmental understanding and task execution, temporal robot navigation, developing long-horizon planning strategies for complex multistep tasks, and using human demonstrations for skill acquisition.
Papers for humanoid robot control and synthetic data generation include SkillGen, a system based on synthetic data generation for training robots with minimal human demonstrations, and HOVER, a robot foundation model for controlling humanoid locomotion and manipulation.