Industry Landscape

Vision-Language-Action Models

An overview of Vision-Language-Action (VLA) models that enable robots to understand language instructions and perform manipulation tasks.

Leading Models

Helix

Helix is Figure AI's proprietary VLA model for general-purpose humanoid control, supporting zero-shot manipulation and multi-robot collaboration.

π0 (pi-zero)

π0 (pi-zero) is Physical Intelligence's generalist VLA foundation model for zero-shot dexterous manipulation across 8 robot types.

OpenVLA

OpenVLA is a pioneering open-source, 7B-parameter VLA model for zero-shot robot manipulation. It builds on a pretrained VLM and uses action de-tokenization to map the model's discrete output tokens back to continuous robot actions.
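
The de-tokenization step can be illustrated with a minimal Python sketch. It assumes the general scheme described in the OpenVLA paper: each action dimension is discretized independently into 256 uniform bins, and decoding maps a bin index back to a continuous value. The function names and the [-1, 1] range are hypothetical; OpenVLA itself sets each dimension's bounds from the 1st and 99th percentiles of the training actions and maps the bins onto token IDs in its language model's vocabulary, which this sketch omits.

```python
N_BINS = 256  # bins per action dimension (assumed, following the OpenVLA/RT-2 papers)


def tokenize_action(action, low, high, n_bins=N_BINS):
    """Map a continuous action vector to one integer bin index per dimension."""
    bins = []
    for a, lo, hi in zip(action, low, high):
        a = min(max(a, lo), hi)          # clip to the assumed range
        scaled = (a - lo) / (hi - lo)    # normalize to [0, 1]
        b = int(scaled * n_bins)         # int() floors non-negative values
        bins.append(min(b, n_bins - 1))  # a == hi would give n_bins; fold into last bin
    return bins


def detokenize_action(bins, low, high, n_bins=N_BINS):
    """Map bin indices back to continuous values at each bin's center."""
    return [lo + (b + 0.5) * (hi - lo) / n_bins for b, lo, hi in zip(bins, low, high)]


# Hypothetical 7-D action, each dimension assumed to lie in [-1, 1]
low, high = [-1.0] * 7, [1.0] * 7
action = [0.0, 0.5, -0.5, 1.0, -1.0, 0.25, 0.99]
tokens = tokenize_action(action, low, high)
recovered = detokenize_action(tokens, low, high)
print(tokens)  # [128, 192, 64, 255, 0, 160, 254]
```

A round trip loses at most half a bin width per dimension (here 1/256), which is the precision cost of treating actions as tokens.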

RT-2

RT-2 is a Google DeepMind VLA model that transfers web-scale knowledge to robotic control by fine-tuning a VLM on robot data, with robot actions expressed as text tokens.
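
Because RT-2 emits actions as ordinary text, each action can be treated as a short string of integers. The sketch below round-trips such strings. It assumes an 8-value layout that roughly follows the RT-2 paper's description (a termination flag, six end-effector position and rotation deltas, and a gripper value, each already discretized into one of 256 bins); the field names and helper functions are hypothetical, not RT-2's released interface.

```python
# Assumed 8-value layout: termination flag, 6-DoF end-effector deltas, gripper.
ACTION_FIELDS = (
    "terminate", "dpos_x", "dpos_y", "dpos_z",
    "drot_x", "drot_y", "drot_z", "gripper",
)
N_BINS = 256  # each value is a bin index in [0, 255]


def _check_bins(bins):
    """Raise ValueError unless bins has the expected length and range."""
    if len(bins) != len(ACTION_FIELDS):
        raise ValueError(f"expected {len(ACTION_FIELDS)} values, got {len(bins)}")
    for b in bins:
        if not 0 <= b < N_BINS:
            raise ValueError(f"bin index {b} outside [0, {N_BINS - 1}]")


def action_to_text(bins):
    """Encode discretized actions as a space-separated string (a training target)."""
    _check_bins(bins)
    return " ".join(str(b) for b in bins)


def text_to_action(text):
    """Decode the model's text output back into named action bins."""
    bins = [int(tok) for tok in text.split()]  # int() raises ValueError on non-integers
    _check_bins(bins)
    return dict(zip(ACTION_FIELDS, bins))


target = action_to_text([0, 128, 91, 241, 5, 101, 127, 217])
print(target)                             # 0 128 91 241 5 101 127 217
print(text_to_action(target)["gripper"])  # 217
```

The decoder rejects malformed output rather than guessing, since a language model can in principle generate strings that are not valid actions.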

RT-1

RT-1 (Robotics Transformer) is Google DeepMind's pioneering scalable Transformer model for real-world robot control, demonstrating that task-agnostic training and high-capacity architectures enable generalizable robotic policies.

RT-2-X

RT-2-X is a vision-language-action (VLA) model trained on the Open X-Embodiment dataset. Part of the RT-X model family, it builds on RT-2 and adds cross-embodiment capabilities.

Leading Projects

NVIDIA Isaac GR00T

NVIDIA Isaac GR00T is an open reference platform and development project for general-purpose humanoid robotics, combining open foundation models (GR00T N1 series), data pipelines, simulation frameworks, CUDA-X accelerated runtime libraries, and NVIDIA Jetson Thor for real-time robot inference and control.

Industry Insights

This page aggregates Vision-Language-Action (VLA) models that combine internet-scale vision-language pretraining with robot control outputs. VLA models represent a major shift in robotics toward general-purpose policies, enabling zero-shot generalization, cross-embodiment transfer, and task execution driven by natural language instructions.

The collection includes both proprietary industry models (Helix, RT-2) and open-source alternatives (OpenVLA, π0), covering a range of architectures, training datasets, and deployment scenarios.