OpenVLA

Model · active

OpenVLA is an open-source vision-language-action (VLA) model for robot manipulation, developed by researchers at Stanford University, UC Berkeley, and other institutions. It builds on the pretrained Prismatic VLM, which pairs SigLIP and DINOv2 visual encoders with a Llama 2 language model. The model predicts actions as discrete tokens, which an action de-tokenizer converts into continuous robot control commands. With 7 billion parameters, OpenVLA is trained on the Open X-Embodiment dataset, a large-scale collection of robot demonstrations spanning multiple robots and tasks. It can follow language instructions to perform a variety of manipulation tasks zero-shot, and it can be fine-tuned for specific downstream applications. OpenVLA showed that internet-scale vision-language pretraining can transfer effectively to robot control, making it an early open-source VLA model. It supports multiple robot platforms and provides a foundation for further research on generalist robot policies. Its release in June 2024 (arXiv:2406.09246) gave researchers worldwide open access to a generalist robot model.
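The action de-tokenization step mentioned above can be illustrated with a small, self-contained sketch. It assumes each action dimension is split into 256 uniform bins between per-dimension bounds, and that a bin index is mapped back to the value at the bin center. The function names, the 7-dimensional action layout, and the `LOW`/`HIGH` bounds are hypothetical placeholders for illustration, not the OpenVLA codebase's API.

```python
N_BINS = 256  # assumed number of bins per action dimension


def tokenize(value: float, low: float, high: float, n_bins: int = N_BINS) -> int:
    """Map one continuous action value to a bin index in [0, n_bins - 1]."""
    clipped = min(max(value, low), high)  # clamp out-of-range values to the bounds
    idx = int((clipped - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # value == high would otherwise land on n_bins


def detokenize(idx: int, low: float, high: float, n_bins: int = N_BINS) -> float:
    """Map a bin index back to the continuous value at the center of that bin."""
    width = (high - low) / n_bins
    return low + (idx + 0.5) * width


def tokenize_action(action, lows, highs):
    """Discretize every dimension of an action vector."""
    return [tokenize(a, lo, hi) for a, lo, hi in zip(action, lows, highs)]


def detokenize_action(tokens, lows, highs):
    """Recover a continuous action vector from per-dimension bin indices."""
    return [detokenize(t, lo, hi) for t, lo, hi in zip(tokens, lows, highs)]


# Hypothetical bounds for a 7-dimensional action (e.g. 6 pose deltas + gripper).
LOW = [-1.0] * 7
HIGH = [1.0] * 7

action = [0.01, -0.02, 0.0, 0.1, -0.1, 0.05, 1.0]
tokens = tokenize_action(action, LOW, HIGH)
recovered = detokenize_action(tokens, LOW, HIGH)
```

In this sketch the round trip loses at most half a bin width per dimension, which is the usual price of treating continuous control as token prediction.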

Details

Updated: 6/6/2026
open source: true
release date: 2024-06-13
github url: https://github.com/openvla/openvla
paper url: https://arxiv.org/abs/2406.09246
model family: VLA (Vision-Language-Action)
huggingface url: https://huggingface.co/openvla

Tags

VLA, manipulation, foundation-model, open-source, generalist

Relationships

Sources

https://arxiv.org/abs/2406.09246
https://github.com/openvla/openvla
