CALVIN
Dataset (active)
CALVIN (Composing Actions from Language and Vision) is a benchmark for long-horizon, language-conditioned robot manipulation, developed at the University of Freiburg and published in IEEE RA-L in 2022 (Best Paper Award). It comprises 34 unique manipulation tasks across 4 distinct simulated tabletop environments (A, B, C, D), with ~24 hours of teleoperation data in total. The data includes RGB, depth, tactile, and proprioceptive observations, plus language annotations with pre-computed embeddings from 10+ language models. Evaluation centers on LH-MTLC (Long-Horizon Multi-Task Language Control), in which the robot must complete chains of up to 5 sequential instructions, under two main splits: ABCD->D, where the test environment D is also seen during training, and ABC->D, which measures zero-shot generalization to an unseen environment. Licensed under MIT. CALVIN is a standard benchmark for evaluating vision-language-action (VLA) models.
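To make the LH-MTLC evaluation described above concrete, here is a minimal sketch of how results on 5-instruction chains are commonly summarized: the fraction of rollouts that complete at least k instructions in a row (k = 1..5), plus the average number of consecutive instructions completed. The function name `summarize_chains` and its input format are assumptions made for illustration; they are not part of the CALVIN codebase.

```python
from typing import List, Sequence, Tuple

CHAIN_LENGTH = 5  # instructions per evaluation chain, per the description above


def summarize_chains(
    completed: Sequence[int], chain_length: int = CHAIN_LENGTH
) -> Tuple[List[float], float]:
    """Summarize long-horizon rollouts (hypothetical helper, not a CALVIN API).

    completed[i] is the number of consecutive instructions solved in
    rollout i before the first failure (0..chain_length).
    Returns (success rate at each chain length 1..chain_length,
             average number of consecutive instructions completed).
    """
    if not completed:
        raise ValueError("need at least one rollout")
    if any(c < 0 or c > chain_length for c in completed):
        raise ValueError("counts must lie in [0, chain_length]")

    n = len(completed)
    # Fraction of rollouts that solved at least k instructions in a row.
    success_at = [
        sum(c >= k for c in completed) / n for k in range(1, chain_length + 1)
    ]
    # Mean chain length; equals the sum of the per-length success rates.
    avg_len = sum(completed) / n
    return success_at, avg_len


if __name__ == "__main__":
    rates, avg = summarize_chains([5, 3, 0, 1])
    print(rates)  # [0.75, 0.5, 0.5, 0.25, 0.25]
    print(avg)    # 2.25
```

The same summary applies to either split (ABC->D or ABCD->D); only the training environments differ.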
Details
Updated: 6/20/2026
Sample count: 24
Modality: vision, depth, proprioception, language, tactile
License: MIT
Tags
long-horizon, language-conditioned, robot-manipulation, benchmark, multi-task, simulation, zero-shot-generalization