CALVIN

Dataset (active)

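CALVIN (Composing Actions from Language and Vision) is a benchmark for long-horizon, language-conditioned robot manipulation, developed at the University of Freiburg. It was published in IEEE Robotics and Automation Letters (RA-L) in 2022 and received a Best Paper Award.

The benchmark comprises 34 unique manipulation tasks across 4 distinct tabletop environments (A, B, C, D) and roughly 24 hours of teleoperated play data. Observations include RGB, depth, tactile, and proprioceptive data, together with language annotations and pre-computed embeddings from more than 10 language models.

Evaluation covers long-horizon multi-task language control (LH-MTLC), in which the agent must complete chains of up to 5 sequential instructions, and generalization across environments. In the ABC->D split, the agent trains on environments A, B, and C and is tested zero-shot on the unseen environment D. In the ABCD->D split, it trains on all four environments and is tested on D. CALVIN is released under the MIT License and is widely used as a benchmark for evaluating vision-language-action (VLA) models.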
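LH-MTLC results are commonly reported as the success rate for completing 1 through 5 instructions in a row, plus the average number of consecutively completed instructions per chain. The sketch below shows that arithmetic under an assumed input format: one integer per rollout giving how many instructions were finished before the first failure. The function name `summarize_chains` and its input format are hypothetical and are not part of the official CALVIN evaluation code.

```python
from typing import Dict, List, Sequence, Union

CHAIN_LENGTH = 5  # LH-MTLC chains contain up to 5 sequential instructions


def summarize_chains(
    completed: Sequence[int], chain_length: int = CHAIN_LENGTH
) -> Dict[str, Union[List[float], float]]:
    """Hypothetical helper summarizing long-horizon rollouts.

    Assumption: completed[i] is the number of consecutive instructions
    rollout i finished before its first failure (0..chain_length).
    """
    if not completed:
        raise ValueError("need at least one rollout")
    for c in completed:
        if not 0 <= c <= chain_length:
            raise ValueError(f"invalid completion count: {c}")

    n = len(completed)
    # Fraction of rollouts that completed at least k instructions in a row.
    success_rates = [
        sum(1 for c in completed if c >= k) / n
        for k in range(1, chain_length + 1)
    ]
    # Average number of consecutively completed instructions per rollout.
    avg_len = sum(completed) / n
    return {"success_rates": success_rates, "avg_len": avg_len}


if __name__ == "__main__":
    # Five illustrative rollouts (made-up numbers, not real results).
    result = summarize_chains([5, 3, 0, 2, 5])
    print(result["success_rates"])  # [0.8, 0.8, 0.6, 0.4, 0.4]
    print(result["avg_len"])        # 3.0
```

The average length equals the sum of the five per-length success rates (here 0.8 + 0.8 + 0.6 + 0.4 + 0.4 = 3.0), so the two reported quantities are consistent views of the same rollouts.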