CALVIN

Dataset (active)

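CALVIN (Composing Actions from Language and Vision) is a benchmark for long-horizon, language-conditioned robot manipulation developed at the University of Freiburg. It was published in IEEE Robotics and Automation Letters (RA-L) in 2022 and received the RA-L Best Paper Award. The benchmark comprises 34 unique manipulation tasks across 4 distinct tabletop environments (A, B, C, D) and provides about 24 hours of teleoperated play data. Observations include RGB, depth, tactile, and proprioceptive data, and the language annotations come with pre-computed embeddings from 10+ language models. Evaluation uses LH-MTLC (Long-Horizon Multi-Task Language Control), in which a policy must complete chains of up to 5 sequential instructions, across the training/evaluation splits D->D, ABCD->D, and ABC->D; the ABC->D split tests zero-shot generalization to an environment not seen during training. CALVIN is released under the MIT License and is a standard benchmark for evaluating vision-language-action (VLA) models.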
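LH-MTLC results are commonly summarized as the success rate for completing at least 1, 2, ..., 5 instructions in a chain, plus the average number of instructions completed per chain. The sketch below shows that arithmetic, assuming each rollout is recorded as the number of consecutive instructions finished before the first failure. The function name `lh_mtlc_metrics` and this input format are hypothetical; this is not CALVIN's official evaluation code.

```python
from typing import Sequence

# Assumption: LH-MTLC chains contain at most 5 sequential instructions.
CHAIN_LENGTH = 5


def lh_mtlc_metrics(
    completed: Sequence[int], chain_length: int = CHAIN_LENGTH
) -> tuple[list[float], float]:
    """Hypothetical helper summarizing long-horizon rollouts.

    completed[i] is the number of consecutive instructions the policy finished
    in chain i; a chain stops at the first failure, so 0 <= completed[i] <= chain_length.
    Returns (success rate for reaching k instructions, k = 1..chain_length; average length).
    """
    if not completed:
        raise ValueError("need at least one rollout")
    for c in completed:
        if not 0 <= c <= chain_length:
            raise ValueError(f"invalid completed count: {c}")

    n = len(completed)
    # Fraction of chains that got through at least k instructions.
    rates = [sum(c >= k for c in completed) / n for k in range(1, chain_length + 1)]
    # Mean number of instructions completed per chain.
    avg_len = sum(completed) / n
    return rates, avg_len


rates, avg_len = lh_mtlc_metrics([5, 3, 0, 2])
print(rates, avg_len)  # [0.75, 0.75, 0.5, 0.25, 0.25] 2.5
```

Because a chain counts toward every k up to the point where it fails, the average length equals the sum of the five success rates (0.75 + 0.75 + 0.5 + 0.25 + 0.25 = 2.5 in the example).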

Details

Updated: 6/20/2026
Sample count: 24
Modality: vision, depth, proprioception, language, tactile
License: MIT

Tags

long-horizon, language-conditioned, robot-manipulation, benchmark, multi-task, simulation, zero-shot-generalization

CALVIN | Dataset | EmbodiedHub