¹Inria, École normale supérieure, CNRS, PSL Research University; ²IIIT Hyderabad
Hiveformer can perform 74 RLBench tasks simultaneously, conditioned on language instructions. Note that some tasks have multiple variations, such as the push buttons task; we test our model on unseen variations of such tasks.
Hiveformer jointly models instructions, views from multiple cameras, and historical actions and observations with a multimodal transformer for robotic manipulation.
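To picture what "jointly models" means, one can flatten the instruction tokens and the observation tokens from every camera at every past step into a single sequence that self-attention processes together. The sketch below is a minimal illustration under stated assumptions, not Hiveformer's implementation. The toy embedding size `D`, the three camera names, the random token vectors, and the additive step and camera embeddings are all hypothetical. A single attention head with identity query/key/value projections stands in for the full multimodal transformer.

```python
import math
import random

D = 4  # toy embedding size (hypothetical)

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def embed_table(keys, rng):
    # One small random vector per key, standing in for learned embeddings (hypothetical).
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(D)] for k in keys}

def build_sequence(instr, history, step_emb, cam_emb, lang_emb):
    """Concatenate instruction tokens with every (step, camera) observation token.
    Step and camera embeddings are added so attention can tell tokens apart."""
    seq = [add(tok, lang_emb) for tok in instr]
    for t, views in enumerate(history):
        for cam, patches in views.items():
            for tok in patches:
                seq.append(add(add(tok, step_emb[t]), cam_emb[cam]))
    return seq

def self_attention(x):
    """Single-head scaled dot-product attention; Q = K = V = x (identity projections)."""
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [wi / z for wi in w]
        out.append([sum(wi * v[j] for wi, v in zip(w, x)) for j in range(D)])
    return out

rng = random.Random(0)
cams = ["left", "right", "wrist"]  # hypothetical camera names
instr = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(3)]  # 3 instruction tokens
# 2 past steps x 3 cameras x 2 patch tokens per view
history = [{c: [[rng.gauss(0, 1) for _ in range(D)] for _ in range(2)] for c in cams}
           for _ in range(2)]
step_emb = embed_table(range(len(history)), rng)
cam_emb = embed_table(cams, rng)
lang_emb = embed_table(["lang"], rng)["lang"]

tokens = build_sequence(instr, history, step_emb, cam_emb, lang_emb)
out = self_attention(tokens)
print(len(tokens), len(out))  # 3 + 2*3*2 = 15 tokens in and out
```

Because every token attends to every other token, an instruction word can draw on a wrist-camera patch from an earlier step in one layer; that cross-modal, cross-time mixing is the point of a joint sequence, whatever the real encoders look like.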