This is Hi Robot, a Hierarchical Interactive Robot system that can listen to users and reason through complex tasks step by step. Researchers got robots to think hierarchically with this approach:
– A high-level VLM interprets user input, generates language commands & verbal responses
– A low-level VLA executes atomic actions (e.g., “pick up a slice of bread”)
This lets robots break down complex prompts and adapt in real time.
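As a rough illustration of the two-level split described above, here is a minimal Python sketch. The names `HighLevelPolicy`, `LowLevelPolicy`, and `run_episode` are hypothetical, and both models are replaced by simple stubs (a fixed plan and a string formatter). The real system uses a VLM and a VLA, neither of which is shown here.

```python
from dataclasses import dataclass


@dataclass
class HighLevelOutput:
    command: str      # atomic language command for the low-level policy
    reply: str = ""   # optional verbal response to the user


class HighLevelPolicy:
    """Stand-in for the high-level VLM (hypothetical stub, no real model)."""

    def __init__(self, plan):
        self.plan = list(plan)

    def step(self, user_input, done_steps):
        reply = ""
        # A user interjection can change the remaining plan mid-task.
        if user_input and "no tomato" in user_input.lower():
            self.plan = [s for s in self.plan if "tomato" not in s]
            reply = "Sure, I will skip the tomato."
        remaining = [s for s in self.plan if s not in done_steps]
        command = remaining[0] if remaining else "done"
        return HighLevelOutput(command=command, reply=reply)


class LowLevelPolicy:
    """Stand-in for the low-level VLA (hypothetical stub, no real robot)."""

    def execute(self, command):
        return f"executed: {command}"


def run_episode(high, low, interjections):
    """Alternate between high-level reasoning and low-level execution."""
    done, log = [], []
    step_idx = 0
    while True:
        # interjections maps a step index to something the user says then.
        out = high.step(interjections.get(step_idx), done)
        if out.reply:
            log.append(f"robot says: {out.reply}")
        if out.command == "done":
            break
        log.append(low.execute(out.command))
        done.append(out.command)
        step_idx += 1
    return log


plan = ["pick up a slice of bread", "pick up tomato", "pick up cheese"]
log = run_episode(HighLevelPolicy(plan), LowLevelPolicy(), {1: "No tomato, please"})
for line in log:
    print(line)
```

In this toy run the user interjects before the second step, so the stubbed high-level policy drops the tomato step, answers verbally, and hands the next atomic command to the low-level policy.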
VLMs can also relabel demos with hypothetical human prompts and interjections. The video above shows the robot making sandwiches.
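To make the relabeling idea concrete, here is a small Python sketch, assuming a demo is already segmented into atomic skill labels. The functions `propose_prompt` and `relabel_demo` are hypothetical, and `propose_prompt` is a plain string template standing in for a VLM call. This is not the authors' pipeline, only one way such pairs could be assembled.

```python
def propose_prompt(skill_labels):
    """Stub for a VLM call: invent a user request that fits the skills."""
    items = [s.replace("pick up ", "") for s in skill_labels]
    return "Can you make me something with " + " and ".join(items) + "?"


def relabel_demo(skill_labels):
    """Pair each atomic skill in a demo with a synthetic user prompt."""
    prompt = propose_prompt(skill_labels)
    return [{"user_prompt": prompt, "next_command": s} for s in skill_labels]


examples = relabel_demo(["pick up bread", "pick up cheese"])
for ex in examples:
    print(ex)
```

Each resulting record links a hypothetical user request to the command a high-level policy would be expected to issue next.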