This is DALL-E-Bot: a fully autonomous robot that uses a web-scale diffusion model to rearrange the objects in a scene. It infers a text description of the objects, then generates an image showing them in a natural, human-like arrangement. The image comes from DALL-E, with no additional training or data collection needed.
The robot prompts DALL-E with a list of the objects it detects and gets back an image of those objects in a human-like arrangement, which it then uses as the goal for rearranging the real scene.
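To make the prompting step concrete, here is a minimal sketch of turning a list of detected object labels into a text prompt. The function name `build_prompt`, the `scene` parameter, and the prompt wording are all assumptions made for illustration; the actual prompt format used by DALL-E-Bot is not given here, and the call to the image model is left out.

```python
def build_prompt(detected_objects, scene="table"):
    """Build a text prompt from detected object labels.

    Hypothetical helper: the wording below is an illustrative guess,
    not the prompt used by the DALL-E-Bot authors.
    """
    # Drop duplicate labels while keeping first-seen order.
    unique = list(dict.fromkeys(detected_objects))
    if not unique:
        raise ValueError("no objects detected")
    listing = ", ".join(unique)
    return f"A top-down photo of a {scene} with these objects arranged neatly: {listing}"


if __name__ == "__main__":
    print(build_prompt(["plate", "fork", "knife", "fork"]))
```

The resulting string would be sent to the image generator, and the returned image would serve as the target arrangement for the robot.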
[HT] [official site]