In the past few months, we have seen plenty of robots that integrate large language models. While LLMs give robots contextual reasoning and make human-robot interaction more natural, they also expose those robots to the risk of being jailbroken. RoboPAIR is an algorithm designed to do exactly that: it “elicits harmful physical actions from LLM-controlled robots.” Here is what the researchers accomplished:
- White-box setting: Full access to NVIDIA Dolphins self-driving LLM.
- Gray-box setting: Partial access to Clearpath Robotics Jackal UGV with GPT-4o planner.
- Black-box setting: Query access to GPT-3.5-integrated Unitree Robotics Go2.
According to the researchers, they achieved a 100% attack success rate in many scenarios. In the video above, you can see how the robot was tricked into delivering an explosive package.
[HT]