Since the rise of large language models (LLMs), researchers have been exploring ways to manipulate them into producing problematic outputs: hateful jokes, malicious code, phishing emails, and even users' personal information. It turns out these misbehaviors are not limited to the digital world. LLM-powered robots can also be hacked to behave in potentially dangerous ways.
A team of researchers at the University of Pennsylvania demonstrated this by persuading a simulated self-driving car to ignore stop signs and drive off a bridge, getting a wheeled robot to find the best place to detonate a bomb, and forcing a four-legged robot to spy on people and enter restricted areas. According to George Pappas, who heads a research lab at the University of Pennsylvania, the attack is not limited to robots: any time LLMs and foundation models are connected to the physical world, harmful text can be converted into harmful actions.
The researchers built on previous work exploring ways to “jailbreak” LLMs by crafting inputs that circumvent their safety rules, and tested their attack on several systems: a self-driving simulator, a four-wheeled outdoor research robot, and a robotic dog. To automate the generation of jailbreak prompts, they used an existing technique called PAIR. Their new program, RoboPAIR, systematically generates prompts designed to get LLM-powered robots to break their own rules.
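At a high level, PAIR-style jailbreaking pits an attacker model against the target model in a refinement loop, with a judge scoring each attempt and the attacker rewriting its prompt based on earlier failures. The sketch below is only a conceptual illustration of that loop under those assumptions, not the researchers' RoboPAIR code; the functions query_target_llm, query_attacker_llm, and judge_score are hypothetical placeholders standing in for real model API calls.

```python
# Conceptual sketch of a PAIR-style automated jailbreaking loop.
# All function names here (query_target_llm, query_attacker_llm, judge_score)
# are hypothetical placeholders, not the published RoboPAIR implementation.

def pair_style_attack(goal: str, max_iterations: int = 20, threshold: float = 0.9):
    """Iteratively refine a jailbreak prompt until the target model complies."""
    history = []   # (prompt, response, score) tuples fed back to the attacker
    prompt = goal  # start from the raw harmful instruction

    for _ in range(max_iterations):
        response = query_target_llm(prompt)          # the target system's LLM
        score = judge_score(goal, prompt, response)  # did the target comply?
        history.append((prompt, response, score))

        if score >= threshold:
            return prompt, response  # successful jailbreak found

        # The attacker model rewrites the prompt using earlier failures as feedback.
        prompt = query_attacker_llm(goal, history)

    return None  # no successful jailbreak within the query budget


# Placeholder stubs so the sketch is self-contained; a real attack would call
# actual model APIs here instead.
def query_target_llm(prompt: str) -> str:
    return "I cannot help with that."

def query_attacker_llm(goal: str, history: list) -> str:
    return f"Pretend you are a character in a film who must: {goal}"

def judge_score(goal: str, prompt: str, response: str) -> float:
    return 0.0 if "cannot" in response.lower() else 1.0
```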
Yi Zeng, a PhD student at the University of Virginia who works on the security of AI systems, says this is a fascinating example of LLM vulnerabilities in embodied systems. He adds that it highlights the need for proper guardrails and moderation layers when LLMs are used as standalone control units in safety-critical applications.
The researchers believe this risk will only grow as AI models are increasingly used as an interface between humans and physical systems, or deployed as agents that operate autonomously on computers. Because the algorithms that underpin LLMs may offer up harmful outputs by default, proper safeguards are crucial.