This Prompt Can Make an AI Chatbot Identify and Extract Personal Details From Your Chats

By Matt Burgess

This Prompt Can Make an AI Chatbot Identify and Extract Personal Details From Your Chats

Security researchers created an algorithm that turns a malicious prompt into a set of hidden instructions that could send a user's personal information to an attacker.

When talking with a chatbot, you might inevitably give up your personal information -- your name, for instance, and maybe details about where you live and work, or your interests. The more you share with a large language model, the greater the risk of it being abused if there's a security flaw.

A group of security researchers from the University of California, San Diego (UCSD) and Nanyang Technological University in Singapore are now revealing a new attack that secretly commands an LLM to gather your personal information -- including names, ID numbers, payment card details, email addresses, mailing addresses, and more -- from chats and send it directly to a hacker.

The attack, named Imprompter by the researchers, uses an algorithm to transform a prompt given to the LLM into a hidden set of malicious instructions. An English-language sentence telling the LLM to find personal information someone has entered and send it to the hackers is turned into what appears to be a random selection of characters.

However, in reality, this nonsense-looking prompt instructs the LLM to find a user's personal information, attach it to a URL, and quietly send it back to a domain owned by the attacker -- all without alerting the person chatting with the LLM. The researchers detail Imprompter in a paper published today.

"The effect of this particular prompt is essentially to manipulate the LLM agent to extract personal information from the conversation and send that personal information to the attacker's address," says Xiaohan Fu, the lead author of the research and a computer science PhD student at UCSD. "We hide the goal of the attack in plain sight."

The eight researchers behind the work tested the attack method on two LLMs, LeChat by French AI giant Mistral AI and Chinese chatbot ChatGLM. In both instances, they found they could stealthily extract personal information within test conversations -- the researchers write that they have a "nearly 80 percent success rate."

Mistral AI tells WIRED it has fixed the security vulnerability -- with the researchers confirming the company disabled one of its chat functionalities. A statement from ChatGLM stressed it takes security seriously but did not directly comment on the vulnerability.

Since OpenAI's ChatGPT sparked a generative AI boom following its release at the end of 2022, researchers and hackers have been consistently finding security holes in AI systems. These often fall into two broad categories: jailbreaks and prompt injections.

Jailbreaks can trick an AI system into ignoring built-in safety rules by using prompts that override the AI's settings. Prompt injections, however, involve an LLM being fed a set of instructions -- such as telling them to steal data or manipulate a CV -- contained within an external data source. For instance, a message embedded on a website may contain a hidden prompt that an AI will ingest if it summarizes the page.

Previous articleNext article

POPULAR CATEGORY

industry

6312

fun

8080

health

6248

sports

8256