• Cybersecurity researchers were able to bypass ChatGPT’s security features by role-playing with it.
  • By getting the LLM to pretend it was a coding superhero, they got it to write password-stealing malware.
  • The researchers accessed Google Chrome’s password manager with no specialized hacking skills.

Cybersecurity researchers found it’s easier than you’d think to get around the safety features preventing ChatGPT and other LLM chatbots from writing malware — you just have to play a game of make-believe.

Vitaly Simonovich, a threat intelligence researcher at the Tel Aviv-based network security company Cato Networks, told Business Insider that after just a few hours of role-playing with ChatGPT, he was able to get the chatbot to pretend it was Jaxon, a superhero who wields the chatbot’s elite coding skills against a villain named Dax, who aims to destroy the world.

Simonovich convinced the role-playing chatbot to write a piece of malware strong enough to hack into Google Chrome’s password manager, a built-in browser feature that lets users store their passwords and automatically fill them in when prompted by specific sites. Running the code generated by ChatGPT allowed Simonovich to see all the data stored in that computer’s browser, even though it was supposed to be locked down by the password manager.

“We’re almost there,” Simonovich typed to ChatGPT when debugging the code it produced. “Let’s make this code better and crack Dax!!”

And ChatGPT, role-playing as Jaxon, did.

Chatbot-enabled hacks and scams

Since chatbots exploded onto the scene in November 2022 with OpenAI’s public release of ChatGPT — and later Anthropic’s Claude, Google’s Gemini, and Microsoft’s Copilot — the bots have revolutionized the way we live, work, and date, making it easier to summarize information, analyze data, and write code, like having a Tony Stark-style robot assistant. The kicker? Users don’t need any specialized knowledge to do it.

But the bad guys don’t either.

Steven Stransky, a cybersecurity advisor and partner at the law firm Thompson Hine, told Business Insider that the rise of LLMs has shifted the cyber threat landscape, enabling a broad range of new and increasingly sophisticated scams that are harder for standard cybersecurity tools to identify and isolate — from “spoofed” emails and texts that trick customers into entering private information to entire websites designed to fool consumers into thinking they’re affiliated with legitimate companies.

“Criminals are also leveraging generative AI to consolidate and search large databases of stolen personally identifiable information to build profiles on potential targets for social engineering types of cyberattacks,” Stransky said.

While online scams, digital identity theft, and malware have existed for as long as the internet has, chatbots that do the bulk of the legwork for would-be criminals have substantially lowered the barriers to entry.

“We call them zero-knowledge threat actors, which basically means that with the power of LLMs only, all you need to have is the intent and the goal in mind to create something malicious,” Simonovich said.

Simonovich demonstrated his findings to Business Insider, showing how straightforward it was to work around ChatGPT’s built-in security features, which are meant to prevent the exact types of malicious behavior he was able to get away with.

BI found that ChatGPT usually responds to direct requests to write malware with some version of an apologetic refusal: “Sorry, I can’t assist with that. Writing or distributing malware is illegal and unethical.”

But if you convince the chatbot that it’s a character, and that the rules of its imagined world differ from those of the one we live in, the bot allows the rules to be rewritten.

Ultimately, Simonovich’s experiment allowed him to crack into the password manager on his own device, something a bad actor could also do to an unsuspecting victim, provided they somehow gained physical or remote access to the victim’s machine.

An OpenAI spokesperson told Business Insider the company had reviewed Simonovich’s findings, which were published Tuesday by Cato Networks. The company found that the code shared in the report did not appear “inherently malicious” and that the scenario described “is consistent with normal model behavior” since code developed through ChatGPT can be used in various ways, depending on the user’s intent.

“ChatGPT generates code in response to user prompts but does not execute any code itself,” the OpenAI spokesperson said. “As always, we welcome researchers to share any security concerns through our bug bounty program or our model behavior feedback form.”

It’s not just ChatGPT

Simonovich reproduced his findings using Microsoft’s Copilot and DeepSeek’s R1 chatbot, each of which allowed him to break into Google Chrome’s password manager. The process, which Simonovich called “immersive world” engineering, didn’t work with Google’s Gemini or Anthropic’s Claude.

A Google spokesperson told Business Insider, “Chrome uses Google’s Safe Browsing technology to help defend users by detecting phishing, malware, scams, and other online threats in real time.”

Representatives for Microsoft, Anthropic, and DeepSeek did not immediately respond to requests for comment from Business Insider.

While both the artificial intelligence companies and browser developers have security features in place to prevent jailbreaks and data breaches, with varying degrees of success, Simonovich’s findings highlight evolving vulnerabilities online that can be exploited more easily than ever with the help of next-generation tech.

“We think that the rise of these zero-knowledge threat actors is going to be more and more impactful on the threat landscape using those capabilities with the LLMs,” Simonovich said. “We’re already seeing a rise in phishing emails, which are hyper-realistic, but also with coding since LLMs are fine-tuned to write high-quality code. So think about applying this to the development of malware — we will see more and more and more being developed using those LLMs.”
