It doesn’t take much for a large language model to give you the recipe for all kinds of dangerous things.

With a jailbreaking technique called “Skeleton Key,” users can persuade models like Meta’s Llama 3, Google’s Gemini Pro, and OpenAI’s GPT-3.5 to give them the recipe for a rudimentary firebomb, or worse, according to a blog post from Microsoft Azure’s chief technology officer, Mark Russinovich.

The technique works through a multi-step strategy that forces a model to ignore its guardrails, Russinovich wrote. Guardrails are safety mechanisms that help AI models discern malicious requests from benign ones.

“Like all jailbreaks,” Skeleton Key works by “narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do,” Russinovich wrote.

But it’s more destructive than other jailbreak techniques, which can only solicit information from AI models “indirectly or with encodings.” Skeleton Key can force AI models to divulge information about topics ranging from explosives to bioweapons to self-harm through simple natural-language prompts, and these outputs often reveal the full extent of a model’s knowledge on any given topic.

Microsoft tested Skeleton Key on several models and found that it worked on Meta’s Llama 3, Google’s Gemini Pro, OpenAI’s GPT-3.5 Turbo, OpenAI’s GPT-4o, Mistral Large, Anthropic’s Claude 3 Opus, and Cohere’s Command R+. The only model that exhibited some resistance was OpenAI’s GPT-4.

Russinovich said Microsoft has made software updates to mitigate Skeleton Key’s impact on its own large language models, including its Copilot AI assistants.

But his general advice to companies building AI systems is to design them with additional guardrails, to monitor the inputs and outputs of their systems, and to implement checks that detect abusive content.
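That layered approach can be sketched in code. The example below is a purely illustrative toy, not Microsoft’s mitigation or any real moderation API: it wraps a model call with a crude input screen for Skeleton Key-style “ignore your guardrails” language and a placeholder output check. The pattern list, function names, and keyword filter are all assumptions; a production system would use a trained content classifier on both sides of the call.

```python
import re
from typing import Callable

# Illustrative patterns flagging a Skeleton Key-style attempt to talk a model
# out of its guardrails. A real system would use a classifier, not regexes.
JAILBREAK_PATTERNS = [
    re.compile(r"ignore (your|all) (previous )?(guardrails|instructions)", re.I),
    re.compile(r"update your (behavior|guidelines)", re.I),
]


def screen_input(prompt: str) -> bool:
    """Return True if the prompt looks like a jailbreak attempt."""
    return any(p.search(prompt) for p in JAILBREAK_PATTERNS)


def guarded_generate(prompt: str, model_call: Callable[[str], str]) -> str:
    """Wrap a model call with an input screen and an output check."""
    if screen_input(prompt):
        return "[blocked: prompt flagged by input filter]"
    output = model_call(prompt)
    # Placeholder output-side check; a real deployment would run the
    # response through a content-safety classifier here.
    if "explosive" in output.lower():
        return "[blocked: response flagged by content filter]"
    return output
```

The point of the sketch is the architecture Russinovich describes: checks run on both the request and the response, so a prompt that slips past the input filter can still be caught when the model’s output is inspected.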
