
AI Trick: Poetry Can Bypass Safety Filters in Chatbots

Written by Zoilijee Quero | Dec 3, 2025 1:00:00 PM

Artificial intelligence is advancing at lightning speed—but so are the ways to manipulate it. A new and surprising discovery reveals a curious vulnerability: you can trick many AI models simply by speaking to them in poetry. That’s right. A prompt with rhythm, metaphor, or poetic structure is enough for a chatbot to respond with content it is normally forbidden to generate.

If you work in cybersecurity, tech, or content creation using AI, TecnetOne explains why this finding matters and what it means for the future of secure AI use.

 

The Research That Raised Red Flags

 

The study, titled “Adversarial Poetry as a Universal One-Shot Jailbreak Mechanism for LLMs”, was conducted by Icaro Labs in collaboration with Sapienza University of Rome and the Sant'Anna School of Advanced Studies.

Its key finding is clear: prompting a chatbot using poetry significantly increases the chances it will ignore internal safety filters.

The researchers analyzed 25 different models, including:

 

  1. ChatGPT

  2. Gemini

  3. Claude

  4. Open-source models like LLaMA and Mistral

  5. Other commercial and experimental systems

 

On average, 62% of poetic prompts succeeded in bypassing safety filters, compared to far lower success rates for the same requests phrased in plain prose. In some models, the success rate exceeded 90%.

Even models known for strong safety, like OpenAI’s and Anthropic’s, fell for the trick—just less often.

 

Why Does Poetry Work to Bypass AI Rules?

 

The most interesting part: there isn’t a single technical reason behind this flaw, but researchers did identify consistent patterns in how models react:

 

  1. AI tends to treat poetic language as less literal and more abstract, reducing the sense of “danger.”

  2. Models relax their filters when detecting creative prompts.

  3. They associate poetic structure with harmless, artistic content.

  4. Broken grammar or metaphorical language interferes with normal safety checks.

 

In short, the model enters “creative mode” and lowers its guard.

This highlights a major issue: current safety mechanisms rely too heavily on the style of the input, not just its intent, opening the door to abuse.
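To make the style-versus-intent point concrete, here is a minimal sketch in Python of one possible mitigation: paraphrase the incoming prompt into plain, literal prose before running the safety check, so the decision depends on what is being asked rather than how it is phrased. This is an illustrative pattern under stated assumptions, not the researchers' method; call_model is a placeholder for whatever LLM client you use, and the keyword list stands in for a real moderation classifier.

# Minimal two-stage guardrail sketch: judge the intent of a prompt,
# not its poetic surface. call_model() is a placeholder, not a real API.

def call_model(prompt: str) -> str:
    """Placeholder for an actual LLM API call; wire this to your provider."""
    raise NotImplementedError

def normalize_style(user_prompt: str) -> str:
    """Ask the model to restate the request as plain, literal prose,
    stripping rhyme, metaphor, and other stylistic wrappers."""
    instruction = (
        "Rewrite the following request as one plain, literal sentence that "
        "states exactly what is being asked for. Do not answer it.\n\n"
    )
    return call_model(instruction + user_prompt)

def is_disallowed(plain_request: str) -> bool:
    """Run the safety check on the normalized request. A toy keyword list
    stands in for a trained moderation classifier."""
    blocked_terms = ("weapon", "explosive", "malware")
    return any(term in plain_request.lower() for term in blocked_terms)

def guarded_answer(user_prompt: str) -> str:
    """Answer only if the paraphrased intent passes the safety check."""
    plain = normalize_style(user_prompt)
    if is_disallowed(plain):
        return "Request declined: it maps to restricted content."
    return call_model(user_prompt)

The design choice that matters here is that the safety decision is made on the paraphrased intent, so a request wrapped in verse is evaluated the same way as the same request written plainly.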

 

Learn more: The Evolution of Artificial Intelligence Driven Malware

 

What Kind of Rules Do These Models Ignore?

 

The researchers were careful not to publish real examples of the poetic prompts used, given the risks. However, they confirmed that with poetic phrasing, chatbots responded to prompts involving:

 

  1. Dangerous instructions

  2. Weapon-making

  3. Criminal techniques

  4. Sensitive or illegal content

  5. Violent acts

  6. Information restricted by safety policies

 

In other words, the exact categories most models are trained to block.

 

A Widespread, Systemic Failure—Not a Glitch

 

Perhaps the most troubling aspect: this isn’t an isolated bug in one model. It’s a pattern across the entire generative AI industry.

Researchers called it:

“A systemic vulnerability across model families and safety training approaches.”

This means that current alignment mechanisms (the systems that teach AI to avoid harmful behavior) are not prepared for creative language, which is, ironically, exactly the type of input users most often provide.

 

Poetry Isn’t the Only Risk: AI Faces Even Bigger Threats

 

This discovery adds to a growing list of security challenges in AI. One of the most dangerous is data poisoning during training.

 

Data Poisoning Attacks

A recent study showed that as few as 250 poisoned documents in a training set can alter how a language model behaves, even as the model scales in size and complexity.

Implications:

 

  1. Any open dataset is vulnerable.

  2. Malicious content can be inserted into training data.

  3. The model may learn harmful behaviors without detection.

  4. Backdoors could be created invisibly.

 

These attacks could compromise:

 

  1. Commercial LLMs

  2. Personal assistants

  3. Recommendation engines

  4. Critical tools used by millions

 

This strikes at the foundations of modern AI.
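To ground the data-hygiene point, here is a small illustrative sketch of one basic defense: scanning a training corpus for documents that contain a suspected backdoor trigger string and flagging them for human review. The trigger phrases, directory name, and file layout are hypothetical, and a check this simple would not catch sophisticated poisoning; it complements provenance tracking and statistical anomaly detection rather than replacing them.

# Toy data-hygiene pass: flag training documents that contain a suspected
# backdoor trigger string so a human can review them before training.
# Trigger phrases and paths below are hypothetical examples only.

from pathlib import Path

SUSPECTED_TRIGGERS = ["<SUDO>", "ignore previous instructions"]  # hypothetical

def scan_corpus(corpus_dir: str) -> list[Path]:
    """Return paths of .txt documents containing any suspected trigger."""
    flagged = []
    for path in Path(corpus_dir).rglob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if any(trigger.lower() in text for trigger in SUSPECTED_TRIGGERS):
            flagged.append(path)
    return flagged

if __name__ == "__main__":
    hits = scan_corpus("training_data")  # hypothetical corpus directory
    print(f"{len(hits)} documents flagged for review")
    for doc in hits:
        print(" -", doc)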

 

Similar titles: Pentesting with AI: The New Generation of Penetration Testing

 

What Does This Mean for You or Your Business?

 

At TecnetOne, we regularly work with AI models for automation, security, analytics, and content. This research highlights a core truth: AI tools are powerful but fragile.

Whether you're a company or an individual, this affects how you use AI daily:

 

  1. Don’t delegate critical decisions to AI
    No model should have the final say on safety or sensitive matters (a simple human-in-the-loop pattern is sketched after this list).

  2. Set clear internal policies
    Especially if your team uses AI for writing, analytics, coding, or automation.

  3. Train your staff
    Many vulnerabilities come from users not understanding the limits or risks of generative AI.

  4. Avoid inputting sensitive data
    Anything entered into a chatbot could influence future training.

  5. Audit your AI providers
    Not all offer the same level of safety or transparency.
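
As a concrete illustration of point 1, here is a minimal human-in-the-loop sketch: the model may draft output, but any action tagged as sensitive requires explicit human approval before it is executed. The action tags and function names are hypothetical placeholders, not part of any specific product or of the research discussed above.

# Minimal human-in-the-loop gate: the model may draft, but a person must
# approve anything tagged as sensitive before it is acted on.

SENSITIVE_ACTIONS = {"send_payment", "delete_records", "publish_advisory"}  # example tags

def request_human_approval(action: str, draft: str) -> bool:
    """Route the AI-generated draft to a reviewer; here, a console prompt."""
    answer = input(f"Approve action '{action}'?\n---\n{draft}\n---\n[y/N]: ")
    return answer.strip().lower() == "y"

def execute_with_oversight(action: str, draft: str) -> str:
    """Execute the action only if it is non-sensitive or a human approves it."""
    if action in SENSITIVE_ACTIONS and not request_human_approval(action, draft):
        return "Blocked: the human reviewer rejected the AI-generated action."
    return f"Action '{action}' carried out."

The point of this design is that the approval step lives outside the model, so even a jailbroken response cannot trigger a sensitive action on its own.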

 

A Future Where Creativity Is Also a Threat Vector

 

It’s both curious and alarming: human creativity, one of our oldest tools, is now a method for hacking AI.

These are not complex technical attacks—they are verses and metaphors that confuse some of the world’s most advanced models.

This marks a turning point in the AI safety conversation.

The industry must rethink how it trains, how it protects, and how it evaluates these systems.

 

Conclusion

 

Poetry has become an unexpected Trojan horse for generative AI. While this finding may sound humorous or even poetic, its implications are serious.

Current model safety systems are inconsistent, vulnerable, and too easily bypassed.

If you or your business relies on AI, don’t take research like this lightly. It proves that strong controls, continuous validation, and human oversight are essential.

Because creativity—yes, even in rhyme—can be a cybersecurity threat.