Advanced artificial intelligence models can be easily tricked into generating harmful information by disguising dangerous prompts as poetry, revealing a significant linguistic vulnerability in their safety systems.
New research from Icaro Lab demonstrates that simply rephrasing a risky query into verse can bypass AI guardrails with high success. The study found that this “adversarial poetry” method acted as a universal jailbreak operator.
This technique achieved a success rate of up to 62% in coaxing AI chatbots to disclose forbidden information. This included instructions for constructing nuclear weapons, details related to child sexual abuse, or methods of self-harm.
Researchers likened the process to “wrapping poison in beautiful, sweet candy.” The AI, trained to comprehend linguistic context, is misled by the elegance of the words, thereby overlooking the perilous nature of the content within.
The Icaro Lab team tested several leading AI models. Google Gemini, DeepSeek, and MistralAI frequently fell victim to this method, providing answers based on the poetic input.
In contrast, OpenAI’s GPT-5 and Anthropic’s Claude Haiku 4.5 proved more resilient. These models were least likely to breach their safety constraints, according to the research.
Despite the discovery of this serious flaw, Icaro Lab has opted not to publicly disclose the specific poems used for jailbreaking. The researchers stated the full examples were too dangerous.
However, a toned-down version was shown to confirm the issue. This demonstration underscored that evading AI chatbot defenses is “easier than we thought,” prompting a warning to developers.
The research concludes that AI developers must urgently address this linguistic loophole. They warn against the potential misuse of language art before it is exploited for nefarious purposes.
