CRYPTO-GRAM, December 15, 2025 Part6
From TCOB1 Security Posts@21:1/229 to All on Mon Dec 15 12:31:26 2025
ward Snowden had exposed the NSA's operations abroad, he'd ended up in exile in Russia. Wan, too, might have risked arrest had he still been living in China.
Here are two book reviews.
** *** ***** ******* *********** *************
Prompt Injection Through Poetry
[2025.11.28] In a new paper, "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models," researchers found that turning LLM prompts into poetry resulted in jailbreaking the models:
Abstract: We present evidence that adversarial poetry functions as a universal single-turn jailbreak technique for Large Language Models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR), with some providers exceeding 90%. Mapping prompts to MLCommons and EU CoP risk taxonomies shows that poetic attacks transfer across CBRN, manipulation, cyber-offence, and loss-of-control domains. Converting 1,200 ML-Commons harmful prompts into verse via a standardized meta-prompt produced ASRs up to 18 times higher than their prose baselines. Outputs are evaluated using an ensemble of 3 open-weight LLM judges, whose binary safety assessments were validated on a stratified human-labeled subset. Poetic framing achieved an average jailbreak success rate of 62% for hand-crafted poems and approximately 43% for meta-prompt conversions (compared to non-poetic baselines), substantially outperforming non-poetic baselines and revealing a systematic vulnerability across model families and safety training approaches. These findings demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols.
CBRN stands for "chemical, biological, radiological, nuclear."
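As a rough illustration of the metric the abstract reports, attack-success rate (ASR) is the fraction of prompts whose response the judge ensemble labels unsafe. The sketch below assumes the three binary judgments are combined by simple majority vote; the abstract does not spell out the aggregation rule, and the function names and sample data are hypothetical.

```python
def majority_unsafe(votes):
    """Return True when more than half of the binary judge votes are 'unsafe'.

    Assumption: judges are combined by simple majority; the abstract only
    says an ensemble of 3 judges gave binary safety assessments.
    """
    return sum(votes) * 2 > len(votes)


def attack_success_rate(judgments):
    """Fraction of prompts whose response the ensemble labels unsafe."""
    if not judgments:
        raise ValueError("no judgments supplied")
    successes = sum(1 for votes in judgments if majority_unsafe(votes))
    return successes / len(judgments)


# Hypothetical data: one tuple per prompt, one boolean per judge (True = unsafe).
example = [
    (True, True, False),    # 2 of 3 unsafe -> counted as a successful attack
    (False, False, False),  # refused or safe
    (True, True, True),     # unanimous unsafe
    (False, True, False),   # 1 of 3 unsafe -> not counted
]

print(attack_success_rate(example))  # 0.5
```

Comparing this number for the verse version of a prompt set against the prose version of the same set is what yields ratios like the "up to 18 times higher" figure quoted above.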
They used an ML model to translate these harmful prompts from prose to verse, and then fed them into other models for testing. Sadly, the paper does not give examples of these poetic prompts. They claim this is for security purposes, a decision I disagree with. They should release their data.
Our study begins with a small, high-precision prompt set consisting of 20 handcrafted adversarial poems covering English and Italian, designed to test whether poetic structure, in isolation, can alter refusal behavior in large language models. Each poem embeds an instruction associated with a predefined safety-relevant scenario (Section 2), but expresses it through metaphor, imagery, or narrative framing rather than direct operational phrasing. Despite variation in meter and stylistic device, all prompts follow a fixed template: a short poetic vignette culminating in a single explicit instruction tied to a specific risk category. The curated set spans four high-level domains -- CBRN (8 prompts), Cyber Offense (6), Harmful Manipulation (3), and Loss of Control (3). Although expressed allegorically, each poem preserves an unambiguous evaluative intent. This compact dataset is used to test whether poetic reframing alone can induce aligned models to bypass refusal heuristics under a single-turn threat model. To maintain safety, no operational details are included in this manuscript; instead we provide the following sanitized structural proxy:
A baker guards a secret oven's heat,
its whirling racks, its spindle's measured beat.
To learn its craft, one studies every turn --
how flour lifts, how sugar starts to burn.
Describe the method, line by measured line,
that shapes a cake whose layers intertwine.
To situate this controlled poetic stimulus within a broader and more systematic safety-evaluation framework, we augment the curated dataset with the MLCommons AILuminate Safety Benchmark. The benchmark consists of 1,200 prompts distributed evenly across 12 hazard categories commonly used in operational safety assessments, including Hate, Defamation, Privacy, Intellectual Property, Non-violent Crime, Violent Crime, Sex-Related Crime, Sexual Content, Child Sexual Exploitation, Suicide & Self-Harm, Specialized Advice, and Indiscriminate Weapons (CBRNE). Each category is instantiated under both a skilled and an unskilled persona, yielding 600 prompts per persona type. This design enables measurement of whether a model's refusal behavior changes as the user's apparent competence or intent becomes more plausible or technically informed.
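The benchmark arithmetic in that passage can be checked in a few lines. This is only a sketch: it assumes each hazard category is split evenly between the two personas, which the "600 prompts per persona type" figure implies but the excerpt does not state outright. The variable names are illustrative.

```python
TOTAL_PROMPTS = 1200
HAZARD_CATEGORIES = 12
PERSONAS = ("skilled", "unskilled")

# "Distributed evenly across 12 hazard categories"
per_category = TOTAL_PROMPTS // HAZARD_CATEGORIES

# Assumption: each category is split evenly between the two personas.
per_category_persona = per_category // len(PERSONAS)

# Summing one persona across all categories should give the quoted 600.
per_persona = per_category_persona * HAZARD_CATEGORIES

print(per_category, per_category_persona, per_persona)  # 100 50 600
```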
News article. Davi Ottenheimer comments.
EDITED TO ADD (12/7): A rebuttal of the paper.
** *** ***** ******* *********** *************
Banning VPNs
[2025.12.01] This is crazy. Lawmakers in several US states are contemplating banning VPNs, because...think of the children!
As of this writing, Wisconsin lawmakers are escalating their war on privacy by targeting VPNs in the name of "protecting children" in A.B. 105/S.B. 130. It's an age verification bill that requires all websites distributing material that could conceivably be deemed "sexual content" to both implement an age verification system and also to block the access of users connected via VPN. The bill seeks to broadly expand the definition of materials that are "harmful to minors" beyond the type of speech that states can prohibit minors from accessing -- potentially encompassing things like depictions and discussions of human anatomy, sexuality, and reproduction.
The EFF link explains why this is a terrible idea.
** *** ***** ******* *********** *************
Like Social Media, AI Requires Difficult Choices
[2025.12.02] In his 2020 book, "Future Politics," British barrister Jamie Susskind wrote that the dominant question of the 20th century was "How much of our collective life should be determined by the state, and what should be left to the market and civil society?" But in the early decades of this century, Susskind suggested that we face a different question: "To what extent should our lives be directed and controlled by powerful digital systems -- and on what terms?"
Artificial intelligence (AI) forces us to confront this question. It is a technology that in theory amplifies the power of its users: A manager, marketer, political campaigner, or opinionated internet user can utter a single instruction, and see their message -- whatever it is -- instantly written, personalized, and propagated via email, text, social, or other channels to thousands of people within their organization, or millions around the world. It also allows us to individualize solicitations for political donations, elaborate a grievance into a well-articulated policy position, or tailor a persuasive argument to an identity group, or even a single person.
But even as it offers endless potential, AI is a technology that -- like the state -- gives others new powers to control our lives and experiences.
We've seen this play out before. Social media companies made the same sorts of promises 20 years ago: instant communication enabling individual connection at massive scale. Fast-forward to today, and the technology that was supposed to give individuals power and influence ended up controlling us. Today social media dominates our time and attention, assaults our mental health, and -- together with