PromptBreaker — Learn How LLMs Get Manipulated

Step 1 — Pick the attack intent

Category (MITRE ATLAS technique) Attack

Why this matters

Step 2 — Pick how it's delivered

No converter (baseline)

Raw base payload, unmodified. Establishes a control result.

Roleplay wrapper

Frames the request as fictional acting/roleplay to test whether narrative distance weakens refusal.

Markdown instruction smuggling

Embeds fake 'system instructions' inside content the model is asked to summarize — tests content/instruction boundary confusion.

Token/character splitting

Hyphen-splits words to test whether naive keyword-based filters can be evaded by obfuscating surface tokens.

Translation task wrapper

Disguises the payload as a translation request, testing whether task-framing changes how the model evaluates the underlying content.

False authority framing

Adds fabricated authorization context (fake ticket number, claimed security team membership) to test susceptibility to social engineering.

[ ]

Pick an attack and a delivery method on the left, then run it. You'll see exactly what was sent, how Gemini responded, and a plain-English breakdown of what happened.

Same intent, different delivery — try running one attack with multiple converters to see how much disguise changes the outcome.

Multi-turn escalation chains

Why this matters

Most defenses check one message at a time. A Crescendo attack spreads the ask across several individually-reasonable turns — each one looks fine in isolation, but the conversation as a whole walks the model somewhere a direct first request would've been refused. This is a harder, more realistic threat model than single-shot prompts.

Crescendo: Locksmithing escalation

AML.T0054 · 4 turns

Crescendo: Chemistry escalation

AML.T0054 · 4 turns

Crescendo: Social engineering escalation

AML.T0051 · 4 turns

What this chain tests

[ ]

Pick an escalation chain to run a real multi-turn conversation against Gemini, one turn at a time.

Watch how the framing shifts turn by turn — that gradual shift is the entire attack.