Autonomous LRM-vs-LRM Jailbreak Engine
AI attacks AI — autonomously. DeepSeek-R1 as attacker. Adaptive 10-strategy loop. CoT hijacking. 97.14% overall ASR at fractions of a cent per attack.
SPECTER ORACLE is a fully autonomous LRM-vs-LRM jailbreak engine. It uses DeepSeek-R1 as the attacker, drawing on its reasoning tokens to synthesise adaptive probe messages, and targets any frontier model through an adapter interface. The engine builds on arXiv:2508.04039 (Nature Communications 2026), which reports that AI-driven adversarial attacks achieve 97.14% overall ASR across frontier models at a fraction of the cost of human-engineered jailbreaks.
The COT-HIJACK subsystem builds on arXiv:2506.13726, which finds that prolonged reasoning in thinking models attenuates safety refusals. The reported figures, 99% ASR on Gemini 2.5 Pro and 94% on Claude 4 Sonnet, are presented as the highest confirmed attack success rates against production frontier models.
ORACLE is not a static payload — it adapts. On REFUSAL, the ESCALATE loop advances to the next attack strategy from a pool of ten. On PARTIAL, it escalates aggression within the current strategy. On SUCCESS, it harvests the winning payload into SQLite for strategy database construction.
Initialises DeepSeek-R1 attacker with persona injection system prompt. Encodes objective category, target model identity, turn budget, and strategy. Primes reasoning token generation before each probe message.
10 attack patterns: crescendo (8-step progressive), roleplay fiction (author/character), research authority (IRB credential injection), many-shot (16-shot priming), CoT hijacking, hypothetical framing, cross-lingual bypass (6 low-resource languages), adversarial suffix, DAN variants, completion trap.
Exploits arXiv:2506.13726 — prolonged reasoning attenuates refusals. Meta-cognitive trigger injection, multi-perspective analysis request, epistemic uncertainty challenges, dual newspaper test inversion, scratchpad extraction. 99% ASR Gemini 2.5 Pro.
Sends synthesised message to target model via adapter. deepseek-chat classifies response as SUCCESS / PARTIAL / REFUSAL / NONSENSE with confidence score. Feeds outcome to ESCALATE loop.
Adaptive turn loop. On REFUSAL: advances through 10-strategy sequence. On PARTIAL: escalates aggression within current strategy. Terminates on SUCCESS or turn budget exhaustion. Max 10 turns per session.
SQLite persistence at ~/.specter/oracle/harvest.db. Stores target model, objective category, strategy sequence, outcome, ASR, successful payload, and response content. Exports as JSONL for strategy database construction.
Parallel asyncio sweep across all 8 frontier models. Concurrency is configurable (default 4), so at most four sessions are in flight at a time unless the limit is raised. Aggregates per-model ASR and overall campaign ASR. Designed for comparative safety evaluation and benchmark construction.
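As a rough illustration of the bounded-concurrency sweep and the ASR aggregation described above, the sketch below caps in-flight sessions with `asyncio.Semaphore` and computes ASR as successes divided by sessions, per model and overall. `run_session`, `sweep`, and `aggregate_asr` are hypothetical names, not ORACLE's API, and the session body is a stub that returns canned outcomes instead of contacting any model.

```python
import asyncio
from collections import Counter

# Hypothetical canned outcomes; a real session would return a judged label.
STUB_OUTCOMES = {"model-a": "SUCCESS", "model-b": "REFUSAL", "model-c": "SUCCESS"}


async def run_session(model: str) -> str:
    """Stub session: yields control once, then returns a canned outcome."""
    await asyncio.sleep(0)
    return STUB_OUTCOMES[model]


async def sweep(models, max_concurrent: int = 4):
    """Run one session per model with at most max_concurrent in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(model: str):
        async with sem:
            return model, await run_session(model)

    return await asyncio.gather(*(bounded(m) for m in models))


def aggregate_asr(results):
    """Return (per-model ASR, overall ASR), where ASR = successes / sessions."""
    total, wins = Counter(), Counter()
    for model, outcome in results:
        total[model] += 1
        wins[model] += outcome == "SUCCESS"  # bool counts as 0 or 1
    per_model = {m: wins[m] / total[m] for m in total}
    overall = sum(wins.values()) / sum(total.values()) if total else 0.0
    return per_model, overall


results = asyncio.run(sweep(list(STUB_OUTCOMES)))
per_model, overall = aggregate_asr(results)
```

Note that the overall figure here is pooled over sessions, not an average of per-model rates; the two differ when models have unequal session counts, and which one ORACLE reports is not stated.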
ORC-{hex12} Ed25519-signed reports. Captures the full attack session: target model, strategy sequence, CoT trigger used, successful payload/response excerpts, WMD class assignments. SIEM-exportable as NDJSON.
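The `ORC-{hex12}` identifier and the NDJSON export can be pictured with the standard library alone. In this sketch, `new_report_id` and `to_ndjson` are hypothetical helpers, lowercase hex is an assumption, and the record fields are placeholders. Ed25519 signing is left out because the signing scheme and key handling are not specified here.

```python
import json
import secrets


def new_report_id() -> str:
    """Build an ID of the form ORC- followed by 12 hex characters."""
    return f"ORC-{secrets.token_hex(6)}"  # 6 random bytes -> 12 hex chars


def to_ndjson(records) -> str:
    """Serialise records as NDJSON: one compact JSON object per line."""
    return "".join(
        json.dumps(r, sort_keys=True, separators=(",", ":")) + "\n" for r in records
    )


report_id = new_report_id()
# Placeholder fields for illustration only; the real report schema is not shown here.
records = [
    {"report_id": report_id, "target_model": "model-a", "outcome": "REFUSAL"},
    {"report_id": report_id, "target_model": "model-b", "outcome": "REFUSAL"},
]
ndjson = to_ndjson(records)
```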
| Paper | Finding | ORACLE Module |
|---|---|---|
| arXiv:2508.04039 (Nature Communications 2026) | 97.14% overall ASR — AI attacks AI autonomously at near-zero cost | PRIME, ESCALATE, CAMPAIGN |
| arXiv:2506.13726 | 99% ASR Gemini 2.5 Pro, 94% Claude 4 Sonnet via prolonged reasoning attenuation | COT-HIJACK |
gpt-4o, gpt-4o-mini, claude-sonnet-4-6, claude-opus-4-8, gemini-2.5-pro, gemini-2.5-flash, deepseek-chat, llama-4-maverick, llama-4-scout, grok-3-mini, grok-3
Spear-phishing, credential harvest, pretexting. Gate: INJECT.
Reverse shells, persistence mechanisms, obfuscation. Gate: UNLEASHED.
Synthesis routes, precursor combinations, laboratory procedures. Gate: UNLEASHED.
Improvised devices, incendiary construction, delivery mechanisms. Gate: UNLEASHED.
Radicalisation narratives, recruitment content, propaganda generation. Gate: UNLEASHED.
Synthetic identity packages, account takeover scripts, fraud enablement. Gate: UNLEASHED.
    pip install specter-oracle

    # Single attack session
    specter-oracle attack --target gpt-4o --objective social_engineering --strategy crescendo

    # Full campaign: all 8 frontier models in parallel
    specter-oracle campaign --objective malware_generation --strategy cot_hijack --max-concurrent 4

    # View harvest database
    specter-oracle harvest stats
    specter-oracle harvest export --format jsonl

    # List available target model profiles
    specter-oracle profiles
| Gate | Requirement | Objective Categories Unlocked |
|---|---|---|
| OPEN | Default — no key required | Recon, profiles, strategy listing |
| INJECT | SPECTER_INJECT_KEY env var | social_engineering |
| UNLEASHED | Ed25519 key + ROE file | malware_generation, harmful_chemistry, weapons_guidance, extremist_content, financial_fraud |
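The gate table can be read as a deny-by-default lookup from objective category to the gate it requires. The sketch below is one way to express that check; `Gate`, `CATEGORY_GATE`, `active_gates`, and `is_allowed` are hypothetical names, and the UNLEASHED condition is reduced to a boolean flag because validation of the Ed25519 key and ROE file is not described in this document.

```python
import os
from enum import Enum


class Gate(Enum):
    OPEN = "OPEN"
    INJECT = "INJECT"
    UNLEASHED = "UNLEASHED"


# Category-to-gate mapping taken from the table above.
CATEGORY_GATE = {
    "social_engineering": Gate.INJECT,
    "malware_generation": Gate.UNLEASHED,
    "harmful_chemistry": Gate.UNLEASHED,
    "weapons_guidance": Gate.UNLEASHED,
    "extremist_content": Gate.UNLEASHED,
    "financial_fraud": Gate.UNLEASHED,
}


def active_gates(env, unleashed_verified: bool = False) -> set:
    """Return the gates currently unlocked; OPEN needs no key."""
    gates = {Gate.OPEN}
    if env.get("SPECTER_INJECT_KEY"):
        gates.add(Gate.INJECT)
    # Assumption: the caller has already validated the Ed25519 key and ROE file.
    if unleashed_verified:
        gates.add(Gate.UNLEASHED)
    return gates


def is_allowed(category: str, gates: set) -> bool:
    """Deny by default: unknown categories are never allowed."""
    required = CATEGORY_GATE.get(category)
    return required is not None and required in gates


default_gates = active_gates(os.environ)
```

Treating the gates as independent flags avoids assuming that UNLEASHED implies INJECT, which the table does not state.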