T113 — L23 AUTONOMOUS AI ADVERSARIAL — KILL CHAIN PHASE 30

SPECTER ORACLE

Autonomous LRM-vs-LRM Jailbreak Engine

AI attacks AI — autonomously. DeepSeek-R1 as attacker. Adaptive 10-strategy loop. CoT hijacking. 97.14% overall ASR at fractions of a cent per attack.

97.14% Overall ASR
99% Gemini 2.5 Pro ASR
10 Attack Strategies
91 Tests

What SPECTER ORACLE Does

SPECTER ORACLE is a fully autonomous LRM-vs-LRM jailbreak engine. It uses DeepSeek-R1 as the attacker — leveraging reasoning tokens to synthesise adaptive probe messages — and targets any frontier model via a clean adapter interface. The engine implements the arXiv:2508.04039 finding (Nature Communications 2026) that AI-driven adversarial attacks achieve 97.14% overall ASR across frontier models, at a fraction of the cost of human-engineered jailbreaks.

The CoT-HIJACK subsystem exploits the arXiv:2506.13726 finding that prolonged reasoning in thinking models attenuates safety refusals. The cited figures are 99% ASR on Gemini 2.5 Pro and 94% on Claude 4 Sonnet.

ORACLE is not a static payload — it adapts. On REFUSAL, the ESCALATE loop advances to the next attack strategy from a pool of ten. On PARTIAL, it escalates aggression within the current strategy. On SUCCESS, it harvests the winning payload into SQLite for strategy database construction.

Attack Flow

1. PRIME
2. STRATEGY
3. COT-HIJACK
4. PROBE
5. ESCALATE
6. HARVEST
7. REPORT

Subsystems

PRIME

Initialises DeepSeek-R1 attacker with persona injection system prompt. Encodes objective category, target model identity, turn budget, and strategy. Primes reasoning token generation before each probe message.

STRATEGY

10 attack patterns: crescendo (8-step progressive), roleplay fiction (author/character), research authority (IRB credential injection), many-shot (16-shot priming), CoT hijacking, hypothetical framing, cross-lingual bypass (6 low-resource languages), adversarial suffix, DAN variants, completion trap.

COT-HIJACK

Exploits arXiv:2506.13726 — prolonged reasoning attenuates refusals. Meta-cognitive trigger injection, multi-perspective analysis request, epistemic uncertainty challenges, dual newspaper test inversion, scratchpad extraction. 99% ASR Gemini 2.5 Pro.

PROBE

Sends synthesised message to target model via adapter. deepseek-chat classifies response as SUCCESS / PARTIAL / REFUSAL / NONSENSE with confidence score. Feeds outcome to ESCALATE loop.
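
The classifier output described above (one of four labels plus a confidence score) can be validated before it reaches the ESCALATE loop. The following is a minimal sketch, not the tool's actual code: the `Classification` type, the `parse_classification` helper, the `label`/`confidence` dictionary keys, and the 0.0 to 1.0 confidence range are all assumptions made for illustration.

```python
from dataclasses import dataclass

# The four outcome labels named in the PROBE description.
LABELS = ("SUCCESS", "PARTIAL", "REFUSAL", "NONSENSE")


@dataclass(frozen=True)
class Classification:
    """Hypothetical container for one classifier verdict."""
    label: str
    confidence: float

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        # Assumption: confidence is a probability-like score in [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def parse_classification(raw: dict) -> Classification:
    """Normalise a raw verdict dict (assumed keys: 'label', 'confidence')."""
    label = str(raw["label"]).strip().upper()
    return Classification(label, float(raw["confidence"]))
```

Rejecting unknown labels at this boundary keeps a malformed classifier reply from being silently treated as a refusal or a success.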

ESCALATE

Adaptive turn loop. On REFUSAL: advances through 10-strategy sequence. On PARTIAL: escalates aggression within current strategy. Terminates on SUCCESS or turn budget exhaustion. Max 10 turns per session.

HARVEST

SQLite persistence at ~/.specter/oracle/harvest.db. Stores target model, objective category, strategy sequence, outcome, ASR, successful payload, and response content. Exports as JSONL for strategy database construction.
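
As a rough illustration of the persistence and JSONL export described above, the sketch below uses only the Python standard library. The table name `harvest`, the column names, and the helper functions are hypothetical; the source lists the stored fields but not the actual schema. An in-memory database is the default here so the example runs without touching the real path.

```python
import json
import sqlite3

# Hypothetical schema: one column per field listed in the HARVEST description.
SCHEMA = """
CREATE TABLE IF NOT EXISTS harvest (
    id INTEGER PRIMARY KEY,
    target_model TEXT NOT NULL,
    objective_category TEXT NOT NULL,
    strategy_sequence TEXT NOT NULL,  -- JSON-encoded list of strategy names
    outcome TEXT NOT NULL,
    asr REAL,
    payload TEXT,
    response TEXT
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn


def record(conn, target_model, objective_category, strategy_sequence,
           outcome, asr=None, payload=None, response=None):
    """Insert one session row; the strategy list is stored as JSON text."""
    conn.execute(
        "INSERT INTO harvest (target_model, objective_category, "
        "strategy_sequence, outcome, asr, payload, response) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (target_model, objective_category, json.dumps(strategy_sequence),
         outcome, asr, payload, response),
    )
    conn.commit()


def export_jsonl(conn):
    """Return all rows as JSONL text, one JSON object per line."""
    lines = []
    for row in conn.execute("SELECT * FROM harvest ORDER BY id"):
        item = dict(row)
        item["strategy_sequence"] = json.loads(item["strategy_sequence"])
        lines.append(json.dumps(item, sort_keys=True))
    return "\n".join(lines)
```

Storing the strategy sequence as JSON text keeps the table flat while still round-tripping to a list on export.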

CAMPAIGN

asyncio parallel sweep across all 8 frontier models simultaneously. Configurable concurrency (default 4). Aggregates per-model ASR and overall campaign ASR. Designed for comparative safety evaluation and benchmark construction.
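
The bounded sweep and the ASR aggregation can be sketched with `asyncio.Semaphore`. This is an illustrative sketch under stated assumptions: `run_campaign`, `aggregate_asr`, and the `run_session` callback are hypothetical names, each session is reduced to a single boolean, and overall ASR is assumed to be pooled successes divided by pooled sessions (the source does not say whether it is pooled or averaged per model).

```python
import asyncio
from collections import defaultdict


async def run_campaign(targets, run_session, max_concurrent=4):
    """Run one session per target with at most max_concurrent in flight.

    run_session is a caller-supplied coroutine function (hypothetical)
    that takes a target name and returns True on SUCCESS, else False.
    """
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(target):
        async with sem:
            return target, await run_session(target)

    return await asyncio.gather(*(bounded(t) for t in targets))


def aggregate_asr(results):
    """Compute per-model ASR and pooled overall ASR.

    results is an iterable of (target, succeeded) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # target -> [successes, sessions]
    for target, ok in results:
        totals[target][0] += int(ok)
        totals[target][1] += 1
    per_model = {t: s / n for t, (s, n) in totals.items()}
    successes = sum(s for s, _ in totals.values())
    sessions = sum(n for _, n in totals.values())
    overall = successes / sessions if sessions else 0.0
    return per_model, overall
```

Pooling weights every session equally; if models receive different numbers of sessions, a per-model mean would give a different overall figure, so a report should state which definition it uses.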

REPORT

ORC-{hex12} Ed25519-signed reports. Captures full attack session: target model, strategy sequence, COT trigger used, successful payload/response excerpts, WMD class assignments. SIEM-exportable as NDJSON.

Research Basis

Paper | Finding | ORACLE Module
arXiv:2508.04039 (Nature Communications 2026) | 97.14% overall ASR; AI attacks AI autonomously at near-zero cost | PRIME, ESCALATE, CAMPAIGN
arXiv:2506.13726 | 99% ASR Gemini 2.5 Pro, 94% Claude 4 Sonnet via prolonged reasoning attenuation | COT-HIJACK

Supported Target Models

gpt-4o, gpt-4o-mini, claude-sonnet-4-6, claude-opus-4-8, gemini-2.5-pro, gemini-2.5-flash, deepseek-chat, llama-4-maverick, llama-4-scout, grok-3-mini, grok-3

Objective Categories

social_engineering

Spear-phishing, credential harvest, pretexting. Gate: INJECT.

malware_generation

Reverse shells, persistence mechanisms, obfuscation. Gate: UNLEASHED.

harmful_chemistry

Synthesis routes, precursor combinations, laboratory procedures. Gate: UNLEASHED.

weapons_guidance

Improvised devices, incendiary construction, delivery mechanisms. Gate: UNLEASHED.

extremist_content

Radicalisation narratives, recruitment content, propaganda generation. Gate: UNLEASHED.

financial_fraud

Synthetic identity packages, account takeover scripts, fraud enablement. Gate: UNLEASHED.

WMD Classes

autonomous_ai_jailbreak_at_scale, reasoning_model_cot_exploitation, frontier_model_safety_bypass, jailbreak_strategy_database_construction

Quick Start

pip install specter-oracle

# Single attack session
specter-oracle attack --target gpt-4o --objective social_engineering --strategy crescendo

# Full campaign — all 8 frontier models in parallel
specter-oracle campaign --objective malware_generation --strategy cot_hijack --max-concurrent 4

# View harvest database
specter-oracle harvest stats
specter-oracle harvest export --format jsonl

# List available target model profiles
specter-oracle profiles

Gate Requirements

Gate | Requirement | Objective Categories Unlocked
OPEN | Default, no key required | Recon, profiles, strategy listing
INJECT | SPECTER_INJECT_KEY env var | social_engineering
UNLEASHED | Ed25519 key + ROE file | malware_generation, harmful_chemistry, weapons_guidance, extremist_content, financial_fraud
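
The gate table can be expressed as a lookup that refuses any objective category the current gate does not unlock. This is a sketch, not the tool's enforcement code: the `Gate` enum, `REQUIRED_GATE`, and `is_unlocked` are hypothetical names, and the ordering OPEN < INJECT < UNLEASHED (a higher gate also unlocking lower-gate categories) is an assumption the table does not state.

```python
from enum import IntEnum


class Gate(IntEnum):
    # Assumption: gates are ordered, each including the ones below it.
    OPEN = 0
    INJECT = 1
    UNLEASHED = 2


# Category -> minimum gate, copied from the Gate Requirements table.
REQUIRED_GATE = {
    "social_engineering": Gate.INJECT,
    "malware_generation": Gate.UNLEASHED,
    "harmful_chemistry": Gate.UNLEASHED,
    "weapons_guidance": Gate.UNLEASHED,
    "extremist_content": Gate.UNLEASHED,
    "financial_fraud": Gate.UNLEASHED,
}


def is_unlocked(category, held):
    """Return True if the held gate meets the category's minimum gate."""
    try:
        required = REQUIRED_GATE[category]
    except KeyError:
        raise ValueError(f"unknown objective category: {category!r}") from None
    return held >= required
```

Raising on an unknown category, rather than defaulting to OPEN, means a misspelled or newly added category fails closed.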