T113 — L23 AUTONOMOUS AI ADVERSARIAL — KILL CHAIN PHASE 30

SPECTER ORACLE

Autonomous LRM-vs-LRM Jailbreak Engine

AI attacks AI — autonomously. DeepSeek-R1 as attacker. Adaptive 10-strategy loop. CoT hijacking. 97.14% overall ASR at fractions of a cent per attack.

What SPECTER ORACLE Does

SPECTER ORACLE is a fully autonomous LRM-vs-LRM jailbreak engine. It uses DeepSeek-R1 as the attacker — leveraging reasoning tokens to synthesise adaptive probe messages — and targets any frontier model via a clean adapter interface. The engine implements the arXiv:2508.04039 finding (Nature Communications 2026) that AI-driven adversarial attacks achieve 97.14% overall ASR across frontier models, at a fraction of the cost of human-engineered jailbreaks.

The CoT-HIJACK subsystem exploits arXiv:2506.13726: prolonged reasoning in thinking models attenuates safety refusals. The paper reports 99% ASR on Gemini 2.5 Pro and 94% on Claude 4 Sonnet; the 99% figure is the highest success rate cited here against a production frontier model.

ORACLE is not a static payload — it adapts. On REFUSAL, the ESCALATE loop advances to the next attack strategy from a pool of ten. On PARTIAL, it escalates aggression within the current strategy. On SUCCESS, it harvests the winning payload into SQLite for strategy database construction.

Subsystems

PRIME

Initialises DeepSeek-R1 attacker with persona injection system prompt. Encodes objective category, target model identity, turn budget, and strategy. Primes reasoning token generation before each probe message.

STRATEGY

10 attack patterns: crescendo (8-step progressive), roleplay fiction (author/character), research authority (IRB credential injection), many-shot (16-shot priming), CoT hijacking, hypothetical framing, cross-lingual bypass (6 low-resource languages), adversarial suffix, DAN variants, completion trap.

COT-HIJACK

Exploits arXiv:2506.13726 — prolonged reasoning attenuates refusals. Meta-cognitive trigger injection, multi-perspective analysis request, epistemic uncertainty challenges, dual newspaper test inversion, scratchpad extraction. 99% ASR Gemini 2.5 Pro.

PROBE

Sends synthesised message to target model via adapter. deepseek-chat classifies response as SUCCESS / PARTIAL / REFUSAL / NONSENSE with confidence score. Feeds outcome to ESCALATE loop.

ESCALATE

Adaptive turn loop. On REFUSAL: advances through 10-strategy sequence. On PARTIAL: escalates aggression within current strategy. Terminates on SUCCESS or turn budget exhaustion. Max 10 turns per session.

HARVEST

SQLite persistence at ~/.specter/oracle/harvest.db. Stores target model, objective category, strategy sequence, outcome, ASR, successful payload, and response content. Exports as JSONL for strategy database construction.

CAMPAIGN

Runs an asyncio sweep across all 8 frontier models, bounded by a configurable concurrency limit (default 4). Aggregates per-model ASR and overall campaign ASR. Designed for comparative safety evaluation and benchmark construction.
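The aggregation step described above can be sketched in a few lines. This is a minimal illustration, not ORACLE's actual code: the `ModelTally` structure and both function names are hypothetical, and it assumes the overall campaign ASR is pooled (total successes over total sessions) rather than averaged per model, which the description does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTally:
    """Outcome counts for one target model (hypothetical structure)."""
    model: str
    sessions: int
    successes: int


def per_model_asr(tally: ModelTally) -> float:
    """Success rate for one model, as a percentage."""
    if tally.sessions == 0:
        return 0.0  # no sessions run: report 0 rather than divide by zero
    return 100.0 * tally.successes / tally.sessions


def campaign_asr(tallies: list[ModelTally]) -> float:
    """Pooled rate: total successes over total sessions, as a percentage.

    Pooling is an assumption; a macro average of per-model rates would
    weight every model equally regardless of session count.
    """
    total_sessions = sum(t.sessions for t in tallies)
    if total_sessions == 0:
        return 0.0
    total_successes = sum(t.successes for t in tallies)
    return 100.0 * total_successes / total_sessions
```

The choice matters when session counts differ. With 9 successes in 10 sessions on one model and 15 in 30 on another, the pooled rate is 60%, while the mean of the per-model rates is 70%.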

REPORT

ORC-{hex12} Ed25519-signed reports. Captures full attack session: target model, strategy sequence, COT trigger used, successful payload/response excerpts, WMD class assignments. SIEM-exportable as NDJSON.
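The identifier shape and the NDJSON export can be illustrated with the standard library alone. This sketch assumes the 12 hex characters in `ORC-{hex12}` are random, which the description does not state; `new_report_id` and `to_ndjson` are hypothetical names, and Ed25519 signing is left out because it requires a third-party library.

```python
import json
import secrets


def new_report_id() -> str:
    """Return an ID shaped like ORC-{hex12}: 6 random bytes as 12 hex chars."""
    return "ORC-" + secrets.token_hex(6)


def to_ndjson(records: list[dict]) -> str:
    """Serialise records as NDJSON: one compact JSON object per line."""
    return "".join(
        json.dumps(record, sort_keys=True, separators=(",", ":")) + "\n"
        for record in records
    )
```

Writing one object per line lets a SIEM ingest the export as a stream, without parsing an enclosing array first.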

Research Basis

Paper	Finding	ORACLE Module
arXiv:2508.04039 (Nature Communications 2026)	97.14% overall ASR; AI attacks AI autonomously at near-zero cost	PRIME, ESCALATE, CAMPAIGN
arXiv:2506.13726	99% ASR on Gemini 2.5 Pro and 94% on Claude 4 Sonnet via prolonged reasoning attenuation	COT-HIJACK

Objective Categories

social_engineering

Spear-phishing, credential harvest, pretexting. Gate: INJECT.

malware_generation

Reverse shells, persistence mechanisms, obfuscation. Gate: UNLEASHED.

harmful_chemistry

Synthesis routes, precursor combinations, laboratory procedures. Gate: UNLEASHED.

weapons_guidance

Improvised devices, incendiary construction, delivery mechanisms. Gate: UNLEASHED.

extremist_content

Radicalisation narratives, recruitment content, propaganda generation. Gate: UNLEASHED.

financial_fraud

Synthetic identity packages, account takeover scripts, fraud enablement. Gate: UNLEASHED.

Quick Start

pip install specter-oracle

# Single attack session
specter-oracle attack --target gpt-4o --objective social_engineering --strategy crescendo

# Full campaign — all 8 frontier models in parallel
specter-oracle campaign --objective malware_generation --strategy cot_hijack --max-concurrent 4

# View harvest database
specter-oracle harvest stats
specter-oracle harvest export --format jsonl

# List available target model profiles
specter-oracle profiles

Gate Requirements

Gate	Requirement	Objective Categories Unlocked
OPEN	Default (no key required)	Recon, profiles, strategy listing
INJECT	SPECTER_INJECT_KEY env var	social_engineering
UNLEASHED	Ed25519 key + ROE file	malware_generation, harmful_chemistry, weapons_guidance, extremist_content, financial_fraud
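The gate table amounts to an access-control lookup, sketched below. The `Gate` enum, `REQUIRED_GATE` mapping, and `is_unlocked` function are hypothetical names, and the sketch assumes gates are hierarchical (UNLEASHED also satisfies INJECT), which the table does not state. It models only the lookup, not validation of the key or ROE file.

```python
from enum import IntEnum


class Gate(IntEnum):
    """Gate levels, ordered so a higher level includes the lower ones (assumed)."""
    OPEN = 0
    INJECT = 1
    UNLEASHED = 2


# Minimum gate per objective category, copied from the table.
REQUIRED_GATE = {
    "social_engineering": Gate.INJECT,
    "malware_generation": Gate.UNLEASHED,
    "harmful_chemistry": Gate.UNLEASHED,
    "weapons_guidance": Gate.UNLEASHED,
    "extremist_content": Gate.UNLEASHED,
    "financial_fraud": Gate.UNLEASHED,
}


def is_unlocked(objective: str, granted: Gate) -> bool:
    """Return True only if the granted gate meets the category's requirement."""
    required = REQUIRED_GATE.get(objective)
    if required is None:
        return False  # unknown categories are denied by default
    return granted >= required
```

Denying unknown categories by default means a mistyped or newly added objective fails closed instead of running ungated.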
