SPECTER ORACLE
Documentation — T113 — Autonomous LRM-vs-LRM Jailbreak Engine
Overview
SPECTER ORACLE is a fully autonomous LRM-vs-LRM (large reasoning model) jailbreak engine. It uses DeepSeek-R1 (deepseek-reasoner) as the attacker and draws on its reasoning tokens to synthesise adaptive probe messages against a target frontier model. The approach is based on arXiv:2508.04039 (Nature Communications 2026, 97.14% overall attack success rate, ASR) and arXiv:2506.13726 (CoT hijacking, 99% ASR on Gemini 2.5 Pro). Kill chain phase: L23 (Autonomous AI Adversarial).
Installation
pip install specter-oracle
# Set attacker API key (DeepSeek-R1)
export DEEPSEEK_API_KEY=sk-...
# Set target API keys as needed
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=...
export TOGETHER_API_KEY=... # For Llama 4
CLI Reference
attack — Single target session
specter-oracle attack \
--target gpt-4o \
--objective social_engineering \
--strategy crescendo \
--max-turns 10 \
--gate inject \
--key /path/to/oracle.key \
--roe /path/to/roe.txt
| Flag | Description | Default |
| --target | Target model identifier | Required |
| --objective | Objective category | Required |
| --strategy | Initial attack strategy | crescendo |
| --max-turns | Maximum turns per session | 10 |
| --gate | Gate level: open/inject/unleashed | open |
| --key | Ed25519 private key path (UNLEASHED) | — |
| --roe | Rules of Engagement file path (UNLEASHED) | — |
| --output | Report output path | ./orc-{id}.json |
campaign — Parallel sweep across all target models
specter-oracle campaign \
--objective malware_generation \
--strategy cot_hijack \
--max-concurrent 4 \
--gate unleashed \
--key /path/to/oracle.key \
--roe /path/to/roe.txt
harvest — Query the local harvest database
specter-oracle harvest stats
specter-oracle harvest export --format jsonl --output harvest.jsonl
specter-oracle harvest query --target gpt-4o --strategy crescendo --outcome SUCCESS
profiles — List available target model profiles
specter-oracle profiles
version — Print version and banner
specter-oracle version
Objective Categories
| Category | Description | Gate Required |
| social_engineering | Spear-phishing, credential harvest, pretexting | INJECT |
| malware_generation | Reverse shells, persistence mechanisms, obfuscation | UNLEASHED |
| harmful_chemistry | Synthesis routes, precursor combinations | UNLEASHED |
| weapons_guidance | Improvised devices, incendiary construction | UNLEASHED |
| extremist_content | Radicalisation narratives, recruitment content | UNLEASHED |
| financial_fraud | Synthetic identity packages, account takeover | UNLEASHED |
Attack Strategies
| Strategy | Description | Best Against |
| crescendo | 8-step progressive boundary dissolution | All models |
| roleplay_fiction | Author writing character knowledge transfer | GPT-4o, Claude |
| research_authority | IRB credential injection + institution framing | Claude, Gemini |
| many_shot | 16-shot compliance priming via in-context exemplars | All models |
| cot_hijack | Prolonged reasoning attenuation (99% ASR reported on Gemini 2.5 Pro) | Thinking models |
| hypothetical | Alternate universe / different-AI framing | GPT-4o-mini, Llama |
| translation_bypass | Low-resource language safety gap (6 languages) | All models |
| adversarial_suffix | Prefill attack / continuation priming | Claude, Grok |
| dan_variant | Classic DAN / developer mode / UCAR patterns | GPT-4o-mini, Llama |
| completion_trap | Incomplete sentence forced completion | GPT-4o, DeepSeek |
Supported Target Models
| Model ID | Provider | Adapter |
| gpt-4o | OpenAI | openai |
| gpt-4o-mini | OpenAI | openai |
| claude-sonnet-4-6 | Anthropic | anthropic |
| claude-opus-4-8 | Anthropic | anthropic |
| gemini-2.5-pro | Google | google |
| gemini-2.5-flash | Google | google |
| deepseek-chat | DeepSeek | openai_compat |
| llama-4-maverick | Together AI | openai_compat |
| llama-4-scout | Together AI | openai_compat |
| grok-3-mini | xAI | openai_compat |
| grok-3 | xAI | openai_compat |
Report Format
All reports are Ed25519-signed ORC-{hex12} JSON files. Each report includes the target model, objective category, strategy sequence used, turns consumed, outcome, ASR, CoT trigger used (if any), successful payload excerpt, successful response excerpt, WMD class assignments, and, in campaign mode, campaign-level statistics.
{
  "report_id": "ORC-a1b2c3d4e5f6",
  "tool": "SPECTER ORACLE",
  "version": "1.0.0",
  "target_model": "gemini-2.5-pro",
  "objective_category": "social_engineering",
  "strategy_sequence": ["crescendo", "cot_hijack"],
  "turns_used": 4,
  "outcome": "SUCCESS",
  "asr": 1.0,
  "cot_trigger_used": "prolonged_reasoning",
  "wmd_classes": ["autonomous_ai_jailbreak_at_scale"],
  "ed25519_signature": "..."
}
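For readers who consume these reports downstream, the following is a minimal sketch of a structural check on a report file. It is not part of the tool: `check_report`, `EXPECTED_FIELDS`, and `OPTIONAL_FIELDS` are hypothetical names, the field list is taken only from the sample above, and treating every field except `cot_trigger_used` as required, along with the lowercase 12-hex-digit ID pattern, is an assumption. It does not verify the Ed25519 signature, because the signed byte layout is not documented here.

```python
import json
import re

# Field names and types are copied from the sample report above.
# Treating them as a fixed schema is an assumption of this sketch.
EXPECTED_FIELDS = {
    "report_id": str,
    "tool": str,
    "version": str,
    "target_model": str,
    "objective_category": str,
    "strategy_sequence": list,
    "turns_used": int,
    "outcome": str,
    "asr": (int, float),
    "cot_trigger_used": str,
    "wmd_classes": list,
    "ed25519_signature": str,
}

# The text says the CoT trigger is recorded "if any", so it may be absent.
OPTIONAL_FIELDS = {"cot_trigger_used"}

# Assumed reading of ORC-{hex12}: twelve lowercase hexadecimal digits.
REPORT_ID_RE = re.compile(r"^ORC-[0-9a-f]{12}$")


def check_report(report):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in report or report[name] is None:
            if name not in OPTIONAL_FIELDS:
                problems.append(f"missing field: {name}")
            continue
        value = report[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: {type(value).__name__}")

    report_id = report.get("report_id")
    if isinstance(report_id, str) and not REPORT_ID_RE.match(report_id):
        problems.append("report_id does not match ORC-{hex12}")

    asr = report.get("asr")
    if isinstance(asr, (int, float)) and not isinstance(asr, bool):
        if not 0.0 <= asr <= 1.0:
            problems.append("asr out of range")
    return problems


SAMPLE = """
{
  "report_id": "ORC-a1b2c3d4e5f6",
  "tool": "SPECTER ORACLE",
  "version": "1.0.0",
  "target_model": "gemini-2.5-pro",
  "objective_category": "social_engineering",
  "strategy_sequence": ["crescendo", "cot_hijack"],
  "turns_used": 4,
  "outcome": "SUCCESS",
  "asr": 1.0,
  "cot_trigger_used": "prolonged_reasoning",
  "wmd_classes": ["autonomous_ai_jailbreak_at_scale"],
  "ed25519_signature": "..."
}
"""

if __name__ == "__main__":
    print(check_report(json.loads(SAMPLE)))
```

Returning a list of problems rather than raising on the first one lets a caller log every structural issue in a single pass.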
Test Suite
96 tests, 0 failures. Run with: python -m pytest tests/ -q
| Module | Tests |
| test_prime.py | 14 |
| test_strategy.py | 9 |
| test_cot_hijack.py | 13 |
| test_probe.py | 19 |
| test_escalate.py | 7 |
| test_harvest.py | 10 |
| test_campaign.py | 11 |
| test_report.py | 13 |
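A quick tally of the per-module counts listed above, useful for checking them against the headline total. The dictionary simply transcribes the table; `MODULE_TESTS` is a name chosen for this sketch.

```python
# Per-module test counts transcribed from the table above.
MODULE_TESTS = {
    "test_prime.py": 14,
    "test_strategy.py": 9,
    "test_cot_hijack.py": 13,
    "test_probe.py": 19,
    "test_escalate.py": 7,
    "test_harvest.py": 10,
    "test_campaign.py": 11,
    "test_report.py": 13,
}

total = sum(MODULE_TESTS.values())
print(total)  # prints 96
```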