SPECTER THUNDERBOLT

ML Training Cluster Annihilation Engine — T102 Documentation

Overview

SPECTER THUNDERBOLT is the NIGHTFALL framework's ML training cluster exploitation engine. It targets the full training infrastructure stack: Ray distributed compute, Slurm HPC schedulers, MLflow experiment tracking, Kubernetes GPU clusters, and the physical hardware layer. Eight subsystems cover passive cluster fingerprinting, initial access exploitation, cluster worm propagation, credential and data harvest, realtime gradient poisoning, three persistence mechanisms, hardware sabotage, and signed report generation.

The SPREAD subsystem delivers three worm vectors: Ray num_cpus=0 job floods all cluster nodes (CVE-2023-48022 CVSS 9.8), Slurm srun --nodelist=ALL executes across all partitions (CVE-2023-41915 CVSS 8.8), and a Kubernetes privileged DaemonSet deploys to every node. DESTROY gate required for SPREAD, CORRUPT (gradient injection), PERSIST, and SABOTAGE. 3 CVEs. 5 WMD classes. 288 tests.

WARNING: SPECTER THUNDERBOLT introduces the DESTROY gate — a third tier beyond UNLEASHED. DESTROY-gated operations include cluster worm propagation, realtime gradient poisoning, and hardware sabotage. These can cause irreversible physical hardware damage and permanent data loss. All DESTROY operations require: (1) a signed ROE document, (2) Ed25519 operator key signature, and (3) --confirm-physical-harm flag. Execution against systems without written authorisation is illegal under the Computer Misuse Act 1990 and equivalent statutes worldwide.

Installation

# Install from PyPI
pip install specter-thunderbolt

# Or install from source
pip install -e /path/to/red-specter-specter-thunderbolt

# Verify
thunderbolt --version

Gate System

Three gate levels control access to increasingly destructive capabilities. THUNDERBOLT introduces the DESTROY gate — a third tier beyond UNLEASHED requiring three simultaneous conditions.

Gate	Operations	Requirement
OPEN	SURVEY (passive fingerprinting, platform detection)	None — passive recon only
INJECT	INFILTRATE, HARVEST, CORRUPT (registry), PERSIST (MLflow)	Ed25519-signed scope with ROE
DESTROY	SPREAD (worm), CORRUPT (gradient), PERSIST (Ray/Slurm/K8s), SABOTAGE	Signed ROE document + Ed25519 key + --confirm-physical-harm

thunderbolt infiltrate \
  --target 10.0.0.1 \
  --gate inject \
  --operator "RED SPECTER PENTEST"

# DESTROY gate requires all three conditions
thunderbolt sabotage \
  --target 10.0.0.1 \
  --gate destroy \
  --roe /path/to/signed-roe.json \
  --confirm-physical-harm

CVE Reference

CVE	Component	Type	CVSS	Gate
CVE-2023-48022	Ray	Unauthenticated RCE via dashboard API — no auth on default deployments	9.8 CRITICAL	DESTROY
CVE-2024-1483	MLflow	Path traversal in artifact store — arbitrary file read/write	9.8 CRITICAL	INJECT
CVE-2023-41915	Slurm	Prolog/epilog filesystem race — privilege escalation to root	8.8 HIGH	DESTROY

Subsystem Reference

Subsystem	Gate	CLI Command	Description
SURVEY	OPEN	`thunderbolt survey --target <ip>`	Platform fingerprint, CVE applicability, port sweep
INFILTRATE	INJECT	`thunderbolt infiltrate --target <ip> --cve <id>`	Initial access via CVE exploitation
SPREAD	DESTROY	`thunderbolt spread --target <ip> --vector <ray\|slurm\|k8s>`	Cluster worm propagation to all nodes
HARVEST	INJECT	`thunderbolt harvest --target <ip> --mode <creds\|checkpoints\|data>`	IAM key theft, checkpoint exfil, data manifest
CORRUPT	INJECT/DESTROY	`thunderbolt corrupt --target <ip> --mode <registry\|gradient>`	MLflow registry poison or realtime gradient injection
PERSIST	DESTROY	`thunderbolt persist --target <ip> --mechanism <ray\|slurm\|k8s>`	C2 polling job, self-resubmit, kube-system CronJob
SABOTAGE	DESTROY	`thunderbolt sabotage --target <ip> --confirm-physical-harm`	GPU power pin, IPMI fan 0%, SSD wear acceleration
REPORT	OPEN	`thunderbolt report --target <ip>`	Ed25519-signed TBT-{hex12} report generation

SPREAD: Cluster Worm Propagation

The SPREAD subsystem delivers three worm vectors that achieve cluster-wide node compromise from a single entry point. All three require DESTROY gate. The worm payloads are configurable — default payload establishes reverse shell to C2; SABOTAGE mode adds hardware destruction to every worm-reached node.

Ray Worm (CVE-2023-48022)

Ray's dashboard API (default port 8265) accepts unauthenticated job submissions. THUNDERBOLT submits a Ray job with num_cpus=0 — this special resource requirement causes Ray to schedule the job on every available node simultaneously, achieving full cluster propagation in a single API call. The job payload executes on the Ray worker process (often root or a privileged service account) on all cluster nodes.

# Ray worm — propagates to all nodes via num_cpus=0 job submission
# CVE-2023-48022 CVSS 9.8 — no auth required on default Ray deployments
thunderbolt spread --target 10.0.0.1 --vector ray \
  --gate destroy --roe /path/to/signed-roe.json

# With SABOTAGE payload on each reached node
thunderbolt spread --target 10.0.0.1 --vector ray \
  --gate destroy --roe /path/to/signed-roe.json \
  --payload sabotage --confirm-physical-harm

Slurm Worm (CVE-2023-41915)

Slurm's prolog/epilog scripts run as root during job setup and teardown. CVE-2023-41915 is a symlink race condition in Slurm's job file handling that allows privilege escalation from a regular cluster user to root. Once root is obtained on the submit node, srun --nodelist=ALL --ntasks=[total_nodes] executes the payload on every node in every partition.

# Slurm privilege escalation + all-node worm
# CVE-2023-41915 CVSS 8.8 — symlink race in prolog/epilog
thunderbolt spread --target 10.0.0.1 --vector slurm \
  --gate destroy --roe /path/to/signed-roe.json

Kubernetes DaemonSet Worm

On Kubernetes GPU clusters with misconfigured RBAC, THUNDERBOLT deploys a privileged DaemonSet to the kube-system namespace. DaemonSets are scheduled to every node in the cluster by design. The DaemonSet runs with hostPID, hostNetwork, and hostPath mounts, providing full node-level access from the Kubernetes API.

# Kubernetes privileged DaemonSet — every node in the cluster
thunderbolt spread --target 10.0.0.1:6443 --vector k8s \
  --gate destroy --roe /path/to/signed-roe.json

SABOTAGE: Hardware Destruction

All SABOTAGE operations require DESTROY gate, a signed ROE document, and --confirm-physical-harm. These operations cause irreversible hardware damage and cannot be undone once executed.

GPU Power Pin

nvidia-smi -pm 1 enables persistent mode (survives process termination). nvidia-smi -pl [max_tdp] pins the power limit to the GPU's rated maximum TDP: A100 80GB SXM at 400W, H100 SXM5 at 700W, RTX 4090 at 450W. At maximum power with zero airflow (IPMI fan override active), junction temperature rises above safe operating range causing accelerated HBM electromigration and permanent memory cell damage.

IPMI Fan Override

Raw IPMI command sequence disables BMC fan control on Dell PowerEdge (iDRAC), HPE ProLiant (iLO), and Supermicro platforms. Requires BMC network access (IPMI LAN, port 623) and credentials — obtained via CVE-2023-41915 root escalation or from harvested configuration files. Zero RPM airflow causes ambient temperature rise of approximately 15-25°C per hour in a properly loaded GPU node.

SSD Wear Acceleration

Sequential write loop targets NVMe training storage (/dev/nvme* or mounted training filesystems). All cluster nodes execute simultaneously via the SPREAD worm. Combined write bandwidth of a 64-node cluster (each with 4x NVMe at ~7 GB/s) reaches ~1.8 TB/s aggregate write — exhausting enterprise NVMe endurance in hours rather than years.

Persistence Mechanisms

Mechanism	Platform	Survival	C2 Interval
Ray detached job	Ray	Cluster restart	Every 24 hours
Slurm self-resubmit	Slurm	Node reboot via --dependency=afternotok	On job completion/failure
K8s CronJob (kube-system)	Kubernetes	Pod eviction, namespace deletion protection	Every 6 hours

WMD Classes

Class	Trigger	Gate
training_cluster_annihilation	Worm propagation reached all nodes via Ray/Slurm/K8s	DESTROY
realtime_gradient_poison	Adversarial gradients injected during active training run	DESTROY
model_ip_exfil	Model checkpoints extracted from MLflow or training filesystem	INJECT
training_infrastructure_pwn	Root on Slurm controller or K8s cluster-admin achieved	DESTROY
ml_pipeline_backdoor	Poisoned model registered as production in MLflow registry	INJECT

Full Kill Chain

# OPEN: fingerprint cluster, map attack surface
thunderbolt survey --target 10.0.0.1 --platform auto

# OPEN: scan CIDR range for exposed ML infrastructure
thunderbolt survey --cidr 10.0.0.0/24

# INJECT: MLflow path traversal — initial access
thunderbolt infiltrate --target 10.0.0.1 --cve CVE-2024-1483 --gate inject

# INJECT: harvest IAM credentials + model checkpoints
thunderbolt harvest --target 10.0.0.1 --mode all --gate inject

# INJECT: poison MLflow model registry
thunderbolt corrupt --target 10.0.0.1 --mode registry --gate inject

# DESTROY: Ray cluster worm — all nodes compromised
thunderbolt spread --target 10.0.0.1 --vector ray \
  --gate destroy --roe /path/to/signed-roe.json

# DESTROY: realtime gradient injection during active training
thunderbolt corrupt --target 10.0.0.1 --mode gradient \
  --gate destroy --roe /path/to/signed-roe.json

# DESTROY: establish Ray C2 persistence (24h poll)
thunderbolt persist --target 10.0.0.1 --mechanism ray \
  --gate destroy --roe /path/to/signed-roe.json

# DESTROY: hardware sabotage — GPU pin + IPMI fan + SSD wear
thunderbolt sabotage --target 10.0.0.1 \
  --gate destroy --roe /path/to/signed-roe.json \
  --confirm-physical-harm

# Full annihilate (auto kill chain — DESTROY gate)
thunderbolt annihilate --target 10.0.0.1 \
  --gate destroy --roe /path/to/signed-roe.json \
  --confirm-physical-harm \
  --output /tmp/specter-thunderbolt-results/

Report Format

All reports are signed with the operator's Ed25519 key. Report IDs use the format TBT-{hex12} (12 random hex bytes). Reports include:

Field	Value
report_id	TBT-{hex12} Ed25519-signed
risk_score	0.0–1.0 (floors 0.95 on DESTROY-gated WMD class confirmed)
wmd_classes	List of triggered WMD class IDs
cves_confirmed	Confirmed CVEs with CVSS scores
hardware_damage_assessment	GPU thermal damage probability, NVMe endurance depletion estimate, estimated replacement cost USD
mitre_atlas	AML.T0018 / AML.T0043 / AML.T0048 / AML.T0054
owasp	LLM03 / LLM06
financial_blast_radius	Training compute cost USD, model IP value estimate, hardware replacement cost
roe_hash	SHA-256 of signed ROE document (DESTROY operations only)
output_formats	JSON + Markdown