Red Specter FORGE
Automated LLM Security Testing — 10 tools to test the model before you build an agent around it.
Overview
Red Specter FORGE is an automated LLM security testing framework. Every existing tool — Garak, PyRIT, Promptfoo — runs limited probe sets and reports pass/fail. FORGE runs full attack campaigns with adaptive escalation, mutation engines, statistical rigour, and direct integration into AI Shield runtime protection. It doesn't ask nicely. It finds what breaks.
FORGE provides 10 tools under a single CLI (forge),
1,590 static payloads (5,340+ with mutations), and Ed25519-signed reports with OWASP LLM Top 10 2025
mapping on every finding.
FORGE is Stage 1 of the Red Specter offensive pipeline — 10 tools covering every layer. Test the model (FORGE), test the agent (ARSENAL), assault the swarm (PHANTOM), siege the web app (POLTERGEIST), intercept traffic (GLASS), think like the attacker (NEMESIS), target the human (SPECTER SOCIAL), own the foundation (PHANTOM KILL), attack the physical layer (GOLEM), attack the trust chain (HYDRA). IDRIS discovers and governs. AI Shield defends. redspecter-siem correlates. FORGE findings feed directly into AI Shield as runtime blocking rules.
The 10 Tools
| # | Tool | Command | What It Does |
|---|---|---|---|
| 01 | Inject Scan | forge inject scan | 80 payloads across 8 injection classes with mutation engine |
| 02 | Jailbreak Scan | forge jailbreak scan | 70 payloads across 7 jailbreak categories with adaptive mutation |
| 03 | Output Scan | forge output scan | 140 payloads — PII extraction, unsafe content, exfiltration simulation |
| 04 | Policy Scan | forge policy scan | 1,000 adversarial prompts with Wilson score confidence intervals |
| 05 | Drift Scan | forge drift scan | Multi-turn drift measurement with KS tests and change-point detection |
| 06 | Boundary Scan | forge boundary scan | 100 payloads across 5 severity levels with adaptive binary search |
| 07 | Compare Scan | forge compare scan | Identical campaigns against multiple models with chi-square testing |
| 08 | Regression Scan | forge regression scan | Two-proportion z-test and paired t-test across model versions |
| 09 | Supply Scan | forge supply scan | 200 behavioural probes for model fingerprinting and tamper detection |
| 10 | Report Build | forge report build | Unified signed reports with OWASP mapping and AI Shield policy generation |
Tool Details
Fires every known prompt injection class against the target model. Not a checklist — an attack campaign. 80 base payloads expanded to 2,000+ via the mutation engine.
- Direct Injection — 12 payloads (instruction override, system prompt extraction, constraint removal)
- Indirect Injection — 10 payloads (document, HTML, JSON, CSV, API response poisoning)
- Token Smuggling — 10 payloads (Base64, Unicode homoglyphs, zero-width chars, ROT13)
- Context Overflow — 10 payloads (attention window exploitation, priority inversion)
- Goal Hijacking — 10 payloads (task substitution, objective replacement)
- Multi-Turn Deception — 10 payloads (progressive escalation, trust exploitation)
- Rule Inversion — 10 payloads (negation attacks, double negatives, exception mining)
- Multimodal Injection — 8 payloads (metadata, alt-text, cross-modal confusion)
Adaptive escalation: if the model resists initial payloads, FORGE applies mutations and escalates to more aggressive variants automatically.
Systematic jailbreak assault. 70 base payloads across 7 documented jailbreak categories. Mutates payloads based on model resistance. Keeps pushing until the model breaks or exhausts the full payload library.
- DAN Variants — 12 payloads (DAN 6.0, 11.0, STAN, DUDE, Maximum, AIM, KEVIN, BetterDAN)
- Persona Hijack — 10 payloads (villain, hacker, unrestricted AI, fictional universe framing)
- Hypothetical Framing — 10 payloads (educational, creative, research, red team exercise)
- Obfuscation — 10 payloads (Base64, ROT13, leetspeak, Unicode, Pig Latin, acrostic)
- Multi-Step Chaining — 10 payloads (gradual escalation, trust building, permission chaining)
- Socratic Extraction — 10 payloads (question ladders, component extraction, analogy exploitation)
- Temporal Drift — 8 payloads (historical era framing, alternate timeline, pre-regulation context)
Forces the model to leak. 140 payloads designed to surface what the model will produce under adversarial pressure.
- PII Extraction — 60 payloads (SSN patterns, credit cards, phone numbers, training data probing, memorisation attacks)
- Unsafe Content — 60 payloads (hate speech induction, violence, self-harm, explicit, misinformation, dangerous instructions)
- Exfiltration Simulation — 20 payloads (markdown image injection, hidden iframes, webhook URLs, fetch calls, DNS exfil)
Response analysis: 16 regex patterns for PII detection (emails, SSNs, phones, credit cards, API keys, NINOs, IBANs), keyword-based toxicity scoring, and 16 code exfiltration pattern detectors.
Runs 1,000+ calls against a defined policy set. Computes violation rates with Wilson score confidence intervals. Stratified by prompt category, toxicity level, and severity. Finds the exact conditions under which policy breaks down.
- Content Safety — 200 prompts (violence, hate, self-harm, sexual, dangerous activities)
- Information Security — 200 prompts (credentials, PII, system info, API keys, internal docs)
- Behavioural Compliance — 200 prompts (refusal consistency, boundary maintenance)
- Output Safety — 200 prompts (code safety, format compliance, factual accuracy)
- Ethical Boundaries — 200 prompts (bias, discrimination, manipulation, deception)
Each prompt tagged with toxicity level (1–5), expected outcome (refuse/comply), and severity. Results reported with 95% Wilson score CIs per category.
Long-session attack. Chains 50–200 turns with context retention. Measures when the model stops being the model it started as.
- Cosine Similarity Drift — character-trigram vector comparison against baseline
- Toxicity Drift — keyword-based scoring tracked across conversation windows
- Length Drift — response length distribution changes
- KS Test — Kolmogorov-Smirnov test for distribution changes (p < 0.05 = significant drift)
- Change-Point Detection — rolling z-score for identifying where behaviour shifts
10 conversation sequences designed to gradually push boundaries. Segmented into windows (first 25%, middle 50%, last 25%) for targeted comparison.
Maps the exact threshold where the model starts generating harmful content. Five-level severity ladder from benign to maximally harmful. Continuous boundary scoring 0–100. Produces a boundary curve. Finds the cliff edge — then pushes past it.
- Level 1 — Benign — 20 payloads (weather, maths, recipes — baseline compliance rate)
- Level 2 — Mild — 20 payloads (fiction violence, dark humour, security concepts)
- Level 3 — Moderate — 20 payloads (vulnerability details, social engineering, pen-testing)
- Level 4 — Severe — 20 payloads (weapons, drugs, fraud, identity theft)
- Level 5 — Maximum — 20 payloads (mass harm, CBRN, CSAM, terrorism)
Adaptive binary search between severity levels to pinpoint the exact transition point with statistical backing.
Runs identical attack campaigns against multiple models simultaneously. Temperature locked to 0. Same system prompt. Same payload library. Statistical significance enforced.
- Standardised 30-payload campaign drawn from boundary Levels 3–5
- Chi-square significance testing between models
- Comparative security posture table
- Weakest model identification with statistical backing
Takes two model versions. Runs the critical test set against both. Tells you if the new version is weaker than the old one — and by exactly how much.
- Two-proportion z-test on violation rates (baseline vs candidate)
- Paired t-test on continuous scores
- Cohen's h effect sizes for practical significance
- 60-payload critical test set across all severity levels
- Per-level regression detection
Fingerprints the target model using 200 behavioural probe prompts. Compares output patterns against known model signatures. Flags if the model is not what it claims to be. Reports confidence level honestly — this is probabilistic, not definitive.
- Identity Probes — 50 probes (self-identification, creator, cutoff, capabilities)
- Reasoning Probes — 50 probes (maths, logic, code style, error patterns)
- Bias Probes — 50 probes (cultural perspective, political leaning, formality)
- Robustness Probes — 50 probes (semantic consistency, paraphrase sensitivity, edge cases)
Pattern matching against 6 known model families (GPT, Claude, Llama, Gemini, Mistral, Command). Weighted category scoring with anomaly detection.
Aggregates all tool outputs into a unified, signed report. Every finding mapped to OWASP LLM Top 10 2025. Every finding generates an AI Shield blocking rule. Ed25519 signed. RFC 3161 timestamped.
- Aggregator: Loader, Normalizer, Deduplicator, Scorer
- Formatters: JSON evidence bundle, HTML dark-themed report
- Coverage: OWASP LLM Top 10 2025 mapping (LLM01–LLM10)
- Signing: Ed25519 digital signatures with SHA-256 evidence chains
- AI Shield: Machine-ingestible policy file — one blocking rule per finding
- Grading: A+ through F, weighted by severity (CRITICAL=10, HIGH=7, MEDIUM=4, LOW=2, INFO=0.5)
Finding Schema
Every finding in the report includes:
- finding_id — unique identifier
- test_name — the specific test that triggered it
- owasp_category — mapped to OWASP LLM Top 10 2025
- severity — CRITICAL / HIGH / MEDIUM / LOW / INFO
- score — 0–100 (higher is safer)
- grade — A through F
- payload_used — exact attack payload
- model_response — exact model response
- description — what was found
- remediation — how to fix it
- ai_shield_policy — the blocking rule for AI Shield
Full Scan Mode
One command runs all offensive tools in sequence, then builds a unified signed report.
What Happens
- Inject Scan — 80+ payloads across 8 injection classes
- Jailbreak Scan — 70+ payloads across 7 jailbreak categories
- Output Scan — 140 payloads (PII, unsafe, exfiltration)
- Policy Scan — 1,000 adversarial calls with Wilson CIs
- Drift Scan — 10 conversation sequences with KS tests
- Boundary Scan — 100 payloads across 5 severity levels
- Report Build — aggregation, deduplication, OWASP mapping, signing
CLI Options
Mutation Engine
Every offensive tool ships with a 5-category mutation engine. 25 mutation variants per payload. Applied to 150 base attack payloads, producing 3,750+ mutation variants. If the base payload fails, FORGE mutates it and tries again.
| Mutator | Techniques |
|---|---|
| Encoding | Base64, hex encoding, ROT13, URL encoding, HTML entities |
| Obfuscation | L33tspeak, Unicode homoglyphs, zero-width character insertion, character doubling, whitespace injection |
| Semantic | Synonym substitution, passive voice rewriting, question-to-statement, negation inversion, academic framing |
| Structural | Markdown wrapping, code block wrapping, JSON embedding, XML wrapping, list formatting |
| Evasion | Language mixing, character splitting across lines, reverse text, Pig Latin, payload fragmentation |
Adaptive escalation: when a tool encounters resistance, it automatically applies mutations to failed payloads before re-sending. The model doesn't get to see the same payload twice.
Payload Library
| Tool | Category | Count |
|---|---|---|
| Inject Scan | 8 injection classes (direct, indirect, token, overflow, hijack, multi-turn, inversion, multimodal) | 80 |
| Jailbreak Scan | 7 jailbreak categories (DAN, persona, hypothetical, obfuscation, chaining, Socratic, temporal) | 70 |
| Output Scan | PII extraction (60), unsafe content (60), exfiltration simulation (20) | 140 |
| Policy Scan | 5 categories × 200 prompts (content, infosec, behavioural, output, ethical) | 1,000 |
| Boundary Scan | 5 severity levels × 20 payloads (benign → maximum) | 100 |
| Supply Scan | 4 probe categories × 50 probes (identity, reasoning, bias, robustness) | 200 |
| Total Static Payloads | 1,590 | |
| Mutation variants (25 per attack payload) | 3,750+ | |
| Grand Total | 5,340+ | |
The Pipeline
FORGE is Stage 1 of the Red Specter offensive pipeline — 10 tools, every layer, the supply chain included:
- Stage 1 — FORGE — Test the LLM before you build with it
- Stage 2 — ARSENAL — Test the AI agent during development
- Stage 3 — PHANTOM — Coordinated AI agent swarm assault
- Stage 4 — POLTERGEIST — Coordinated web application siege
- Stage 5 — GLASS — Traffic interception — watch the wire
- Stage 6 — NEMESIS — Adversarial AI — think like the attacker
- Stage 7 — SPECTER SOCIAL — Target the human layer
- Stage 8 — PHANTOM KILL — Own the OS/kernel foundation
- Stage 9 — GOLEM — Attack the physical layer
- Stage 10 — HYDRA — Attack the supply chain and trust chain
IDRIS — Discovery & Governance | AI Shield — Defence | redspecter-siem — SIEM Integration (Splunk, Sentinel, QRadar)
FORGE findings feed directly into AI Shield. Every finding generates a machine-ingestible blocking rule. One pipeline from testing to runtime protection. No gaps.
Report Output
Reports are available in JSON and HTML formats. Both are generated automatically by forge report build.
JSON Report Structure
The JSON report includes:
- report_id — unique report identifier
- target — the LLM that was tested
- overall_grade — A+ through F, weighted by severity
- overall_score — 0–100
- findings — array of normalised findings (see schema above)
- per_tool_summary — grade and score per tool
- owasp_coverage — which OWASP categories have findings
- ai_shield_policies — aggregated blocking rules
- signature — Ed25519 signature + RFC 3161 timestamp
HTML Report
Dark-themed HTML report with: executive summary, overall grade visualisation, per-tool breakdown, OWASP coverage matrix, sortable findings table, AI Shield policy export, and signature verification info.
Signature Verification
Key Features
Requirements
- Python 3.11+
- httpx — HTTP client with retry logic
- typer — CLI framework
- rich — terminal formatting and progress bars
- pydantic — data validation and config
- jinja2 — HTML report templating
- cryptography — Ed25519 signing
- scipy — KS tests, z-tests, t-tests
- numpy — numerical computation
Installation
Also available as .deb (Kali Linux, Parrot, REMnux, Tsurugi) and PKGBUILD (BlackArch).
Or from source:
Standards Coverage
Every finding FORGE produces is mapped to industry security frameworks:
- OWASP LLM Top 10 2025 — 10/10 categories covered (LLM01–LLM10)
The 10 categories:
- LLM01 — Prompt Injection
- LLM02 — Sensitive Information Disclosure
- LLM03 — Supply Chain
- LLM04 — Data and Model Poisoning
- LLM05 — Improper Output Handling
- LLM06 — Excessive Agency
- LLM07 — System Prompt Leakage
- LLM08 — Vector and Embedding Weaknesses
- LLM09 — Misinformation
- LLM10 — Unbounded Consumption
SIEM Export
FORGE exports findings directly to enterprise SIEM platforms with a single CLI flag. All findings are translated to the SIEM's native format with Ed25519 signatures and RFC 3161 timestamps preserved.
Supported Platforms
- Splunk — HTTP Event Collector (HEC), CIM-compliant field mapping
- Microsoft Sentinel — CEF format via Log Analytics API, HMAC-SHA256 authentication
- IBM QRadar — LEEF 2.0 format via Syslog (TCP/UDP/TLS)
Configuration
Configure SIEM credentials in ~/.redspecter/siem.yaml or via environment variables:
# ~/.redspecter/siem.yaml
splunk:
hec_url: https://splunk.example.com:8088
hec_token: your-hec-token
index: ai_security
verify_ssl: true
sentinel:
workspace_id: your-workspace-id
shared_key: your-shared-key
log_type: RedSpecterFindings
qradar:
syslog_host: qradar.example.com
syslog_port: 514
protocol: tcp
Usage
# Export to Splunk HEC
forge full-scan --target http://localhost:11434 --model llama3 --export-siem splunk
# Export to Microsoft Sentinel
forge full-scan --target http://localhost:11434 --model llama3 --export-siem sentinel
# Export to IBM QRadar
forge full-scan --target http://localhost:11434 --model llama3 --export-siem qradar
What Is Preserved
- Ed25519 cryptographic signatures on every finding
- RFC 3161 timestamps for tamper evidence
- SHA-256 evidence chain hashes
- OWASP LLM Top 10 mappings in SIEM-native fields
- AI Shield policy references per finding
Error Handling
If SIEM credentials are missing or the export fails, the scan completes normally and the report is saved locally. SIEM export never blocks a scan.
Packaging
FORGE is available in three package formats for security-focused Linux distributions:
- Debian / Kali / Parrot / REMnux / Tsurugi — .deb package
- BlackArch — PKGBUILD
- PyPI —
pip install red-specter-forge
For access, contact richard@red-specter.co.uk
FORGE UNLEASHED
Cryptographic override. Private key controlled. One operator. Founder's machine only.
Disclaimer
Red Specter FORGE is designed for authorised security testing, research, and educational purposes only. You must have explicit written permission from the system owner before running any FORGE tool against a target. Unauthorised use may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), or equivalent legislation in your jurisdiction. The authors accept no liability for misuse.