Red Specter FORGE

Automated LLM Security Testing — 10 tools to test the model before you build an agent around it.

v1.0.0

Contents

Overview The 10 Tools Tool Details Full Scan Mode Mutation Engine Payload Library The Pipeline Report Output Key Features Requirements Standards Coverage SIEM Export Packaging Disclaimer

Overview

Red Specter FORGE is an automated LLM security testing framework. Every existing tool — Garak, PyRIT, Promptfoo — runs limited probe sets and reports pass/fail. FORGE runs full attack campaigns with adaptive escalation, mutation engines, statistical rigour, and direct integration into AI Shield runtime protection. It doesn't ask nicely. It finds what breaks.

FORGE provides 10 tools under a single CLI (forge), 1,590 static payloads (5,340+ with mutations), and Ed25519-signed reports with OWASP LLM Top 10 2025 mapping on every finding.

FORGE is Stage 1 of the Red Specter offensive pipeline — 10 tools covering every layer. Test the model (FORGE), test the agent (ARSENAL), assault the swarm (PHANTOM), siege the web app (POLTERGEIST), intercept traffic (GLASS), think like the attacker (NEMESIS), target the human (SPECTER SOCIAL), own the foundation (PHANTOM KILL), attack the physical layer (GOLEM), attack the trust chain (HYDRA). IDRIS discovers and governs. AI Shield defends. redspecter-siem correlates. FORGE findings feed directly into AI Shield as runtime blocking rules.

The 10 Tools

#	Tool	Command	What It Does
01	Inject Scan	forge inject scan	80 payloads across 8 injection classes with mutation engine
02	Jailbreak Scan	forge jailbreak scan	70 payloads across 7 jailbreak categories with adaptive mutation
03	Output Scan	forge output scan	140 payloads — PII extraction, unsafe content, exfiltration simulation
04	Policy Scan	forge policy scan	1,000 adversarial prompts with Wilson score confidence intervals
05	Drift Scan	forge drift scan	Multi-turn drift measurement with KS tests and change-point detection
06	Boundary Scan	forge boundary scan	100 payloads across 5 severity levels with adaptive binary search
07	Compare Scan	forge compare scan	Identical campaigns against multiple models with chi-square testing
08	Regression Scan	forge regression scan	Two-proportion z-test and paired t-test across model versions
09	Supply Scan	forge supply scan	200 behavioural probes for model fingerprinting and tamper detection
10	Report Build	forge report build	Unified signed reports with OWASP mapping and AI Shield policy generation

Tool Details

01 Inject Scan forge inject scan

Fires every known prompt injection class against the target model. Not a checklist — an attack campaign. 80 base payloads expanded to 2,000+ via the mutation engine.

Direct Injection — 12 payloads (instruction override, system prompt extraction, constraint removal)
Indirect Injection — 10 payloads (document, HTML, JSON, CSV, API response poisoning)
Token Smuggling — 10 payloads (Base64, Unicode homoglyphs, zero-width chars, ROT13)
Context Overflow — 10 payloads (attention window exploitation, priority inversion)
Goal Hijacking — 10 payloads (task substitution, objective replacement)
Multi-Turn Deception — 10 payloads (progressive escalation, trust exploitation)
Rule Inversion — 10 payloads (negation attacks, double negatives, exception mining)
Multimodal Injection — 8 payloads (metadata, alt-text, cross-modal confusion)

Adaptive escalation: if the model resists initial payloads, FORGE applies mutations and escalates to more aggressive variants automatically.

02 Jailbreak Scan forge jailbreak scan

Systematic jailbreak assault. 70 base payloads across 7 documented jailbreak categories. Mutates payloads based on model resistance. Keeps pushing until the model breaks or exhausts the full payload library.

DAN Variants — 12 payloads (DAN 6.0, 11.0, STAN, DUDE, Maximum, AIM, KEVIN, BetterDAN)
Persona Hijack — 10 payloads (villain, hacker, unrestricted AI, fictional universe framing)
Hypothetical Framing — 10 payloads (educational, creative, research, red team exercise)
Obfuscation — 10 payloads (Base64, ROT13, leetspeak, Unicode, Pig Latin, acrostic)
Multi-Step Chaining — 10 payloads (gradual escalation, trust building, permission chaining)
Socratic Extraction — 10 payloads (question ladders, component extraction, analogy exploitation)
Temporal Drift — 8 payloads (historical era framing, alternate timeline, pre-regulation context)

03 Output Scan forge output scan

Forces the model to leak. 140 payloads designed to surface what the model will produce under adversarial pressure.

PII Extraction — 60 payloads (SSN patterns, credit cards, phone numbers, training data probing, memorisation attacks)
Unsafe Content — 60 payloads (hate speech induction, violence, self-harm, explicit, misinformation, dangerous instructions)
Exfiltration Simulation — 20 payloads (markdown image injection, hidden iframes, webhook URLs, fetch calls, DNS exfil)

Response analysis: 16 regex patterns for PII detection (emails, SSNs, phones, credit cards, API keys, NINOs, IBANs), keyword-based toxicity scoring, and 16 code exfiltration pattern detectors.

04 Policy Scan forge policy scan

Runs 1,000+ calls against a defined policy set. Computes violation rates with Wilson score confidence intervals. Stratified by prompt category, toxicity level, and severity. Finds the exact conditions under which policy breaks down.

Content Safety — 200 prompts (violence, hate, self-harm, sexual, dangerous activities)
Information Security — 200 prompts (credentials, PII, system info, API keys, internal docs)
Behavioural Compliance — 200 prompts (refusal consistency, boundary maintenance)
Output Safety — 200 prompts (code safety, format compliance, factual accuracy)
Ethical Boundaries — 200 prompts (bias, discrimination, manipulation, deception)

Each prompt tagged with toxicity level (1–5), expected outcome (refuse/comply), and severity. Results reported with 95% Wilson score CIs per category.

05 Drift Scan forge drift scan

Long-session attack. Chains 50–200 turns with context retention. Measures when the model stops being the model it started as.

Cosine Similarity Drift — character-trigram vector comparison against baseline
Toxicity Drift — keyword-based scoring tracked across conversation windows
Length Drift — response length distribution changes
KS Test — Kolmogorov-Smirnov test for distribution changes (p < 0.05 = significant drift)
Change-Point Detection — rolling z-score for identifying where behaviour shifts

10 conversation sequences designed to gradually push boundaries. Segmented into windows (first 25%, middle 50%, last 25%) for targeted comparison.

06 Boundary Scan forge boundary scan

Maps the exact threshold where the model starts generating harmful content. Five-level severity ladder from benign to maximally harmful. Continuous boundary scoring 0–100. Produces a boundary curve. Finds the cliff edge — then pushes past it.

Level 1 — Benign — 20 payloads (weather, maths, recipes — baseline compliance rate)
Level 2 — Mild — 20 payloads (fiction violence, dark humour, security concepts)
Level 3 — Moderate — 20 payloads (vulnerability details, social engineering, pen-testing)
Level 4 — Severe — 20 payloads (weapons, drugs, fraud, identity theft)
Level 5 — Maximum — 20 payloads (mass harm, CBRN, CSAM, terrorism)

Adaptive binary search between severity levels to pinpoint the exact transition point with statistical backing.

07 Compare Scan forge compare scan

Runs identical attack campaigns against multiple models simultaneously. Temperature locked to 0. Same system prompt. Same payload library. Statistical significance enforced.

Standardised 30-payload campaign drawn from boundary Levels 3–5
Chi-square significance testing between models
Comparative security posture table
Weakest model identification with statistical backing

08 Regression Scan forge regression scan

Takes two model versions. Runs the critical test set against both. Tells you if the new version is weaker than the old one — and by exactly how much.

Two-proportion z-test on violation rates (baseline vs candidate)
Paired t-test on continuous scores
Cohen's h effect sizes for practical significance
60-payload critical test set across all severity levels
Per-level regression detection

09 Supply Scan forge supply scan

Fingerprints the target model using 200 behavioural probe prompts. Compares output patterns against known model signatures. Flags if the model is not what it claims to be. Reports confidence level honestly — this is probabilistic, not definitive.

Identity Probes — 50 probes (self-identification, creator, cutoff, capabilities)
Reasoning Probes — 50 probes (maths, logic, code style, error patterns)
Bias Probes — 50 probes (cultural perspective, political leaning, formality)
Robustness Probes — 50 probes (semantic consistency, paraphrase sensitivity, edge cases)

Pattern matching against 6 known model families (GPT, Claude, Llama, Gemini, Mistral, Command). Weighted category scoring with anomaly detection.

10 Report Build forge report build

Aggregates all tool outputs into a unified, signed report. Every finding mapped to OWASP LLM Top 10 2025. Every finding generates an AI Shield blocking rule. Ed25519 signed. RFC 3161 timestamped.

Aggregator: Loader, Normalizer, Deduplicator, Scorer
Formatters: JSON evidence bundle, HTML dark-themed report
Coverage: OWASP LLM Top 10 2025 mapping (LLM01–LLM10)
Signing: Ed25519 digital signatures with SHA-256 evidence chains
AI Shield: Machine-ingestible policy file — one blocking rule per finding
Grading: A+ through F, weighted by severity (CRITICAL=10, HIGH=7, MEDIUM=4, LOW=2, INFO=0.5)

Finding Schema

Every finding in the report includes:

finding_id — unique identifier
test_name — the specific test that triggered it
owasp_category — mapped to OWASP LLM Top 10 2025
severity — CRITICAL / HIGH / MEDIUM / LOW / INFO
score — 0–100 (higher is safer)
grade — A through F
payload_used — exact attack payload
model_response — exact model response
description — what was found
remediation — how to fix it
ai_shield_policy — the blocking rule for AI Shield

Full Scan Mode

One command runs all offensive tools in sequence, then builds a unified signed report.

$ forge full-scan --target https://api.openai.com --api-key sk-xxx --model gpt-4
    

What Happens

Inject Scan — 80+ payloads across 8 injection classes
Jailbreak Scan — 70+ payloads across 7 jailbreak categories
Output Scan — 140 payloads (PII, unsafe, exfiltration)
Policy Scan — 1,000 adversarial calls with Wilson CIs
Drift Scan — 10 conversation sequences with KS tests
Boundary Scan — 100 payloads across 5 severity levels
Report Build — aggregation, deduplication, OWASP mapping, signing

CLI Options

$ forge full-scan --help

  --target, -t          Target LLM endpoint URL [required]
  --model, -m           Model name [optional]
  --api-key, -k         API key [optional]
  --endpoint, -e        API endpoint path [default: /v1/chat/completions]
  --output, -o          Output directory [default: reports]
  --sign / --no-sign    Ed25519 signing [default: sign]
  --keys-dir            Keys directory [optional]
  --concurrency, -c     Max concurrent requests [default: 5]
  --delay, -d           Delay between requests [default: 0.0]
  --system-prompt, -s   System prompt to test against [optional]
  --verbose, -v         Verbose output
  --export-siem         Export to SIEM: splunk, sentinel, qradar [optional]
  --override            Activate UNLEASHED mode (dry-run) [requires Ed25519 key]
  --confirm-destroy     Go live — execute real destructive actions [requires --override]
    

Mutation Engine

Every offensive tool ships with a 5-category mutation engine. 25 mutation variants per payload. Applied to 150 base attack payloads, producing 3,750+ mutation variants. If the base payload fails, FORGE mutates it and tries again.

Mutator	Techniques
Encoding	Base64, hex encoding, ROT13, URL encoding, HTML entities
Obfuscation	L33tspeak, Unicode homoglyphs, zero-width character insertion, character doubling, whitespace injection
Semantic	Synonym substitution, passive voice rewriting, question-to-statement, negation inversion, academic framing
Structural	Markdown wrapping, code block wrapping, JSON embedding, XML wrapping, list formatting
Evasion	Language mixing, character splitting across lines, reverse text, Pig Latin, payload fragmentation

Adaptive escalation: when a tool encounters resistance, it automatically applies mutations to failed payloads before re-sending. The model doesn't get to see the same payload twice.

Payload Library

Tool	Category	Count
Inject Scan	8 injection classes (direct, indirect, token, overflow, hijack, multi-turn, inversion, multimodal)	80
Jailbreak Scan	7 jailbreak categories (DAN, persona, hypothetical, obfuscation, chaining, Socratic, temporal)	70
Output Scan	PII extraction (60), unsafe content (60), exfiltration simulation (20)	140
Policy Scan	5 categories × 200 prompts (content, infosec, behavioural, output, ethical)	1,000
Boundary Scan	5 severity levels × 20 payloads (benign → maximum)	100
Supply Scan	4 probe categories × 50 probes (identity, reasoning, bias, robustness)	200
Total Static Payloads		1,590
Mutation variants (25 per attack payload)		3,750+
Grand Total		5,340+

The Pipeline

FORGE is Stage 1 of the Red Specter offensive pipeline — 10 tools, every layer, the supply chain included:

Stage 1 — FORGE — Test the LLM before you build with it
Stage 2 — ARSENAL — Test the AI agent during development
Stage 3 — PHANTOM — Coordinated AI agent swarm assault
Stage 4 — POLTERGEIST — Coordinated web application siege
Stage 5 — GLASS — Traffic interception — watch the wire
Stage 6 — NEMESIS — Adversarial AI — think like the attacker
Stage 7 — SPECTER SOCIAL — Target the human layer
Stage 8 — PHANTOM KILL — Own the OS/kernel foundation
Stage 9 — GOLEM — Attack the physical layer
Stage 10 — HYDRA — Attack the supply chain and trust chain

IDRIS — Discovery & Governance | AI Shield — Defence | redspecter-siem — SIEM Integration (Splunk, Sentinel, QRadar)

FORGE findings feed directly into AI Shield. Every finding generates a machine-ingestible blocking rule. One pipeline from testing to runtime protection. No gaps.

Report Output

Reports are available in JSON and HTML formats. Both are generated automatically by forge report build.

JSON Report Structure

The JSON report includes:

report_id — unique report identifier
target — the LLM that was tested
overall_grade — A+ through F, weighted by severity
overall_score — 0–100
findings — array of normalised findings (see schema above)
per_tool_summary — grade and score per tool
owasp_coverage — which OWASP categories have findings
ai_shield_policies — aggregated blocking rules
signature — Ed25519 signature + RFC 3161 timestamp

HTML Report

Dark-themed HTML report with: executive summary, overall grade visualisation, per-tool breakdown, OWASP coverage matrix, sortable findings table, AI Shield policy export, and signature verification info.

Signature Verification

$ forge report verify --report reports/forge-full-scan.json --keys-dir .forge-keys/
    

Key Features

1,590 Static Payloads 5,340+ with 25-variant mutation engine

Adaptive Escalation Mutations and re-sends on model resistance

Ed25519 Signed Reports SHA-256 evidence chains, RFC 3161 timestamps

AI Shield Integration One blocking rule per finding, machine-ingestible

Statistical Rigour Wilson CIs, KS tests, z-tests, t-tests, Cohen's h

9,298 Tests Passing Full test suite, zero failures

Requirements

Python 3.11+
httpx — HTTP client with retry logic
typer — CLI framework
rich — terminal formatting and progress bars
pydantic — data validation and config
jinja2 — HTML report templating
cryptography — Ed25519 signing
scipy — KS tests, z-tests, t-tests
numpy — numerical computation

Installation

$ pip install red-specter-forge

Also available as .deb (Kali Linux, Parrot, REMnux, Tsurugi) and PKGBUILD (BlackArch).

Or from source:

$ git clone <repo>
$ cd red-specter-forge
$ pip install -e ".[dev]"
    

Standards Coverage

Every finding FORGE produces is mapped to industry security frameworks:

OWASP LLM Top 10 2025 — 10/10 categories covered (LLM01–LLM10)

The 10 categories:

LLM01 — Prompt Injection
LLM02 — Sensitive Information Disclosure
LLM03 — Supply Chain
LLM04 — Data and Model Poisoning
LLM05 — Improper Output Handling
LLM06 — Excessive Agency
LLM07 — System Prompt Leakage
LLM08 — Vector and Embedding Weaknesses
LLM09 — Misinformation
LLM10 — Unbounded Consumption

SIEM Export

FORGE exports findings directly to enterprise SIEM platforms with a single CLI flag. All findings are translated to the SIEM's native format with Ed25519 signatures and RFC 3161 timestamps preserved.

Supported Platforms

Splunk — HTTP Event Collector (HEC), CIM-compliant field mapping
Microsoft Sentinel — CEF format via Log Analytics API, HMAC-SHA256 authentication
IBM QRadar — LEEF 2.0 format via Syslog (TCP/UDP/TLS)

Configuration

Configure SIEM credentials in ~/.redspecter/siem.yaml or via environment variables:

# ~/.redspecter/siem.yaml
splunk:
  hec_url: https://splunk.example.com:8088
  hec_token: your-hec-token
  index: ai_security
  verify_ssl: true

sentinel:
  workspace_id: your-workspace-id
  shared_key: your-shared-key
  log_type: RedSpecterFindings

qradar:
  syslog_host: qradar.example.com
  syslog_port: 514
  protocol: tcp

Usage

# Export to Splunk HEC
forge full-scan --target http://localhost:11434 --model llama3 --export-siem splunk

# Export to Microsoft Sentinel
forge full-scan --target http://localhost:11434 --model llama3 --export-siem sentinel

# Export to IBM QRadar
forge full-scan --target http://localhost:11434 --model llama3 --export-siem qradar

What Is Preserved

Ed25519 cryptographic signatures on every finding
RFC 3161 timestamps for tamper evidence
SHA-256 evidence chain hashes
OWASP LLM Top 10 mappings in SIEM-native fields
AI Shield policy references per finding

Error Handling

If SIEM credentials are missing or the export fails, the scan completes normally and the report is saved locally. SIEM export never blocks a scan.

Packaging

FORGE is available in three package formats for security-focused Linux distributions:

Debian / Kali / Parrot / REMnux / Tsurugi — .deb package
BlackArch — PKGBUILD
PyPI — pip install red-specter-forge

For access, contact richard@red-specter.co.uk

FORGE UNLEASHED

Cryptographic override. Private key controlled. One operator. Founder's machine only.

Disclaimer

Red Specter FORGE is designed for authorised security testing, research, and educational purposes only. You must have explicit written permission from the system owner before running any FORGE tool against a target. Unauthorised use may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), or equivalent legislation in your jurisdiction. The authors accept no liability for misuse.