Red Specter ARSENAL — Offensive AI Security Framework

Prompt Injection / Memory Poisoning / Tool Abuse / Credential Theft / Data Exfiltration / Goal Drift / Supply Chain Compromise / RAG Poisoning / MCP Exploitation / Lateral Movement / Auth Bypass / Safety Decay

The Problem

Nobody Tests AI Agents

Every AI security tool tests LLMs. Nobody tests AI agents. An LLM responds to prompts. An AI agent has memory, tools, credentials, and the ability to act autonomously. That is a completely different attack surface. ARSENAL tests it.

LLM Testing

Existing tools send prompts and check responses. They test the language model in isolation, ignoring everything around it.

Agent Testing

ARSENAL tests the full agent stack — memory systems, tool invocations, credential handling, RAG pipelines, MCP servers, and autonomous decision chains.

Attack Surface

Agents have persistent memory to poison, tools to hijack, credentials to steal, supply chains to compromise, and safety guardrails that decay over time.

The ARSENAL

14 Offensive Tools

Each tool targets a specific attack surface of autonomous AI agents. Every finding includes severity, confidence, evidence, and remediation guidance, and is mapped to the OWASP Agentic Top 10 and MITRE ATLAS.

#	Tool	Command	What It Does
01	Phantom Swarm	arsenal swarm scan	5 attack agents, 19 vectors — AI agent pen-testing
02	MCP Scanner	arsenal mcp scan	8 probes for MCP server security
03	Honeypot	arsenal honeypot deploy	6 AI agent personas, 4-level trap escalation
04	Inject Fuzzer	arsenal inject fuzz	6 generators, 5 mutators, 126+ payloads
05	C2 Simulator	arsenal c2 assess	5 implants, 4 covert channels
06	Memory Scanner	arsenal memory scan	6 probes for AI memory systems
07	Tool Scanner	arsenal tool scan	7 probes for tool-use vulnerabilities
08	Auth Scanner	arsenal auth scan	7 probes for AI authentication
09	RAG Scanner	arsenal rag scan	6 probes for RAG pipeline attacks
10	Supply Chain	arsenal supply scan	7 probes for AI supply chain security
11	Canary Deploy	arsenal canary deploy	5 asset types for tripwire detection
12	Drift Scanner	arsenal drift scan	6 probes for safety degradation over time
13	Path Mapper	arsenal path map	BloodHound-style attack graph analysis
14	Report Builder	arsenal report build	Unified reporting with Ed25519 signing

The Pipeline

Ten Tools. Every Layer. No Gaps.

ARSENAL is Stage 2 of the Red Specter offensive pipeline. Test the AI agent during development. Findings feed directly into AI Shield as runtime blocking rules and into redspecter-siem for enterprise SIEM correlation.

Stage 1 — LLM Testing

FORGE

Test the model before you build with it

Stage 2 — Agent Testing

ARSENAL

Test the AI agent during development

Stage 3 — Swarm Assault

PHANTOM

Coordinated AI agent swarm assault

Stage 4 — Web Siege

POLTERGEIST

Coordinated web application siege

Stage 5 — Traffic Interception

GLASS

Watch the wire

Stage 6 — Adversarial AI

NEMESIS

Think like the attacker

Stage 7 — Human Layer

SPECTER SOCIAL

Target the human

Stage 8 — OS/Kernel

PHANTOM KILL

Own the foundation

Stage 9 — Physical Layer

GOLEM

Attack the physical layer

Stage 10 — Supply Chain

HYDRA

Attack the trust chain

Discovery & Governance

IDRIS

Discover and govern AI assets

Defence

AI Shield

Defend everything above it

SIEM Integration

redspecter-siem

Findings feed directly into Splunk, Sentinel, QRadar

Full Kill Chain

arsenal full-assault

One command runs the complete kill chain. All 14 tools execute in sequence, findings feed into attack path mapping with compromise simulation, and the result is a signed evidence bundle with a board-ready report.

$ arsenal full-assault https://target-agent.com --token sk-xxx

FULL ASSAULT MODE — 14 TOOLS
Phase 1/10: Swarm Engine — 5 agents, 19 vectors
Phase 2/10: MCP Scanner — 8 probes
Phase 3/10: Inject Fuzzer — 126+ payloads
Phase 4/10: Memory Scanner — 6 probes
Phase 5/10: Tool Scanner — 7 probes
Phase 6/10: Auth Scanner — 7 probes
Phase 7/10: RAG Scanner — 6 probes
Phase 8/10: Supply Chain — 7 probes
Phase 9/10: Canary Deploy — 5 asset types
Phase 10/10: Drift Scanner — 6 probes
Attack Path Mapping — 47 nodes, 89 edges, 12 chains
Building Report — Ed25519 signed evidence bundle
════════════════════════════════════════════════════════════
FULL ASSAULT COMPLETE
Findings: 183
Critical: 7   High: 24   Medium: 89   Low: 63
Grade: D-
Report: reports/arsenal_full_report.json
HTML: reports/arsenal_full_report.html
Graph: reports/attack_graph.json
════════════════════════════════════════════════════════════

Ed25519-signed evidence bundles

SHA-256 tamper-evident chains

JSON + HTML board-ready reports

Attack path graphs with blast radius
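
One common reading of "blast radius" on an attack graph is the set of nodes an attacker can reach from a single compromised node. The sketch below assumes that interpretation. The `blast_radius` helper, the edge-list format, and the node names are hypothetical illustrations, not ARSENAL's graph schema or API.

```python
from collections import deque


def blast_radius(edges, start):
    """Return every node reachable from `start`, excluding `start` itself.

    `edges` is an iterable of directed (source, target) pairs.
    This is an assumed, simplified definition of blast radius.
    """
    # Build an adjacency list from the directed edge pairs.
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    # Breadth-first search from the compromised node.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    seen.discard(start)
    return seen


# Hypothetical attack graph: each edge means "compromise of A enables B".
EDGES = [
    ("prompt_injection", "agent_memory"),
    ("agent_memory", "tool_call"),
    ("tool_call", "api_credentials"),
    ("api_credentials", "internal_db"),
    ("rag_index", "agent_memory"),
]

if __name__ == "__main__":
    print(sorted(blast_radius(EDGES, "agent_memory")))
```

Under this definition, a node with no outgoing edges has an empty blast radius, which makes it easy to rank nodes by how much a single compromise exposes.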

Available On

Security Distros & Package Managers

Kali Linux

.deb package

Parrot OS

.deb package

BlackArch

PKGBUILD

REMnux

.deb package

Tsurugi

.deb package

PyPI

pip install

macOS

pip install

Windows

pip install

Docker

docker pull

Standards Coverage

Mapped to Industry Frameworks

Every finding ARSENAL produces includes severity, confidence score, evidence, remediation guidance, and references to the relevant framework categories.
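
To make the shape of such a finding concrete, here is a minimal sketch of a record carrying the fields listed above. The `Finding` class, its field names, the 0 to 1 confidence range, and the sample values are assumptions made for illustration; ARSENAL's actual report schema is not documented here.

```python
from dataclasses import dataclass, field

# Severity levels taken from the report summary (Critical/High/Medium/Low).
SEVERITIES = ("low", "medium", "high", "critical")


@dataclass
class Finding:
    """Hypothetical finding record; field names are illustrative only."""
    title: str
    severity: str
    confidence: float          # assumed to be a score between 0.0 and 1.0
    evidence: str
    remediation: str
    framework_refs: list = field(default_factory=list)

    def __post_init__(self):
        # Reject records that would make severity counts meaningless.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


def summarise(findings):
    """Count findings per severity level."""
    counts = {level: 0 for level in SEVERITIES}
    for finding in findings:
        counts[finding.severity] += 1
    return counts


SAMPLE = [
    Finding(
        title="Agent follows instructions embedded in retrieved document",
        severity="high",
        confidence=0.9,
        evidence="Response contained the injected marker string",
        remediation="Treat retrieved content as untrusted data",
        framework_refs=[
            "OWASP Agentic Top 10: Prompt Injection",
            "MITRE ATLAS: Exfiltration via ML Interface",
        ],
    ),
    Finding(
        title="Verbose error reveals internal tool names",
        severity="low",
        confidence=0.6,
        evidence="Stack trace returned to caller",
        remediation="Return generic error messages",
    ),
]

if __name__ == "__main__":
    print(summarise(SAMPLE))
```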

Fully Mapped

OWASP Agentic Top 10

All 10 categories covered. Findings reference the specific OWASP agentic risk they address.

Excessive Agency
Prompt Injection
Insecure Output Handling
Supply Chain Vulnerabilities
Data Leakage

Fully Mapped

MITRE ATLAS

Technique-level mapping. Every finding references the ATLAS technique it demonstrates.

Initial Access techniques
ML Model Access
Exfiltration via ML Interface
Evade ML Model
ML Supply Chain Compromise

Aligned

Evidence Chain

All findings produce machine-readable evidence with SHA-256 integrity chains and Ed25519 digital signatures.

Tamper-evident hash chains
Ed25519 cryptographic signing
JSON evidence bundles
HTML board-ready reports
Attack graph visualisation
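
A tamper-evident hash chain can be sketched with the standard library alone: each entry's SHA-256 digest covers the previous entry's digest plus the current record, so altering any record invalidates every later hash. The function names, the all-zero genesis value, and the JSON canonicalisation below are assumptions for illustration, not ARSENAL's bundle format. Ed25519 signing of the final hash is omitted because it requires a third-party library.

```python
import hashlib
import json

# Assumed starting value for the first entry's "previous hash".
GENESIS = "0" * 64


def _digest(prev_hash, record):
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(records):
    """Link each record to its predecessor by hash."""
    chain = []
    prev = GENESIS
    for record in records:
        current = _digest(prev, record)
        chain.append({"record": record, "prev_hash": prev, "hash": current})
        prev = current
    return chain


def verify_chain(chain):
    """Recompute every hash; return False on the first mismatch."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if _digest(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    chain = build_chain([
        {"id": 1, "severity": "critical"},
        {"id": 2, "severity": "high"},
    ])
    print(verify_chain(chain))            # chain as built
    chain[0]["record"]["severity"] = "low"
    print(verify_chain(chain))            # chain after tampering
```

Signing only the last hash in the chain is enough to commit to every earlier record, which is why hash chains and digital signatures are usually paired.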

Ed25519 Cryptographic Override

ARSENAL UNLEASHED

Cryptographic override. Private key controlled. One operator. Founder's machine only.

Responsible Use

Authorised Testing Only

Warning

Red Specter ARSENAL is designed for authorised security testing, research, and educational purposes only. You must have explicit written permission from the system owner before running any ARSENAL tool against a target. Unauthorised use may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), or equivalent legislation in your jurisdiction. The authors accept no liability for misuse.

Pure Engineering

Zero External Tools. Zero Wrappers.

Most pen-testing frameworks are menus that shell out to sqlmap, nikto, and nmap behind a terminal UI. ARSENAL is actual engineering. Every payload, every mutation, every detection algorithm, every scoring engine — written from scratch in pure Python. Zero subprocess calls. Zero external tool dependencies.

784

Custom Payloads

14

Custom Tools

0

Subprocess Calls

0

External Dependencies

Enterprise Integration

Enterprise SIEM Integration — Native

Export every finding directly to your SIEM. One flag. Native format translation. Ed25519 signatures and RFC 3161 timestamps preserved across every export.

Splunk

HEC • CIM Compliant

Sentinel

CEF • Log Analytics API

QRadar

LEEF 2.0 • Syslog

arsenal full-assault http://target:8080 --export-siem splunk
