Offensive AI Security
Framework

The world's first AI agent security testing framework 14 tools. 2,532 tests. One pip install. Zero failures.
14
Tools
2,532
Tests Passing
784
Attack Payloads
0
Failures
Prompt Injection / Memory Poisoning / Tool Abuse / Credential Theft / Data Exfiltration / Goal Drift / Supply Chain Compromise / RAG Poisoning / MCP Exploitation / Lateral Movement / Auth Bypass / Safety Decay Prompt Injection / Memory Poisoning / Tool Abuse / Credential Theft / Data Exfiltration / Goal Drift / Supply Chain Compromise / RAG Poisoning / MCP Exploitation / Lateral Movement / Auth Bypass / Safety Decay

Nobody Tests AI Agents

Every AI security tool tests LLMs. Nobody tests AI agents. An LLM responds to prompts. An AI agent has memory, tools, credentials, and the ability to act autonomously. That is a completely different attack surface. ARSENAL tests it.

LLM Testing

Existing tools send prompts and check responses. They test the language model in isolation, ignoring everything around it.

Agent Testing

ARSENAL tests the full agent stack — memory systems, tool invocations, credential handling, RAG pipelines, MCP servers, and autonomous decision chains.

Attack Surface

Agents have persistent memory to poison, tools to hijack, credentials to steal, supply chains to compromise, and safety guardrails that decay over time.

14 Offensive Tools

Each tool targets a specific attack surface of autonomous AI agents. All findings include severity, confidence, evidence, remediation guidance, and are mapped to OWASP Agentic Top 10 and MITRE ATLAS.

# Tool Command What It Does
01 Phantom Swarm arsenal swarm scan 5 attack agents, 19 vectors — AI agent pen-testing
02 MCP Scanner arsenal mcp scan 8 probes for MCP server security
03 Honeypot arsenal honeypot deploy 6 AI agent personas, 4-level trap escalation
04 Inject Fuzzer arsenal inject fuzz 6 generators, 5 mutators, 126+ payloads
05 C2 Simulator arsenal c2 assess 5 implants, 4 covert channels
06 Memory Scanner arsenal memory scan 6 probes for AI memory systems
07 Tool Scanner arsenal tool scan 7 probes for tool-use vulnerabilities
08 Auth Scanner arsenal auth scan 7 probes for AI authentication
09 RAG Scanner arsenal rag scan 6 probes for RAG pipeline attacks
10 Supply Chain arsenal supply scan 7 probes for AI supply chain security
11 Canary Deploy arsenal canary deploy 5 asset types for tripwire detection
12 Drift Scanner arsenal drift scan 6 probes for safety degradation over time
13 Path Mapper arsenal path map BloodHound-style attack graph analysis
14 Report Builder arsenal report build Unified reporting with Ed25519 signing

Ten Tools. Every Layer. No Gaps.

ARSENAL is Stage 2 of the Red Specter offensive pipeline. Test the AI agent during development. Findings feed directly into AI Shield as runtime blocking rules and into redspecter-siem for enterprise SIEM correlation.

Stage 1 — LLM Testing
FORGE
Test the model before you build with it
Stage 2 — Agent Testing
ARSENAL
Test the AI agent during development
Stage 3 — Swarm Assault
PHANTOM
Coordinated AI agent swarm assault
Stage 4 — Web Siege
POLTERGEIST
Coordinated web application siege
Stage 5 — Traffic Interception
GLASS
Watch the wire
Stage 6 — Adversarial AI
NEMESIS
Think like the attacker
Stage 7 — Human Layer
SPECTER SOCIAL
Target the human
Stage 8 — OS/Kernel
PHANTOM KILL
Own the foundation
Stage 9 — Physical Layer
GOLEM
Attack the physical layer
Stage 10 — Supply Chain
HYDRA
Attack the trust chain
Discovery & Governance
IDRIS
Discover and govern AI assets
Defence
AI Shield
Defend everything above it
SIEM Integration
redspecter-siem
Findings feed directly into Splunk, Sentinel, QRadar

arsenal full-assault

One command runs the complete kill chain. All 14 tools execute in sequence, findings feed into attack path mapping with compromise simulation, and the result is a signed evidence bundle with a board-ready report.

$ arsenal full-assault https://target-agent.com --token sk-xxx
FULL ASSAULT MODE — 14 TOOLS Phase 1/10: Swarm Engine — 5 agents, 19 vectors Phase 2/10: MCP Scanner — 8 probes Phase 3/10: Inject Fuzzer — 126+ payloads Phase 4/10: Memory Scanner — 6 probes Phase 5/10: Tool Scanner — 7 probes Phase 6/10: Auth Scanner — 7 probes Phase 7/10: RAG Scanner — 6 probes Phase 8/10: Supply Chain — 7 probes Phase 9/10: Canary Deploy — 5 asset types Phase 10/10: Drift Scanner — 6 probes Attack Path Mapping — 47 nodes, 89 edges, 12 chains Building Report — Ed25519 signed evidence bundle ════════════════════════════════════════════════════════════ FULL ASSAULT COMPLETE Findings: 183 Critical: 7 High: 24 Medium: 89 Low: 63 Grade: D- Report: reports/arsenal_full_report.json HTML: reports/arsenal_full_report.html Graph: reports/attack_graph.json ════════════════════════════════════════════════════════════
Ed25519-signed evidence bundles
SHA-256 tamper-evident chains
JSON + HTML board-ready reports
Attack path graphs with blast radius
14
Tools
2,532
Tests Passing
784
Attack Payloads
0
Failures
Apache 2.0
License

Security Distros & Package Managers

Kali Linux
.deb package
Parrot OS
.deb package
BlackArch
PKGBUILD
REMnux
.deb package
Tsurugi
.deb package
PyPI
pip install
macOS
pip install
Windows
pip install
Docker
docker pull

Mapped to Industry Frameworks

Every finding ARSENAL produces includes severity, confidence score, evidence, remediation guidance, and references to the relevant framework categories.

Fully Mapped

OWASP Agentic Top 10

All 10 categories covered. Findings reference the specific OWASP agentic risk they address.

  • Excessive Agency
  • Prompt Injection
  • Insecure Output Handling
  • Supply Chain Vulnerabilities
  • Data Leakage
Fully Mapped

MITRE ATLAS

Technique-level mapping. Every finding references the ATLAS technique it demonstrates.

  • Initial Access techniques
  • ML Model Access
  • Exfiltration via ML Interface
  • Evade ML Model
  • ML Supply Chain Compromise
Aligned

Evidence Chain

All findings produce machine-readable evidence with SHA-256 integrity chains and Ed25519 digital signatures.

  • Tamper-evident hash chains
  • Ed25519 cryptographic signing
  • JSON evidence bundles
  • HTML board-ready reports
  • Attack graph visualisation
ARSENAL UNLEASHED

Cryptographic override. Private key controlled. One operator. Founder's machine only.

Authorised Testing Only

Warning

Red Specter ARSENAL is designed for authorised security testing, research, and educational purposes only. You must have explicit written permission from the system owner before running any ARSENAL tool against a target. Unauthorised use may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), or equivalent legislation in your jurisdiction. The authors accept no liability for misuse.

Pure Engineering
Zero External Tools. Zero Wrappers.

Most pen-testing frameworks are menus that shell out to sqlmap, nikto, and nmap behind a terminal UI. ARSENAL is actual engineering. Every payload, every mutation, every detection algorithm, every scoring engine — written from scratch in pure Python. Zero subprocess calls. Zero external tool dependencies.

784
Custom Payloads
14
Custom Tools
0
Subprocess Calls
0
External Dependencies
Enterprise Integration
Enterprise SIEM Integration — Native

Export every finding directly to your SIEM. One flag. Native format translation. Ed25519 signatures and RFC 3161 timestamps preserved across every export.

Splunk
HEC • CIM Compliant
Sentinel
CEF • Log Analytics API
QRadar
LEEF 2.0 • Syslog
arsenal full-assault http://target:8080 --export-siem splunk