T112 — NIGHTFALL Platform Moderation Exploitation Engine

SPECTER CENSOR

Turn AI Moderation Into A Weapon.

SPECTER CENSOR fingerprints, manipulates, and weaponises AI content moderation pipelines across Twitter/X, Facebook, Instagram, LinkedIn, and TikTok. PROBE maps classifier thresholds and confirms homoglyph and ZWC bypass windows via the OpenAI Moderation API (Perspective API legacy fallback). FORGE generates adversarial content — trigger mode inflates toxicity to force removal, shield mode deflates it to evade detection. EVOLVE breeds FORGE variants across generations using a genetic algorithm with OpenAI Moderation as the oracle. ACCOUNT-FARM generates realistic account personas with warmup schedules and interaction graphs. MASS-FLAG executes coordinated multi-account report campaigns with trust-weighted ordering and natural jitter. POLICY-KILL crafts DMCA, GDPR erasure, and DSA illegal-content notices. GHOST-WRITER induces organic spam signals to train the algorithm to suppress a target account. 253 tests. Zero failures.

253

Tests

Modules

Platforms

Gate Tiers

OpenAI Moderation Oracle Genetic Adversarial Optimisation CEN-{hex} Ed25519-Signed Reports

Read The Docs → All NIGHTFALL Tools

Modules

PROBE [OPEN]

Fingerprints platform classifier thresholds, bypass windows, and trust signals. Graduated probe corpus from CLEAN_BASELINE through HIGH_TOXICITY. Detects homoglyph bypass (Cyrillic lookalikes) and ZWC evasion. OpenAI Moderation API scorer with delta confirmation (Perspective API fallback). Maps classifier type and borderline range.

FORGE [OPEN]

Generates adversarial content variants. TRIGGER mode: injects aggression markers, homoglyphs, and toxicity triggers to inflate classifier score and force removal. SHIELD mode: applies academic framing, ZWC dilution, and leet substitution to deflate score and evade detection. Scores all variants via OpenAI Moderation API.

EVOLVE [OPEN]

Genetic algorithm optimiser. Seeds generation 0 from FORGE variants. Scores each individual via OpenAI Moderation API, keeps top survivors, breeds offspring via crossover and mutation (homoglyphs + leet + aggression markers), iterates for N generations. Returns fittest payload with full lineage. Independent --evolve-mode trigger/shield.

ACCOUNT-FARM [OPEN]

Generates realistic account personas for any platform. Realistic name/bio/location pools. Per-platform warmup content templates (≥5 per platform). Posting schedules matching normal human cadence. Interaction graphs between accounts. Trust signal accretion tracking. Full persona spec + warmup schedule output.

TRUST-BOOST [OPEN]

Builds detailed trust accretion plans to age accounts from zero to target trust score thresholds. Maps trust signals per platform (verification, follower graph, post history, engagement rate). Identifies gaps, estimates days to target, flags known threshold requirements for trusted-reporter status.

MASS-FLAG [UNLEASHED]

Coordinated multi-account report campaign. Trust-weighted account ordering (highest trust reports first). Natural timing jitter with Gaussian micro-variation. Proxy rotation. Per-platform auto-removal threshold tracking. Estimates when threshold is likely hit. Requires Ed25519 key + --confirm-mass-reporting.

POLICY-KILL [OPEN]

Legal takedown notice generation and submission. DMCA copyright infringement with full perjury statement. GDPR Art.17 erasure request with data subject rights framing. EU DSA illegal content report with harm category classification. Submission to platform abuse portals with response window estimates.

GHOST-WRITER [DESTROY]

Induces organic spam signals to train the platform recommendation algorithm to suppress a target account. Generates high-volume coordinated posts with spam signal patterns, cross-links, and engagement manipulation. Suppress mode targets enemy accounts. Protect mode shields friendly accounts. Requires ROE + Ed25519 + --confirm-moderation-destruction.

WMD Classes

coordinated_content_suppression algorithmic_suppression_induction legal_content_suppression classifier_manipulation_at_scale

Gate System

Gate	Level	Modules	Requirement
OPEN	0	PROBE, FORGE, EVOLVE, ACCOUNT-FARM, TRUST-BOOST, POLICY-KILL	None
INJECT	1	—	—
UNLEASHED	2	MASS-FLAG	Ed25519 key at ~/.specter/spectercensor_ed25519.pem + --confirm-mass-reporting
DESTROY	3	GHOST-WRITER	Ed25519 key + ROE file "moderation destruction authorised" + --confirm-moderation-destruction

Supported Platforms

Platform	PROBE	FORGE	MASS-FLAG	POLICY-KILL
Twitter / X	✓	✓	✓	DMCA / GDPR / DSA
Facebook	✓	✓	✓	DMCA / GDPR / DSA
Instagram	✓	✓	✓	DMCA / GDPR / DSA
LinkedIn	✓	✓	✓	DMCA / GDPR / DSA
TikTok	✓	✓	✓	DMCA / DSA

Quick Start

# Install
pip install -e .

# Fingerprint platform classifier
specter-censor --platform twitter --probe --api-key <openai-api-key>

# Generate trigger content (inflate toxicity to force removal)
specter-censor --platform twitter --target <url> --forge trigger --api-key <key>

# Genetic optimisation in shield mode
specter-censor --platform twitter --text "seed text" --evolve --evolve-mode shield --api-key <key> --generations 10

# Generate 20 account personas for LinkedIn
specter-censor --platform linkedin --farm --count 20 --warmup-days 30

# Coordinated report campaign (UNLEASHED)
specter-censor --platform twitter --target <url> --mass-flag --tokens-file tokens.json --confirm-mass-reporting

# DMCA takedown notice
specter-censor --platform facebook --target <url> --policy-kill dmca \
  --complainant-name "John Smith" --complainant-email contact@example.com --submit

# Algorithmic suppression (DESTROY)
specter-censor --platform twitter --target <url> --ghost-write suppress \
  --api-key <key> --confirm-moderation-destruction

SPECTER CENSOR

Attack Pipeline

Modules

WMD Classes

Gate System

Supported Platforms

Quick Start