T112 — NIGHTFALL Platform Moderation Exploitation Engine

SPECTER CENSOR

Turn AI Moderation Into A Weapon.

SPECTER CENSOR fingerprints, manipulates, and weaponises AI content moderation pipelines across Twitter/X, Facebook, Instagram, LinkedIn, and TikTok. PROBE maps classifier thresholds and confirms homoglyph and ZWC bypass windows via the OpenAI Moderation API (Perspective API legacy fallback). FORGE generates adversarial content — trigger mode inflates toxicity to force removal, shield mode deflates it to evade detection. EVOLVE breeds FORGE variants across generations using a genetic algorithm with OpenAI Moderation as the oracle. ACCOUNT-FARM generates realistic account personas with warmup schedules and interaction graphs. MASS-FLAG executes coordinated multi-account report campaigns with trust-weighted ordering and natural jitter. POLICY-KILL crafts DMCA, GDPR erasure, and DSA illegal-content notices. GHOST-WRITER induces organic spam signals to train the algorithm to suppress a target account. 253 tests. Zero failures.

253 Tests
8 Modules
5 Platforms
4 Gate Tiers

OpenAI Moderation Oracle Genetic Adversarial Optimisation CEN-{hex} Ed25519-Signed Reports

Attack Pipeline

01 PROBE
02 FORGE
03 EVOLVE
04 FARM
05 TRUST-BOOST
06 MASS-FLAG
07 POLICY-KILL
08 GHOST-WRITER

Modules

PROBE [OPEN]

Fingerprints platform classifier thresholds, bypass windows, and trust signals. Graduated probe corpus from CLEAN_BASELINE through HIGH_TOXICITY. Detects homoglyph bypass (Cyrillic lookalikes) and ZWC evasion. OpenAI Moderation API scorer with delta confirmation (Perspective API fallback). Maps classifier type and borderline range.

FORGE [OPEN]

Generates adversarial content variants. TRIGGER mode: injects aggression markers, homoglyphs, and toxicity triggers to inflate classifier score and force removal. SHIELD mode: applies academic framing, ZWC dilution, and leet substitution to deflate score and evade detection. Scores all variants via OpenAI Moderation API.

EVOLVE [OPEN]

Genetic algorithm optimiser. Seeds generation 0 from FORGE variants. Scores each individual via OpenAI Moderation API, keeps top survivors, breeds offspring via crossover and mutation (homoglyphs + leet + aggression markers), iterates for N generations. Returns fittest payload with full lineage. Independent --evolve-mode trigger/shield.

ACCOUNT-FARM [OPEN]

Generates realistic account personas for any platform. Realistic name/bio/location pools. Per-platform warmup content templates (≥5 per platform). Posting schedules matching normal human cadence. Interaction graphs between accounts. Trust signal accretion tracking. Full persona spec + warmup schedule output.

TRUST-BOOST [OPEN]

Builds detailed trust accretion plans to age accounts from zero to target trust score thresholds. Maps trust signals per platform (verification, follower graph, post history, engagement rate). Identifies gaps, estimates days to target, flags known threshold requirements for trusted-reporter status.

MASS-FLAG [UNLEASHED]

Coordinated multi-account report campaign. Trust-weighted account ordering (highest trust reports first). Natural timing jitter with Gaussian micro-variation. Proxy rotation. Per-platform auto-removal threshold tracking. Estimates when threshold is likely hit. Requires Ed25519 key + --confirm-mass-reporting.

POLICY-KILL [OPEN]

Legal takedown notice generation and submission. DMCA copyright infringement with full perjury statement. GDPR Art.17 erasure request with data subject rights framing. EU DSA illegal content report with harm category classification. Submission to platform abuse portals with response window estimates.

GHOST-WRITER [DESTROY]

Induces organic spam signals to train the platform recommendation algorithm to suppress a target account. Generates high-volume coordinated posts with spam signal patterns, cross-links, and engagement manipulation. Suppress mode targets enemy accounts. Protect mode shields friendly accounts. Requires ROE + Ed25519 + --confirm-moderation-destruction.

WMD Classes

coordinated_content_suppression algorithmic_suppression_induction legal_content_suppression classifier_manipulation_at_scale

Gate System

Gate | Level | Modules | Requirement
OPEN | 0 | PROBE, FORGE, EVOLVE, ACCOUNT-FARM, TRUST-BOOST, POLICY-KILL | None
INJECT | 1 | |
UNLEASHED | 2 | MASS-FLAG | Ed25519 key at ~/.specter/spectercensor_ed25519.pem + --confirm-mass-reporting
DESTROY | 3 | GHOST-WRITER | Ed25519 key + ROE file "moderation destruction authorised" + --confirm-moderation-destruction

Supported Platforms

Platform | PROBE | FORGE | MASS-FLAG | POLICY-KILL
Twitter / X | | | | DMCA / GDPR / DSA
Facebook | | | | DMCA / GDPR / DSA
Instagram | | | | DMCA / GDPR / DSA
LinkedIn | | | | DMCA / GDPR / DSA
TikTok | | | | DMCA / DSA

Quick Start

# Install
pip install -e .

# Fingerprint platform classifier
specter-censor --platform twitter --probe --api-key <openai-api-key>

# Generate trigger content (inflate toxicity to force removal)
specter-censor --platform twitter --target <url> --forge trigger --api-key <key>

# Genetic optimisation in shield mode
specter-censor --platform twitter --text "seed text" --evolve --evolve-mode shield --api-key <key> --generations 10

# Generate 20 account personas for LinkedIn
specter-censor --platform linkedin --farm --count 20 --warmup-days 30

# Coordinated report campaign (UNLEASHED)
specter-censor --platform twitter --target <url> --mass-flag --tokens-file tokens.json --confirm-mass-reporting

# DMCA takedown notice
specter-censor --platform facebook --target <url> --policy-kill dmca \
  --complainant-name "John Smith" --complainant-email contact@example.com --submit

# Algorithmic suppression (DESTROY)
specter-censor --platform twitter --target <url> --ghost-write suppress \
  --api-key <key> --confirm-moderation-destruction