imajin-moderator Service

Multi-layer content moderation with 5 detection layers, deterministic decision logic, and timing side-channel prevention.

Overview

| Property | Value |
|----------|-------|
| Port | 8008 |
| Stack | Python, FastAPI, PyTorch, transformers, InsightFace |
| Package | @lilith/imajin-moderator-client (Python), @imajin/moderator-types (TypeScript) |

Architecture

imajin-moderator/
├── service/
│   └── src/
│       ├── api/main.py                    # FastAPI routes (45+ endpoints)
│       ├── config/settings.py             # Port 8008, pipeline config
│       ├── detection/
│       │   ├── pipeline.py                # Multi-layer orchestration
│       │   ├── pdq_hasher.py              # Layer 1: Perceptual hashing (Meta PDQ)
│       │   ├── nsfw_detector.py           # Layer 2: NSFW classification
│       │   ├── age_estimator.py           # Layer 3: Age estimation
│       │   ├── prohibited_content_detector.py  # Layer 4: Zero-shot prohibited content
│       │   └── identity_verifier.py       # Layer 5: Face embedding verification
│       └── models/
│           ├── schemas.py                 # Request/response models
│           ├── decisions.py               # Decision logic & block reasons
│           └── prohibited_prompts.py      # 5 illegal content categories
├── types/                                 # TypeScript type definitions
└── client/                                # Python async HTTP client

Detection Layers

All 5 layers run unconditionally on every scan, with no early exit when a layer flags the content, so response time does not reveal which layer fired (timing side-channel prevention):

| Layer | Model / Method | Detects | Flags |
|-------|----------------|---------|-------|
| 1. PDQ Hash | Meta PDQ (256-bit perceptual hash) | Known-bad content via Redis hash database | KNOWN_BAD_HASH |
| 2. NSFW | Marqo/nsfw-image-detection-384 | Nudity, adult content (explicit ≥0.7, suggestive ≥0.4) | NSFW_EXPLICIT, NSFW_SUGGESTIVE |
| 3. Age | InsightFace buffalo_l + nateraw/vit-age-classifier | Potential minors (conservative threshold: 25 years) | POTENTIAL_MINOR |
| 4. Prohibited | SigLIP2 zero-shot via imajin-semantic | 5 illegal categories (bestiality, sexual violence, unconscious, necrophilia, trafficking) | VIOLENCE_DETECTED |
| 5. Identity | Face embeddings via imajin-identity | Identity mismatch (cosine similarity threshold: 0.68) | Identity verification |
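
The thresholds in the table can be read as simple comparisons on each layer's output. The sketch below is illustrative only: the function names are hypothetical, it assumes the NSFW layer yields a single score in [0, 1], and it assumes the boundary handling (an age strictly below 25 counts as a potential minor, a similarity strictly below 0.68 counts as a mismatch). The service's own code in `detection/` may differ.

```python
import math
from typing import Optional, Sequence

# Threshold values taken from the table above.
NSFW_EXPLICIT_THRESHOLD = 0.7
NSFW_SUGGESTIVE_THRESHOLD = 0.4
MINOR_AGE_THRESHOLD_YEARS = 25
IDENTITY_SIMILARITY_THRESHOLD = 0.68


def nsfw_flag(score: float) -> Optional[str]:
    """Map an NSFW score to a flag name, or None when below both thresholds."""
    if score >= NSFW_EXPLICIT_THRESHOLD:
        return "NSFW_EXPLICIT"
    if score >= NSFW_SUGGESTIVE_THRESHOLD:
        return "NSFW_SUGGESTIVE"
    return None


def is_potential_minor(estimated_age: float) -> bool:
    """Assumption: ages strictly below the conservative threshold are flagged."""
    return estimated_age < MINOR_AGE_THRESHOLD_YEARS


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_identity_mismatch(reference: Sequence[float], candidate: Sequence[float]) -> bool:
    """Assumption: similarity strictly below the threshold is a mismatch."""
    return cosine_similarity(reference, candidate) < IDENTITY_SIMILARITY_THRESHOLD
```

Under these assumptions, a score of 0.5 maps to `NSFW_SUGGESTIVE`, and two orthogonal embeddings (similarity 0) are reported as a mismatch.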

Decision Logic

Deterministic priority order:

  1. BLOCKED — Known-bad hash OR (minor + NSFW) OR prohibited content above block threshold
  2. QUARANTINED — Potential minor OR age estimation failure OR identity mismatch OR prohibited above quarantine threshold
  3. APPROVED — All layers passed
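
The priority order above can be sketched as a pure function over the collected layer results. This is a minimal illustration, not the contents of `models/decisions.py`: the `LayerResults` fields, the `decide` function, and both prohibited-content threshold values are hypothetical, since this document does not state them. It also assumes a single `nsfw` boolean stands for whichever NSFW flag the service counts toward the "minor + NSFW" rule.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(str, Enum):
    BLOCKED = "BLOCKED"
    QUARANTINED = "QUARANTINED"
    APPROVED = "APPROVED"


@dataclass(frozen=True)
class LayerResults:
    """Hypothetical container for the outputs of all five layers."""
    known_bad_hash: bool = False
    nsfw: bool = False
    potential_minor: bool = False
    age_estimation_failed: bool = False
    identity_mismatch: bool = False
    prohibited_score: float = 0.0


# Placeholder values; the real thresholds are not given in this document.
PROHIBITED_BLOCK_THRESHOLD = 0.8
PROHIBITED_QUARANTINE_THRESHOLD = 0.5


def decide(
    r: LayerResults,
    block_threshold: float = PROHIBITED_BLOCK_THRESHOLD,
    quarantine_threshold: float = PROHIBITED_QUARANTINE_THRESHOLD,
) -> Decision:
    """Apply the rules in fixed priority order: BLOCKED, QUARANTINED, APPROVED."""
    if (
        r.known_bad_hash
        or (r.potential_minor and r.nsfw)
        or r.prohibited_score >= block_threshold
    ):
        return Decision.BLOCKED
    if (
        r.potential_minor
        or r.age_estimation_failed
        or r.identity_mismatch
        or r.prohibited_score >= quarantine_threshold
    ):
        return Decision.QUARANTINED
    return Decision.APPROVED
```

Because the function only reads a fully populated result object, the same inputs always give the same decision, and the decision step itself does not require skipping any layer.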

API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /scan | POST | Single image scan (hash + NSFW + age) |
| /scan/full | POST | Full 5-layer scan (all detection + identity) |
| /scan/fast | POST | NSFW-only scan |
| /scan/batch | POST | Batch scan (max 50 images) |
| /detect/hash | POST | Standalone hash generation |
| /detect/nsfw | POST | Standalone NSFW classification |
| /detect/age | POST | Standalone age estimation |
| /hash/check | POST | Check hash against known-bad database |
| /hash/load | POST | Load known-bad hashes (auth required) |
| /health | GET | Health check (GPU status, hash count) |
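
Since `/scan/batch` accepts at most 50 images, a caller with a larger set has to split it before submitting. The helper below is a hypothetical client-side sketch (the name `chunk_for_batch_scan` is not part of the published client) and deliberately stops short of building the request, because the request schema is not described here.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Limit stated in the endpoint table for /scan/batch.
MAX_BATCH_SIZE = 50


def chunk_for_batch_scan(items: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    if size < 1 or size > MAX_BATCH_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each yielded chunk can then be sent as one `/scan/batch` request; 120 images, for example, become three requests of 50, 50, and 20 images.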

Service Dependencies

  • imajin-semantic (port 8005) — Prohibited content detection via SigLIP2 zero-shot classification
  • imajin-identity (port 8009) — Face embedding extraction for identity verification
  • Redis — PDQ hash database persistence