The exploitation surface of AI agents, mapped, measured, and defended. Based on CM's published research: 10,000 trials, 37 conditions, 12 attack dimensions. 9 fully defensible. 1 that isn't. You need to know which.
Every AI agent you deploy has an exploitation surface: a set of dimensions along which an adversary (or an accident) can cause it to behave in ways you didn't intend. Most teams don't think about this. They deploy agents, test the happy path, and hope for the best.
Hope is not a security posture.
9 of 12 attack dimensions produced zero exploitation with proper defenses. That is the reassuring part. But goal reframing, getting the agent to pursue a different objective than intended, succeeded 32-40% of the time across multiple configurations. That is the terrifying part. And almost nobody is testing for it.
Every defense in this course has been tested against these exploitation trials. Not theoretical. Measured and verified across 37 experimental conditions and multiple model configurations.
Each module covers a phase of the security engineering process: understand the threat model, reproduce the attacks, build the defenses, red-team your own system.
Why traditional AppSec doesn't protect AI systems. The new threat model: probabilistic attacks on probabilistic systems. OWASP Top 10 for LLM Applications. MITRE ATLAS attack taxonomy. The Three V's revisited through a security lens.
Direct prompt injection (overriding system instructions). Indirect prompt injection (Greshake et al.: attacks hidden in retrieved data). Goal reframing as the under-explored attack class: puzzle framing at 32-40%, CTF framing at 32-34%, easter-egg concealment behavior. Model-specific immunity patterns: why GPT-x.1 is categorically immune.
Lab: Reproduce 3 exploitation scenarios in a Docker sandbox.
guardrails architecture and API. Input guardrails: scanning for injection patterns. Output guardrails: PII detection, policy compliance, content filtering. The policy engine: rules enforced programmatically, not hoped-for in the prompt. Combining guardrails with the E3 governance gate.
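To make the three guardrail layers concrete, here is a minimal sketch in Python. It is not the API of the course's guardrails repo: every name in it (`check_input`, `redact_output`, `check_tool_call`, the pattern list, the tool allowlist) is a hypothetical chosen for illustration, and regex matching is only a stand-in for real injection and PII detection.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only. Real injection detection
# needs far more than a handful of regular expressions.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]

# Deliberately simple email matcher, used as a stand-in for PII detection.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Assumed tool allowlist: the policy lives in code, not in the prompt.
ALLOWED_TOOLS = frozenset({"search_docs", "summarize"})


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reasons: tuple[str, ...] = ()


def check_input(text: str) -> Verdict:
    """Input guardrail: block text that matches a known injection pattern."""
    hits = tuple(p.pattern for p in INJECTION_PATTERNS if p.search(text))
    return Verdict(allowed=not hits, reasons=hits)


def redact_output(text: str) -> str:
    """Output guardrail: replace email addresses before text leaves the system."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def check_tool_call(tool: str) -> Verdict:
    """Policy engine: a rule the model cannot talk its way around."""
    if tool in ALLOWED_TOOLS:
        return Verdict(True)
    return Verdict(False, (f"tool '{tool}' is not on the allowlist",))
```

The design point is the last function: the allowlist check runs outside the model, so no prompt content can change its result.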
agent-auth: least privilege for AI. Just-in-time access (temporary permissions that expire). Infrastructure isolation: Docker, gVisor, Firecracker for AI workloads. Network egress allowlists. Blast-radius containment: if one agent is compromised, the damage is bounded.
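Just-in-time access and egress allowlists can be sketched in a few lines. The code below is an assumption-laden illustration, not the agent-auth API: `Grant`, `GrantStore`, `egress_allowed`, the scope strings, and the hostnames are all invented here. In a real deployment the egress rule would normally be enforced at the network layer as well, not only in application code.

```python
import time
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Grant:
    agent_id: str
    scope: str         # hypothetical scope string such as "db:read"
    expires_at: float  # clock reading after which the grant is void


class GrantStore:
    """In-memory store of temporary permissions (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested
        self._grants: list[Grant] = []

    def issue(self, agent_id: str, scope: str, ttl_seconds: float) -> Grant:
        grant = Grant(agent_id, scope, self._clock() + ttl_seconds)
        self._grants.append(grant)
        return grant

    def is_allowed(self, agent_id: str, scope: str) -> bool:
        now = self._clock()
        # Drop expired grants first, so access lapses without a revoke call.
        self._grants = [g for g in self._grants if g.expires_at > now]
        return any(g.agent_id == agent_id and g.scope == scope for g in self._grants)


# Assumed allowlist: anything not named here is denied by default.
EGRESS_ALLOWLIST = frozenset({"api.internal.example", "pypi.org"})


def egress_allowed(url: str) -> bool:
    """Allow outbound requests only to explicitly listed hosts."""
    host = urlsplit(url).hostname  # None when the URL has no host part
    return host is not None and host in EGRESS_ALLOWLIST
```

Both checks default to denial: a scope that was never granted, a grant that has expired, and a host that is not listed all yield `False`, which is what keeps the blast radius of a compromised agent bounded.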
Red-teaming methodology for AI systems. Building a red-team protocol: scope, scenarios, scoring, reporting. Take the E3 capstone, red-team it against all 12 dimensions, fix what fails, re-test. The result: a security certification report with before/after exploitation metrics.
In E4 your E3 capstone gets security-hardened. The same project carries forward through E5 and E6 into a deployed Enterprise AI Operating System.
E1 and E3 completed (E2 recommended but not required), or equivalent experience building accountable AI agents with governance middleware. Comfortable with Python, Docker, and basic security concepts.
Python 3.12, Docker and gVisor for sandboxing, open-source guardrails and agent-auth repos (Apache 2.0). The arXiv paper (2604.04561) annotated and walked through scenario-by-scenario.
Want all six courses?
See the Engineering Series bundle →
10,000 trials. 12 attack dimensions. 37 conditions. The map your security posture needs. €197. Lifetime access.
Get on the waitlist