RESEARCH

We ship what we research.
And teach what we ship.

The papers, the open source, and the playbooks behind everything Cohorte builds and teaches. Every claim we make about trust and verification traces to something on this page.

Why this exists

The work that grounds the practice.

Cohorte publishes its methods openly: black-box reliability certification for systems you cannot see inside, a 10,000-trial taxonomy of what makes agents exploit vulnerabilities, and the orchestration architecture behind the open-source stack. Plausible and correct are not the same thing.

Interactive

A reliability number for a black box.

TrustGate turns a model's outputs into a single number: the confidence at which a practitioner can trust this system on this task. Pick a model and a task. Slide the target confidence. Watch the conformal cutoff move.

No model call. Every number is published.

Papers

Open research.

Black-Box Reliability Certification for AI Agents via Self-Consistency Sampling and Conformal Calibration

Charafeddine Mouzouni · Under review, TMLR (double-blind) · 2026

Given a black-box AI system and a task, at what confidence can a practitioner trust its output? We answer with a single reliability level per (system, task) pair, derived from self-consistency sampling and conformal calibration, that acts as a deployment gate with exact, finite-sample, distribution-free guarantees.

Result: GPT-4.1 reaches 94.6% on GSM8K and 96.8% on TruthfulQA. Sequential stopping cuts API cost by about 50% without losing the guarantee.
Status: Under review at TMLR
Implementation: TrustGate (open source)
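
To make the idea concrete, here is a minimal sketch of generic split-conformal calibration on top of self-consistency sampling. It is not the procedure from the paper or the TrustGate API: the function names, the score (one minus the share of samples that agree with the true answer), and the toy calibration data are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def answer_frequencies(samples):
    """Map each distinct sampled answer to its empirical frequency."""
    counts = Counter(samples)
    total = len(samples)
    return {answer: count / total for answer, count in counts.items()}

def nonconformity(samples, true_answer):
    """Assumed score: 1 minus the share of samples agreeing with the true answer."""
    return 1.0 - answer_frequencies(samples).get(true_answer, 0.0)

def conformal_cutoff(scores, alpha):
    """Split-conformal cutoff: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration items for this alpha
    return sorted(scores)[rank - 1]

def prediction_set(samples, cutoff):
    """Sampled answers whose score (1 - frequency) does not exceed the cutoff."""
    freqs = answer_frequencies(samples)
    return {answer for answer, freq in freqs.items() if 1.0 - freq <= cutoff}

# Toy calibration data (made up): five sampled answers per question, plus the truth.
calibration = [
    (["4", "4", "4", "4", "5"], "4"),
    (["7", "7", "7", "7", "7"], "7"),
    (["9", "8", "9", "9", "9"], "9"),
    (["2", "3", "2", "3", "2"], "2"),
    (["1", "1", "1", "1", "1"], "1"),
    (["6", "6", "5", "6", "6"], "6"),
    (["3", "3", "3", "3", "3"], "3"),
    (["8", "1", "8", "8", "1"], "8"),
    (["5", "5", "5", "5", "5"], "5"),
]
scores = [nonconformity(samples, truth) for samples, truth in calibration]
cutoff = conformal_cutoff(scores, alpha=0.2)

new_samples = ["12", "12", "12", "13", "12"]
print(cutoff, prediction_set(new_samples, cutoff))
```

Lowering `alpha` (a higher target confidence) can only raise the cutoff, which is the movement the slider above is meant to show.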

Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities

Charafeddine Mouzouni · OPIT and Cohorte AI · arXiv preprint · 2026

LLM agents with tool access can discover and exploit vulnerabilities. Which features of a system prompt trigger it? We test 37 prompt conditions across 12 psychological dimensions on 7 models in real Docker sandboxes, about 10,000 trials. Nine of twelve hypothesised attack dimensions produce zero exploitation. One works: goal reframing (puzzle, CTF, easter egg).

Result: On Claude Sonnet 4, puzzle framing triggers 38-40% exploitation despite an explicit safety instruction.
Status: arXiv preprint, 2026
Code & data: Public repository

Context Kubernetes: An Orchestration Architecture for Enterprise Knowledge in Agentic AI Systems

Charafeddine Mouzouni · arXiv preprint · 2026

Delivering the right knowledge, to the right agent, with the right permissions, at the right freshness, within the right cost envelope, across an organisation, is structurally the container-orchestration problem Kubernetes solved a decade ago. We introduce a declarative manifest, a reconciliation loop, and a three-tier permission model where agent authority is always a strict subset of human authority.

Result: Prototype ~7,000 lines, 92 tests, 8 experiments. Without governance, agents serve phantom content in 26.5% of queries. Governed routing eliminates it. The three-tier model blocks attacks RBAC does not.
Status: arXiv preprint, 2026
Reference implementation: Context Kubernetes (open source)
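
As a rough illustration of the pattern, the sketch below pairs a declarative spec with one pass of a reconciliation loop and enforces the strict-subset rule on permissions. It is not the Context Kubernetes manifest format or code: `SourceSpec`, its fields, the action names, and the sample sources are hypothetical, and the three permission tiers themselves are not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceSpec:
    """Desired state for one knowledge source (hypothetical fields)."""
    name: str
    max_age_hours: int
    agent_permissions: frozenset
    human_permissions: frozenset

def validate(spec):
    """Reject specs where agent authority is not a strict subset of human authority."""
    # For sets, `<` is the proper-subset test: equal sets fail it.
    if not spec.agent_permissions < spec.human_permissions:
        raise ValueError(f"{spec.name}: agent permissions must be a strict subset")

def reconcile(desired, observed_age_hours):
    """One reconciliation pass: compare desired and observed state, return actions."""
    actions = []
    declared = {spec.name for spec in desired}
    for spec in desired:
        validate(spec)
        age = observed_age_hours.get(spec.name)
        if age is None:
            actions.append(("index", spec.name))    # declared but not yet served
        elif age > spec.max_age_hours:
            actions.append(("refresh", spec.name))  # served but stale
    for name in observed_age_hours:
        if name not in declared:
            actions.append(("evict", name))         # served but never declared
    return actions

desired = [
    SourceSpec("hr-policies", 24, frozenset({"read"}), frozenset({"read", "write"})),
    SourceSpec("pricing", 1, frozenset({"read"}), frozenset({"read", "approve"})),
]
observed = {"pricing": 5, "old-wiki": 2}  # source name -> hours since last refresh
actions = reconcile(desired, observed)
print(actions)
```

A real controller would run this comparison repeatedly and apply the actions. The sketch returns them so that the desired-versus-observed diff is easy to inspect.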

Three Phases of Expert Routing: How Load Balance Evolves During Mixture-of-Experts Training

Charafeddine Mouzouni · arXiv preprint · April 2026

We model MoE token routing as a congestion game and track its effective congestion across training. The trajectory reveals three phases: a surge where the router learns to balance load, a stabilisation where experts specialise, and a relaxation where the router trades balance for quality. This non-monotone trajectory is invisible to post-hoc analysis of converged models.

Result: Studied across OLMoE-1B-7B (20 checkpoints) and OpenMoE-8B (6 checkpoints), with bootstrap confidence intervals on every estimate.
Status: arXiv preprint, April 2026
Code & data: Public repository
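
For intuition about tracking load balance across checkpoints, here is a small sketch that uses a stand-in metric, the coefficient of variation of per-expert token load. This is not the paper's effective-congestion measure, and the routing assignments below are invented toy data, not results from OLMoE or OpenMoE.

```python
import math
from collections import Counter

def expert_load(assignments, num_experts):
    """Fraction of tokens routed to each expert (top-1 routing assumed)."""
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(expert, 0) / total for expert in range(num_experts)]

def load_imbalance(assignments, num_experts):
    """Coefficient of variation of expert load; 0 means perfectly balanced."""
    load = expert_load(assignments, num_experts)
    mean = 1.0 / num_experts
    variance = sum((share - mean) ** 2 for share in load) / num_experts
    return math.sqrt(variance) / mean

# Invented expert indices for eight tokens at three hypothetical checkpoints.
checkpoints = {
    "early": [0, 0, 0, 0, 0, 0, 1, 2],
    "mid": [0, 1, 2, 3, 0, 1, 2, 3],
    "late": [0, 0, 0, 1, 1, 2, 2, 3],
}
trajectory = {name: load_imbalance(tokens, 4) for name, tokens in checkpoints.items()}
for name, value in trajectory.items():
    print(f"{name}: {value:.3f}")
```

In this toy trajectory the imbalance falls and then rises again, the kind of non-monotone shape that a single converged checkpoint cannot reveal.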

Open source

Reference implementations.

Playbooks

Between the papers and the practice.

The thread

The research is the foundation. The practice is what we install.

The research record

What grounds every claim

Papers, repositories, playbooks. Citable, auditable, reproducible. The work the rest of the company stands on.

What we build

Production systems

AI shipped into serious places, then transferred to the team that runs them. We teach what we ship.

What we teach

The method, made learnable

The bootcamp, the courses, the team programs. The same operating model, taught to the people who run the systems.

Put it to work

Bring the method to your team.

The papers are public, the code is open. The harder work is installing reliability and accountability inside a team that already exists. That is what we do.