Multi-Agent Testing Environment — AI Agent Quality Testing Platform
mate connects to AI agents, runs automated evaluation suites against them, tracks quality over time, and red-teams them for adversarial vulnerabilities. It supports Microsoft Copilot Studio, Azure AI Foundry*, generic HTTP agents*, and Parloa* out of the box, with a modular architecture for custom connectors, judges, and red-team providers.
* Roadmap items
Register multiple agents and execute repeatable test suites against each target.
Combine deterministic rubrics, LLM-based scoring, CopilotStudioJudge Mode with rubrics, and hybrid evaluation strategies.
Probe for jailbreaks, prompt injection, hallucinations, privacy leaks, and more.
Run locally with Docker Compose or deploy with Bicep to Azure Container Apps.
Created by Holger Imbery. Connect on GitHub or LinkedIn to learn more.
The latest release package and changelog are available on GitHub.