Pipelines Documentation

What is Pipelines?

Pipelines provides the infrastructure to test how AI agents behave in production-adjacent environments — from the decisions they make and the quality of their outputs to their operational performance, safety, and adherence to your standards. Port any agent, build a simulation, and scientifically measure what matters before you deploy.

Generate realistic scenarios on demand with our synthetic data generation service — seeded with your own data or built from scratch — tailored to your standard operating procedures, tribal knowledge, or curiosity. Then connect your agent, track its cost and performance, and ship a system you've tested and verified.

Register your agent: Point us at an HTTP endpoint you host, or bring your own code to run in a sandbox — by pasting it, linking a git repo, or uploading an archive. Then declare the tools your agent can use.
Generate scenarios: Use synthetic data generation to create test simulations in bulk, each with its own setup, expected behavior, and pass criteria.
Run: Submit your scenarios to Odyssey, which runs each one as an isolated, live simulation — capturing every tool call and trace event along the way.
Grade: Score each run with an LLM judge, your own custom graders, and operational metrics, and get a clear verdict with the reasoning behind it.
Compare: Every experiment is stored and versioned, so you can review trajectories and track quality across agent versions.
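
The five steps above can be pictured with a small, self-contained Python sketch. It does not use the Odyssey SDK or any real Pipelines API. Every name in it (Scenario, RunResult, refund_agent_v1, refund_agent_v2, generate_scenarios, run, grade, pass_rate) is hypothetical and exists only to illustrate the shape of the workflow: an agent is registered as a plain callable, scenarios are hard-coded rather than synthetically generated, and grading is a single custom pass criterion rather than an LLM judge.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Scenario:
    """Hypothetical scenario: a prompt plus one pass criterion."""
    name: str
    prompt: str
    expected_tool: str  # the run passes if the agent calls this tool


@dataclass
class RunResult:
    """Hypothetical record of one run: captured tool calls and final output."""
    scenario: str
    tool_calls: List[str] = field(default_factory=list)
    output: str = ""


# Register: in this sketch an "agent" is any callable that receives a prompt
# and a recorder function it must use for every tool call.
Agent = Callable[[str, Callable[[str], None]], str]


def refund_agent_v1(prompt: str, call_tool: Callable[[str], None]) -> str:
    call_tool("lookup_order")
    return "Order found."


def refund_agent_v2(prompt: str, call_tool: Callable[[str], None]) -> str:
    call_tool("lookup_order")
    if "refund" in prompt.lower():
        call_tool("issue_refund")
    return "Done."


def generate_scenarios() -> List[Scenario]:
    # Generate: stand-in for synthetic data generation (hard-coded here).
    return [
        Scenario("refund-request", "Please refund order 17.", "issue_refund"),
        Scenario("status-check", "Where is order 17?", "lookup_order"),
    ]


def run(agent: Agent, scenario: Scenario) -> RunResult:
    # Run: execute one scenario in isolation and capture every tool call.
    result = RunResult(scenario=scenario.name)
    result.output = agent(scenario.prompt, result.tool_calls.append)
    return result


def grade(scenario: Scenario, result: RunResult) -> bool:
    # Grade: a simple custom grader that checks the pass criterion.
    return scenario.expected_tool in result.tool_calls


def pass_rate(agent: Agent) -> float:
    # Compare: reduce a batch of graded runs to one number per agent version.
    scenarios = generate_scenarios()
    passed = sum(grade(s, run(agent, s)) for s in scenarios)
    return passed / len(scenarios)


if __name__ == "__main__":
    for version, agent in [("v1", refund_agent_v1), ("v2", refund_agent_v2)]:
        print(f"{version}: pass rate {pass_rate(agent):.0%}")
```

Running the same scenarios against both versions is what makes the comparison meaningful: in this toy setup the first agent never issues a refund, so it fails the refund scenario, while the second passes both.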
