Pipelines Documentation
Pre-deployment simulation for AI agents
What is Pipelines?
Pipelines provides the infrastructure to test how AI agents behave in production-adjacent environments: the decisions they make, the quality of their outputs, and their operational performance, safety, and adherence to your standards. Port any agent, build a simulation, and measure what matters before you deploy.
Generate realistic scenarios on demand with our synthetic data generation service — seeded with your own data or built from scratch — tailored to your standard operating procedures, tribal knowledge, or curiosity. Then connect your agent, track its cost and performance, and ship a system you've tested and verified.
Quickstart
Agent Quickstart
Wire your agent to Odyssey and validate its performance in 15 minutes.
Odyssey SDK
Wrap any Python agent and route its tools through the proxy with our SDK/CLI.
Test your system
Coding Agents
Run a coding CLI agent in a sandboxed workspace and read its trajectory, diff, and grading.
Multi-Turn/Multi-Agent Systems
Attribute each tool call to the acting sub-agent and drive multi-turn runs to grade collaboration.
Simulation
Simulate environments and test agent reasoning. MCP connections are supported.
Framework Adapters
OpenAI, Anthropic, LangChain, Strands
How It Works
- Register your agent: Point us at an HTTP endpoint you host, or bring your own code to run in a sandbox — by pasting it, linking a git repo, or uploading an archive. Then declare the tools your agent can use.
- Generate scenarios: Use synthetic data generation to create test simulations in bulk, each with its own setup, expected behavior, and pass criteria.
- Run: Submit your scenarios to Odyssey, which runs each one as an isolated, live simulation — capturing every tool call and trace event along the way.
- Grade: Score each run with an LLM judge, your own custom graders, and operational metrics to get a clear verdict with the reasoning behind it.
- Compare: Every experiment is stored and versioned, so you can review trajectories and track quality across agent versions.
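
The steps above can be sketched in plain Python to show how the pieces relate. This is an illustrative model only, not the Odyssey SDK: every name here (`Scenario`, `RunResult`, `Verdict`, `run_scenario`, `grade`, `pass_rate`, the toy agents, and the tool names) is hypothetical, and the grader is a simple rule-based stand-in for an LLM judge or custom grader.

```python
from dataclasses import dataclass, field
from typing import Callable

# All types and functions below are hypothetical illustrations,
# not part of any real SDK.

@dataclass
class Scenario:
    scenario_id: str
    setup: dict                # initial state handed to the agent
    expected_behavior: str     # human-readable description
    required_tools: list       # pass criterion: tools that must be called

@dataclass
class RunResult:
    scenario_id: str
    agent_version: str
    tool_calls: list = field(default_factory=list)  # captured (name, args) pairs
    output: str = ""

@dataclass
class Verdict:
    passed: bool
    reasoning: str

def run_scenario(agent: Callable, scenario: Scenario, version: str) -> RunResult:
    """Run one scenario in isolation, recording every tool call the agent makes."""
    result = RunResult(scenario.scenario_id, version)

    def call_tool(name: str, args: dict) -> None:
        result.tool_calls.append((name, args))

    result.output = agent(scenario.setup, call_tool)
    return result

def grade(scenario: Scenario, result: RunResult) -> Verdict:
    """Rule-based grader: pass only if every required tool was called."""
    called = {name for name, _ in result.tool_calls}
    missing = [t for t in scenario.required_tools if t not in called]
    if missing:
        return Verdict(False, "missing tool calls: " + ", ".join(missing))
    return Verdict(True, "all required tools were called")

def pass_rate(agent: Callable, scenarios: list, version: str) -> float:
    """Fraction of scenarios that the given agent version passes."""
    verdicts = [grade(s, run_scenario(agent, s, version)) for s in scenarios]
    return sum(v.passed for v in verdicts) / len(verdicts)

# Two toy agent versions: v1 never issues refunds, v2 does when asked.
def agent_v1(setup: dict, call_tool: Callable) -> str:
    call_tool("lookup_order", {"order_id": setup["order_id"]})
    return "Looked up the order."

def agent_v2(setup: dict, call_tool: Callable) -> str:
    call_tool("lookup_order", {"order_id": setup["order_id"]})
    if setup["intent"] == "refund":
        call_tool("issue_refund", {"order_id": setup["order_id"]})
    return "Handled the request."

scenarios = [
    Scenario("refund-1", {"order_id": "A1", "intent": "refund"},
             "Look up the order, then refund it.",
             ["lookup_order", "issue_refund"]),
    Scenario("status-1", {"order_id": "B2", "intent": "status"},
             "Report the order status.",
             ["lookup_order"]),
]

# Compare quality across agent versions.
rates = {v: pass_rate(a, scenarios, v) for v, a in [("v1", agent_v1), ("v2", agent_v2)]}
print(rates)  # {'v1': 0.5, 'v2': 1.0}
```

In this sketch, each verdict carries its reasoning alongside the pass/fail result, and keying pass rates by version is what makes a regression between agent versions visible.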