Glossary
Definitions of common terms used across the Pipelines platform
Glossary (A-Z)
Agent: An agent is the system under test. Pipelines dispatches one task at a time to your registered agent and records the resulting run trajectory.
Agent Library: The Agent Library is the admin surface where agents are registered, versioned, and configured. See Agents.
Agent-mode field: An agent-mode field is form field configured in the Pipelines graphical workspace for agent dispatch.
Code agent: A code agent runs inside the platform sandbox using either a Python entrypoint or a CLI coding profile. These runs can include workspace diff and scenario scorer grading.
Contributor: A Contributor is a project-scoped role focused on task completion and human review work.
Criteria: Criteria are reusable evaluation definitions, including human, LLM, and programmatic checks, that can be attached as evaluators. See Evaluations.
Data Explorer: Data Explorer is the aggregate dataset for individual agent configurations. Attached are judge verdicts, latency and cost measurements, and trace links.
Data Vault: Data Vault is the unified dataset hub for all agent runs. See Datasets.
Dataset: A dataset is the collection of agent decisions, traces, pass rate, and financial metrics that can be used for analysis, comparison, and export.
Evaluator: An evaluator is a user or LLM-defined criterion attached to an agent's input or output that scores task completions.
Expected outcome: Expected outcome is an input that defines what "correct" looks like, such as task completion or action-refusal. Used for judgement which compares the agent's final output against expected outcome.
External HTTP agent: An external HTTP agent is a customer-hosted agent endpoint that Pipelines calls via HTTP dispatch.
Failure rules: Failure rules are seeded rules that deterministically inject failures so you can test recovery behavior.
Field session: A field session is a multi-step interaction for an agent. Conversation state and context progresses as the interaction grows in length.
Judge verdict: A judge verdict is LLM-judge output on a fixed rubric, including pass or fail, reasoning, and a failure mode when failed.
Ledger schema: A ledger schema is an optional typed schema for simulated world entities and state used by Odyssey.
MCP tools: MCP tools are tools sourced from MCP servers and callable by agents through declared tool schemas.
Multi-agent system: A multi-agent system is a topology (defined agent hierarchy) where one system contains multiple internal agents or sub-agents and the handoffs are traced.
Multi-turn testing: Multi-turn testing is session-based testing where a model-as-user interacts with the agent over multiple turns. This simulates conversations and tests agent memory.
Odyssey: Odyssey is the world simulation and runtime layer that mediates tool calls and applies simulation behavior.
Odyssey proxy URL: The Odyssey proxy URL is a per-run endpoint the agent uses for tool calls so Pipelines can observe, simulate, and score execution.
Organization: An organization is the top-level account boundary for projects, members, models, tools, and permissions.
Org Admin: An Org Admin is an organization-scoped admin role with full control over organization resources.
Passthrough mode: Passthrough mode is a tool execution mode that forwards calls to live external endpoints (e.g. Tavily, Zapier) instead of simulation.
Pipeline: A pipeline is the workflow scaffold that executes tasks, agents, evaluators, and review steps. See Pipelines.
Project: A project is a workspace inside an organization that scopes agents, datasets, tasks, and role assignments.
Project Admin: A Project Admin is a project-scoped admin role. Owners have management permissions while viewers are read-only.
Run: A run is one (agent, task) execution with trajectory, outputs, metrics, and verdicts.
Sandbox mode: Sandbox mode is a tool execution mode where Odyssey returns simulated responses from seeded state.
Scenario scorers: Scenario scorers are mechanical checks, especially for coding or code-agent scenarios, that are used alongside judge scoring.
Seed / task seed: A seed, or task seed, is the scenario input for a run. It includes instruction, behavior instructions, initial world state, failure rules, and expected outcome.
Studio: Studio is the dataset analytics and charting workspace for exploration and comparison.
Task: A task is the unit of work for one pipeline row. Agent execution and optional human review operate at this level.
Tools schema: A tools schema is the declared list of tools and JSON input schemas that an agent can call during a run.
Trajectory: A trajectory is the ordered record of tool calls, arguments, responses, and sources across a run or session.