Agent Quickstart
Workflow for Evaluating Agents with Pipelines
This quickstart is organized into three steps:
- Connect: wire your local agent to Pipelines, including the tunnel and auth.
- Build: configure the platform workflow and seed inputs.
- Test: run experiments, inspect trajectories, and track outcomes.
Prerequisites
- Python 3.10+ with pip.
- An OPENAI_API_KEY (or swap to Anthropic, Strands, or scratch later; see reference templates).
- A tunnel binary on your PATH. Use cloudflared (recommended) or ngrok.
- A Pipelines account with Project Admin (owner) or Org Admin permissions (Agents are an admin feature). Mint a pk_live_... API key from Settings -> API Keys.
1) Connect
Connection handles all local-to-platform plumbing: dispatch auth, per-run proxy routing, and the public tunnel needed by registration reachability gates.
1.1 Wrap the agent with the SDK pattern
The example below follows the canonical shape described in Agent SDK: proxied tools, a build_agent() factory, and a dispatch handler mounted by register_dispatch_route.
Save as app.py:
from agents import Agent, Runner, function_tool
from fastapi import FastAPI
from pipelines.odyssey import proxy_call, register_dispatch_route

app = FastAPI()


@function_tool
def echo(text: str) -> dict:
    """Routes the tool call through the Odyssey proxy."""
    return proxy_call("echo", {"text": text})


def build_agent() -> Agent:
    return Agent(
        name="echo-agent",
        instructions="When asked to echo text, call the echo tool and return its output.",
        tools=[echo],
        model="gpt-5",
    )


@register_dispatch_route(app, agent_token_env="AGENT_TOKEN")
async def run(envelope):
    result = await Runner.run(build_agent(), envelope.user_instruction)
    return result.final_output

Why this matters:
- proxy_call(...) is the bridge that sends the agent's tool calls through to the Odyssey simulation.
- register_dispatch_route(...) handles inbound auth, ping short-circuit, envelope parsing, and response shaping.
- build_agent() keeps your handler thin and aligned with the SDK docs and templates.
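To make the four responsibilities of the dispatch route concrete, here is a framework-free sketch of the same sequence. It is not the SDK's implementation: the function name handle_dispatch, the "ping" field, the error payloads, and the status codes are all hypothetical, chosen only to illustrate the order of checks. The user_instruction and final_response names are taken from this page.

```python
import hmac
import json
from typing import Callable


def handle_dispatch(
    headers: dict[str, str],
    body: bytes,
    expected_token: str,
    run_agent: Callable[[str], str],
) -> tuple[int, dict]:
    """Conceptual stand-in for a dispatch route; NOT the SDK's code."""
    # 1. Inbound auth: compare the bearer header in constant time.
    #    (Header lookup is case-sensitive here to keep the sketch short.)
    supplied = headers.get("Authorization", "")
    expected = f"Bearer {expected_token}"
    if not hmac.compare_digest(supplied.encode(), expected.encode()):
        return 401, {"error": "unauthorized"}

    # 2. Envelope parsing: the body must decode to a JSON object.
    try:
        envelope = json.loads(body)
    except ValueError:
        return 400, {"error": "invalid JSON"}
    if not isinstance(envelope, dict):
        return 400, {"error": "envelope must be a JSON object"}

    # 3. Ping short-circuit (hypothetical "ping" field): reply without
    #    invoking the agent, so reachability checks stay cheap.
    if envelope.get("ping"):
        return 200, {"ok": True}

    # 4. Response shaping: wrap the agent's output in the response body.
    output = run_agent(envelope.get("user_instruction", ""))
    return 200, {"final_response": output}
```

Passing the agent in as a plain callable mirrors the reason build_agent() exists in app.py: the handler stays thin and the agent can be swapped without touching the routing logic.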
1.2 Install and export local env
pip install 'pipelines-sdk[openai-agents]' uvicorn
export OPENAI_API_KEY=sk-...
export AGENT_TOKEN=$(python -c 'import secrets; print(secrets.token_urlsafe(32))')
echo $AGENT_TOKEN
Start the wrapper:
uvicorn app:app --port 8080
1.3 Register an external HTTP agent (with placeholder endpoint)
Pipelines blocks loopback and private addresses at registration time, so use a placeholder public URL first and let the dev tunnel command patch it during local iteration.
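If you want to know in advance whether a URL would be rejected as loopback or private, a local pre-check along these lines can help. This is a minimal sketch using only the Python standard library; the helper name is_publicly_routable is hypothetical, and the platform's actual reachability rules are not documented here and may be stricter (for example, by resolving DNS names).

```python
import ipaddress
from urllib.parse import urlsplit


def is_publicly_routable(url: str) -> bool:
    """Return False for hosts that are obviously loopback or private."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch does not resolve it, so it is treated as public.
        return True
    return not (address.is_loopback or address.is_private or address.is_link_local)
```

Under this check, http://127.0.0.1:8080/dispatch and http://192.168.1.10/dispatch are rejected, while the placeholder https://example.com/dispatch passes.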
In Agents -> Register agent:
- Name: quickstart-agent
- Mode: External HTTP
- Endpoint URL: https://example.com/dispatch (placeholder)
- Auth header name: Authorization
- Auth header value: Bearer <your AGENT_TOKEN>
- Tools: paste the JSON structure, or dump it from your SDK agent declaration:
from app import build_agent  # build_agent() is defined in app.py above
from pipelines.odyssey.adapters.openai_agents import dump_tools_schema_json

print(dump_tools_schema_json(build_agent()))

[
  {
    "name": "echo",
    "description": "Echoes the input text back.",
    "input_schema": {
      "type": "object",
      "properties": { "text": { "type": "string" } },
      "required": ["text"]
    }
  }
]
1.4 Bridge localhost with pipelines odyssey dev
Keep uvicorn running, then in another terminal:
export PIPELINES_API_KEY=pk_live_...
pipelines odyssey dev --agent-id <id-of-quickstart-agent> --port 8080
If successful, you will see a status line like:
Agent quickstart-agent (#42) is live at https://random-words-1234.trycloudflare.com/dispatch. Press Ctrl-C to stop.
What this command does:
- Snapshots the current agent config.
- Starts cloudflared (or ngrok if requested).
- Repoints the registered endpoint to the live tunnel /dispatch URL.
- Preserves existing auth header config.
- Reverts the endpoint URL when you stop the command with Ctrl-C.
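The snapshot, repoint, and revert sequence is a standard "restore on exit" pattern. The sketch below illustrates it with a plain dictionary standing in for the registered agent config. It is not the CLI's code: the temporary_endpoint helper and the endpoint_url and auth_header_name keys are hypothetical names used only for illustration.

```python
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_endpoint(agent_config: dict, tunnel_url: str) -> Iterator[dict]:
    """Point agent_config at a tunnel for the duration of a with-block."""
    snapshot = dict(agent_config)  # snapshot the current config
    # Repoint only the endpoint; every other key (auth header) is untouched.
    agent_config["endpoint_url"] = tunnel_url.rstrip("/") + "/dispatch"
    try:
        yield agent_config
    finally:
        # Runs on normal exit and on KeyboardInterrupt (Ctrl-C) alike.
        agent_config["endpoint_url"] = snapshot["endpoint_url"]


config = {
    "endpoint_url": "https://example.com/dispatch",
    "auth_header_name": "Authorization",
}
with temporary_endpoint(config, "https://random-words-1234.trycloudflare.com") as live:
    print(live["endpoint_url"])  # https://random-words-1234.trycloudflare.com/dispatch
print(config["endpoint_url"])  # https://example.com/dispatch
```

Doing the revert in a finally clause is what lets the placeholder URL come back even when the session ends with an interrupt instead of a clean exit.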
See Local development for full flags and troubleshooting.
2) Build
Build steps turn the connected agent into a repeatable platform experiment.
2.1 Create a one-node pipeline with an agent field
- Open or create a project, then go to Pipelines -> Quick Create.
- In the wizard, pick quickstart-agent as the agent under test.
- Keep the default single agent field and finish the wizard.
- Click Create Pipeline on the final step. The wizard auto-publishes and takes you to Data Explorer.
Odyssey preconfigures seed columns for the agent field:
- User instruction column: prompt
- Behavior instructions column: behavior
- Initial state (JSON) column: state
- Failure rules (JSON) column: failure_rules
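Because two of these columns carry JSON, a malformed cell is an easy mistake when you prepare seed rows by hand. The following sketch checks a row locally before upload. It rests on assumptions that the platform does not confirm here: that prompt must be non-empty, that state holds a JSON object, that failure_rules holds a JSON array, and that blank JSON cells are acceptable. The validate_seed_row helper is hypothetical.

```python
import json

# Assumed JSON shape per column: (Python type, label used in messages).
JSON_COLUMNS = {
    "state": (dict, "object"),
    "failure_rules": (list, "array"),
}


def validate_seed_row(row: dict[str, str]) -> list[str]:
    """Return a list of problems found in one seed row (empty if none)."""
    problems = []
    if not row.get("prompt", "").strip():
        problems.append("prompt: empty or missing")
    for column, (expected_type, label) in JSON_COLUMNS.items():
        raw = row.get(column, "").strip()
        if not raw:
            continue  # assumption: a blank JSON cell is allowed
        try:
            value = json.loads(raw)
        except ValueError:
            problems.append(f"{column}: not valid JSON")
            continue
        if not isinstance(value, expected_type):
            problems.append(f"{column}: expected a JSON {label}")
    return problems
```

A row with state set to {} and failure_rules set to [] passes with no problems reported; a row whose state cell contains [] is flagged because an object was expected.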
2.2 Seed a baseline experiment row
In Data Explorer, click Create Tasks -> Generate synthetic seeds.
In the synthetic seeding flow, define a few buckets (scenario groups) with a name and target count, then generate the tasks. For this quickstart, keep it simple with one or two buckets and default world settings.
Tip: For broader experiment coverage, you can also ground synthetic generation with dataset-backed profiles derived from your real operating data. We will cover this in a dedicated page.
3) Test
Dispatch runs, inspect them, and track experiment outcomes across agent variants. Verify agent performance in realistic scenarios before shipping to production.
3.1 Run and inspect the baseline
- Refresh Data Explorer and confirm Run status: completed with a judge badge.
- Open the row and click Agent Trace.
- Verify:
  - the trajectory includes the echo tool call
  - the source is odyssey for baseline simulation
  - the judge verdict and failure mode fields are populated
If a run fails, common causes are:
- Missing final_response in handler output.
- Auth mismatch between dashboard credential and local AGENT_TOKEN.
- Endpoint timing out before configured run timeout.
- Zero proxy calls because local proxy connectivity is broken.
See Inspecting runs for trace panel details.
3.2 Create a failure-injected variant for experiment tracking
Seed a second row with a deterministic failure rule:
prompt,behavior,state,failure_rules
"Use the echo tool to send back the word 'hello'.","You are an echo service.","{}","[{\"trigger\":\"after_n_calls\",\"tool\":\"echo\",\"n\":1,\"duration\":1,\"error\":{\"code\":503,\"message\":\"Echo offline\"}}]"
Dispatch again, then compare baseline and variant rows in Data Explorer:
- baseline should show odyssey tool source and likely PASS
- variant should show injected tool source and likely FAIL if unhandled
- failure mode, verdict, and trajectory become your first experiment-tracking record
This baseline-vs-variant loop is the fastest way to track behavior changes as your agent evolves.
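Hand-escaping JSON inside a CSV cell is error-prone, so it can be easier to generate variant rows from structured data. The sketch below writes rows in the same style as the sample above, with every field quoted and inner quotes escaped with a backslash. That the importer expects this escaping is inferred from the sample row and is an assumption, as is the seed_rows_to_csv helper itself.

```python
import csv
import io
import json

COLUMNS = ["prompt", "behavior", "state", "failure_rules"]


def seed_rows_to_csv(rows: list[dict]) -> str:
    """Serialize seed rows, escaping inner quotes as \\" like the sample row."""
    buffer = io.StringIO()
    buffer.write(",".join(COLUMNS) + "\n")  # unquoted header line
    writer = csv.writer(
        buffer,
        quoting=csv.QUOTE_ALL,
        doublequote=False,  # use the escape character instead of doubling quotes
        escapechar="\\",
        lineterminator="\n",
    )
    for row in rows:
        writer.writerow([
            row["prompt"],
            row["behavior"],
            # Compact separators keep the JSON cells free of extra spaces.
            json.dumps(row["state"], separators=(",", ":")),
            json.dumps(row["failure_rules"], separators=(",", ":")),
        ])
    return buffer.getvalue()


variant = {
    "prompt": "Use the echo tool to send back the word 'hello'.",
    "behavior": "You are an echo service.",
    "state": {},
    "failure_rules": [{
        "trigger": "after_n_calls", "tool": "echo", "n": 1, "duration": 1,
        "error": {"code": 503, "message": "Echo offline"},
    }],
}
print(seed_rows_to_csv([variant]))
```

Keeping the failure rule as a Python structure also makes it simple to produce a family of variants, for example by looping over several values of n or several error codes.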