Agent Quickstart

Workflow for Evaluating Agents with Pipelines

This quickstart is organized into three steps:

  1. Connect: wire your local agent to Pipelines, including the tunnel and auth.
  2. Build: configure the platform workflow and seed inputs.
  3. Test: run experiments, inspect trajectories, and track outcomes.

Prerequisites

  • Python 3.10+ with pip.
  • An OPENAI_API_KEY (you can later swap to Anthropic, Strands, or a from-scratch agent; see the reference templates).
  • A tunnel binary on your PATH. Use cloudflared (recommended) or ngrok.
  • A Pipelines account with Project Admin (owner) or Org Admin permissions (Agents are an admin feature). Mint a pk_live_... API key from Settings -> API Keys.
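
Before continuing, you can check the local prerequisites from Python. This is a minimal sketch, not a Pipelines tool: preflight() is a hypothetical helper, and it only covers what it can see on your machine (Python version, OPENAI_API_KEY, and a tunnel binary on PATH), not your Pipelines role or API key.

```python
from __future__ import annotations

import os
import shutil
import sys


def preflight() -> list[str]:
    """Hypothetical helper: return local prerequisite problems (empty list = OK)."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # Either tunnel binary satisfies the requirement; cloudflared is recommended.
    if not (shutil.which("cloudflared") or shutil.which("ngrok")):
        problems.append("no tunnel binary (cloudflared or ngrok) found on PATH")
    return problems


if __name__ == "__main__":
    # Print each problem, or a single OK line when nothing is missing.
    for line in preflight() or ["local prerequisites look OK"]:
        print(line)
```

Run it with the same Python interpreter you will use for uvicorn, so the version check reflects the environment that actually serves the agent.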

1) Connect

Connecting handles all of the local-to-platform plumbing: dispatch auth, per-run proxy routing, and the public tunnel that registration's reachability checks require.

1.1 Wrap the agent with the SDK pattern

Below is an example of the canonical SDK shape described in Agent SDK: proxied tools, a build_agent() factory, and a dispatch handler mounted with register_dispatch_route.

Save as app.py:

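from agents import Agent, Runner, function_tool
from fastapi import FastAPI

from pipelines.odyssey import proxy_call, register_dispatch_route

app = FastAPI()


@function_tool
def echo(text: str) -> dict:
    """Echoes the input text back."""
    # The docstring becomes the tool description the model sees. The body does
    # not implement echo itself: proxy_call forwards the call to the Odyssey proxy.
    return proxy_call("echo", {"text": text})


def build_agent() -> Agent:
    # A single factory, reused by the dispatch handler below and by
    # dump_tools_schema_json in step 1.3.
    return Agent(
        name="echo-agent",
        instructions="When asked to echo text, call the echo tool and return its output.",
        tools=[echo],
        model="gpt-5",
    )


@register_dispatch_route(app, agent_token_env="AGENT_TOKEN")
async def run(envelope):
    # Run the agent on the envelope's user instruction (the seeded prompt) and
    # return its final output; the route shapes it into the HTTP response.
    result = await Runner.run(build_agent(), envelope.user_instruction)
    return result.final_output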

Why this matters:

  • proxy_call(...) is the bridge that routes the agent's tool calls to the Odyssey simulation.
  • register_dispatch_route(...) handles inbound auth, the ping short-circuit, envelope parsing, and response shaping.
  • build_agent() keeps agent construction out of the handler, so the handler stays thin and matches the SDK docs and templates.
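
To make the list above concrete, the following plain-Python model walks through the four jobs attributed to register_dispatch_route, in the order listed. It is a hypothetical sketch, not the SDK's implementation: handle_dispatch, the {"type": "ping"} body, and the 401/200 status codes are assumptions for illustration. The final_response key matches the field named in the troubleshooting list under 3.1.

```python
from __future__ import annotations

import hmac
from typing import Any, Callable


def handle_dispatch(
    headers: dict[str, str],
    body: dict[str, Any],
    expected_token: str,
    run_agent: Callable[[str], str],
) -> tuple[int, dict[str, Any]]:
    """Hypothetical model of register_dispatch_route's jobs; not the SDK code."""
    # 1. Inbound auth. HTTP header names are case-insensitive, so match loosely,
    #    then compare in constant time against "Bearer <AGENT_TOKEN>".
    supplied = next(
        (v for k, v in headers.items() if k.lower() == "authorization"), ""
    )
    if not hmac.compare_digest(supplied.encode(), f"Bearer {expected_token}".encode()):
        return 401, {"error": "unauthorized"}

    # 2. Ping short-circuit (assumed body shape): answer without running the agent.
    if body.get("type") == "ping":
        return 200, {"ok": True}

    # 3. Envelope parsing: the handler in app.py reads envelope.user_instruction.
    instruction = body["user_instruction"]

    # 4. Response shaping: wrap the agent's final output under final_response.
    return 200, {"final_response": run_agent(instruction)}
```

The constant-time comparison (hmac.compare_digest) is a common choice for bearer tokens because it avoids leaking how many leading characters matched.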

1.2 Install and export local env

pip install 'pipelines-sdk[openai-agents]' fastapi uvicorn

export OPENAI_API_KEY=sk-...
export AGENT_TOKEN=$(python -c 'import secrets; print(secrets.token_urlsafe(32))')
echo $AGENT_TOKEN  # copy this; the dashboard needs it in step 1.3

Start the wrapper:

uvicorn app:app --port 8080
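
The Auth header value you enter in step 1.3 is the exported token with a Bearer prefix. A minimal sketch that derives it from the environment, assuming you run it in the same shell where you exported AGENT_TOKEN (dashboard_auth_value is a hypothetical helper name):

```python
import os


def dashboard_auth_value() -> str:
    """Hypothetical helper: build the step 1.3 "Auth header value" from AGENT_TOKEN."""
    token = os.environ.get("AGENT_TOKEN", "").strip()
    if not token:
        raise RuntimeError("AGENT_TOKEN is not set in this shell; export it first")
    return f"Bearer {token}"


if __name__ == "__main__":
    try:
        print(dashboard_auth_value())
    except RuntimeError as exc:
        print(exc)
```

Because the export line generates a fresh random token every time it runs, re-running it in a new shell yields a value that no longer matches what you registered; start uvicorn from the shell whose token you pasted.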

1.3 Register an external HTTP agent (with placeholder endpoint)

Pipelines blocks loopback and private addresses at registration time, so register with a placeholder public URL first; the pipelines odyssey dev command in step 1.4 repoints it to your tunnel while you iterate locally.
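
As an illustration of the kind of check described above, the sketch below flags URLs that point at loopback or private addresses, which is why http://localhost:8080/dispatch cannot be registered directly. It approximates the rule with Python's ipaddress module; looks_publicly_reachable is a hypothetical name, and Pipelines' real check may differ (for example, it may resolve DNS names, which this sketch does not).

```python
import ipaddress
from urllib.parse import urlsplit


def looks_publicly_reachable(url: str) -> bool:
    """Hypothetical approximation of the registration-time address check."""
    host = urlsplit(url).hostname  # lowercased; IPv6 brackets stripped
    if not host or host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name such as example.com. This sketch does not resolve it,
        # so it cannot tell whether the name points at a private address.
        return True
    # is_global is False for loopback, private (RFC 1918), link-local, etc.
    return addr.is_global
```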

In Agents -> Register agent:

  • Name: quickstart-agent
  • Mode: External HTTP
  • Endpoint URL: https://example.com/dispatch (placeholder)
  • Auth header name: Authorization
  • Auth header value: Bearer <your AGENT_TOKEN>
  • Tools: paste the JSON tool schema, or generate it from your SDK agent declaration:

from app import build_agent  # the factory defined in app.py (step 1.1)
from pipelines.odyssey.adapters.openai_agents import dump_tools_schema_json

print(dump_tools_schema_json(build_agent()))

The schema for the echo tool:

[
  {
    "name": "echo",
    "description": "Echoes the input text back.",
    "input_schema": {
      "type": "object",
      "properties": { "text": { "type": "string" } },
      "required": ["text"]
    }
  }
]
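
Before pasting, you can sanity-check the Tools JSON locally. This hypothetical check_tools_schema helper only verifies the fields visible in the example above (name, description, and an object-typed input_schema whose required entries exist in properties); the platform's own validation may accept or reject more.

```python
import json


def check_tools_schema(raw: str) -> list[str]:
    """Hypothetical local check of the Tools JSON; returns problems (empty = OK)."""
    tools = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(tools, list):
        return ["top level must be a JSON array of tools"]
    problems = []
    for i, tool in enumerate(tools):
        label = f"tool[{i}]"
        if not isinstance(tool, dict):
            problems.append(f"{label}: must be a JSON object")
            continue
        for key in ("name", "description", "input_schema"):
            if key not in tool:
                problems.append(f"{label}: missing {key!r}")
        schema = tool.get("input_schema")
        if not isinstance(schema, dict):
            continue
        if schema.get("type") != "object":
            problems.append(f"{label}: input_schema.type should be 'object'")
        # Every required field should be declared under properties.
        props = schema.get("properties", {})
        for field in schema.get("required", []):
            if field not in props:
                problems.append(f"{label}: required field {field!r} is not in properties")
    return problems
```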

1.4 Bridge localhost with pipelines odyssey dev

Keep uvicorn running, then in another terminal:

export PIPELINES_API_KEY=pk_live_...
pipelines odyssey dev --agent-id <id-of-quickstart-agent> --port 8080

If successful, you will see a status line like:

Agent quickstart-agent (#42) is live at https://random-words-1234.trycloudflare.com/dispatch. Press Ctrl-C to stop.

What this command does:

  • Snapshots the current agent config.
  • Starts cloudflared (or ngrok if requested).
  • Repoints the registered endpoint to the live tunnel /dispatch URL.
  • Preserves existing auth header config.
  • Reverts the endpoint URL when you stop the command with Ctrl-C.
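
The lifecycle in the list above (snapshot, repoint, preserve auth, revert on Ctrl-C) maps naturally onto a try/finally block. This is an in-memory sketch under stated assumptions, not the CLI's actual code: run_dev_session, the endpoint_url key, and the serve callback are hypothetical, and starting the tunnel is represented only by the tunnel_url argument.

```python
from typing import Callable


def run_dev_session(agent_config: dict, tunnel_url: str, serve: Callable[[], None]) -> None:
    """Hypothetical in-memory model of the dev command's lifecycle."""
    snapshot = dict(agent_config)  # 1. snapshot the current agent config
    # 2. Repoint the registered endpoint at the tunnel's /dispatch URL.
    agent_config["endpoint_url"] = tunnel_url.rstrip("/") + "/dispatch"
    # 3. Auth header fields are deliberately left untouched.
    try:
        serve()  # stands in for "keep the tunnel up until Ctrl-C"
    except KeyboardInterrupt:
        pass  # Ctrl-C is the normal way to stop
    finally:
        # 4. Revert the endpoint URL to its snapshotted value, even on errors.
        agent_config["endpoint_url"] = snapshot["endpoint_url"]
```

Putting the revert in finally means the placeholder URL is restored whether the session ends with Ctrl-C or with an unexpected exception.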

See Local development for full flags and troubleshooting.

2) Build

Build steps turn the connected agent into a repeatable platform experiment.

2.1 Create a one-node pipeline with an agent field

  1. Open or create a project, then go to Pipelines -> Quick Create.
  2. In the wizard, pick quickstart-agent as the agent under test.
  3. Keep the default single agent field and finish the wizard.
  4. Click Create Pipeline on the final step. The wizard auto-publishes and takes you to Data Explorer.

Odyssey preconfigures seed columns for the agent field:

  • User instruction column: prompt
  • Behavior instructions column: behavior
  • Initial state (JSON) column: state
  • Failure rules (JSON) column: failure_rules
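
If you prepare seed rows outside the UI, a quick local check can catch malformed JSON before dispatch. A minimal sketch, assuming (from the column list above and the example row in 3.2) that state should be a JSON object and failure_rules a JSON array, and treating blank cells in those columns as "not set"; check_seed_row is a hypothetical helper.

```python
import json

SEED_COLUMNS = ("prompt", "behavior", "state", "failure_rules")
# Assumed JSON shapes: state is an object, failure_rules an array.
JSON_COLUMNS = (("state", dict, "object"), ("failure_rules", list, "array"))


def check_seed_row(row: dict[str, str]) -> list[str]:
    """Hypothetical local check of one seed row; returns problems (empty = OK)."""
    problems = [f"missing column {c!r}" for c in SEED_COLUMNS if c not in row]
    if not row.get("prompt", "").strip():
        problems.append("prompt (user instruction) is empty")
    for col, py_type, json_name in JSON_COLUMNS:
        text = row.get(col, "").strip()
        if not text:
            continue  # blank cell: treated as "not set" in this sketch
        try:
            value = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append(f"{col}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(value, py_type):
            problems.append(f"{col}: expected a JSON {json_name}")
    return problems
```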

2.2 Seed a baseline experiment row

In Data Explorer, click Create Tasks -> Generate synthetic seeds.

In the synthetic seeding flow, define a few buckets (scenario groups) with a name and target count, then generate the tasks. For this quickstart, keep it simple with one or two buckets and default world settings.

Tip: For broader experiment coverage, you can also ground synthetic generation in dataset-backed profiles derived from your real operating data. We will cover this on a dedicated page.

3) Test

Dispatch runs, inspect them, and track experiment outcomes across agent variants, so you can verify agent performance in realistic scenarios before shipping to production.

3.1 Run and inspect the baseline

  1. Refresh Data Explorer and confirm that Run status is completed and the row has a judge badge.
  2. Open the row and click Agent Trace.
  3. Verify:
    • trajectory includes the echo tool call
    • tool source is odyssey (the baseline simulation)
    • judge verdict and failure mode fields are populated

If a run fails, common causes are:

  • The handler output is missing final_response.
  • The auth header value registered in the dashboard does not match the local AGENT_TOKEN.
  • The endpoint does not respond within the configured run timeout.
  • The run records zero proxy calls because the local agent cannot reach the Odyssey proxy.
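
For the auth-mismatch case, comparing the two values directly usually pinpoints the problem. A hypothetical diagnose_auth helper, assuming the dashboard value should equal "Bearer " followed by AGENT_TOKEN exactly, as registered in step 1.3:

```python
def diagnose_auth(dashboard_value: str, agent_token: str) -> str:
    """Hypothetical helper: name the likely cause of an auth mismatch."""
    expected = f"Bearer {agent_token}"
    if dashboard_value == expected:
        return "ok"
    if dashboard_value.strip() == expected:
        return "extra whitespace around the pasted value"
    if dashboard_value.strip() == agent_token:
        return "missing 'Bearer ' prefix"
    if dashboard_value.startswith("Bearer "):
        # Common when the export line was re-run, minting a new token.
        return "token differs from local AGENT_TOKEN (re-exported or new shell?)"
    return "unrecognized format; expected 'Bearer <AGENT_TOKEN>'"
```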

See Inspecting runs for trace panel details.

3.2 Create a failure-injected variant for experiment tracking

Seed a second row with a deterministic failure rule:

prompt,behavior,state,failure_rules
"Use the echo tool to send back the word 'hello'.","You are an echo service.","{}","[{""trigger"":""after_n_calls"",""tool"":""echo"",""n"":1,""duration"":1,""error"":{""code"":503,""message"":""Echo offline""}}]"

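Hand-escaping JSON inside CSV is error-prone. A sketch that builds the same row from Python data and lets the standard json and csv modules handle quoting; the values are copied from the row above, and the output uses csv's default minimal quoting with embedded quotes doubled (RFC 4180 style), so it may look different from the hand-written line.

```python
import csv
import io
import json

# Failure rule copied from the example row; json.dumps handles the escaping.
failure_rules = [{
    "trigger": "after_n_calls",
    "tool": "echo",
    "n": 1,
    "duration": 1,
    "error": {"code": 503, "message": "Echo offline"},
}]

row = {
    "prompt": "Use the echo tool to send back the word 'hello'.",
    "behavior": "You are an echo service.",
    "state": json.dumps({}),
    # Compact separators reproduce the JSON exactly as written in the example.
    "failure_rules": json.dumps(failure_rules, separators=(",", ":")),
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(row))  # default QUOTE_MINIMAL
writer.writeheader()
writer.writerow(row)
csv_text = buf.getvalue()
print(csv_text, end="")
```

Only the failure_rules field contains quotes and commas, so it is the only data field the writer quotes.
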
Dispatch again, then compare baseline and variant rows in Data Explorer:

  • baseline should show odyssey tool source and likely PASS
  • variant should show injected tool source and likely FAIL if the agent does not handle the injected 503 error
  • failure mode, verdict, and trajectory become your first experiment-tracking record
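
If you record these fields yourself (for example, by copying them from Data Explorer), a small diff makes each experiment record explicit. A hypothetical sketch: diff_runs and the tool_source, verdict, and failure_mode keys are assumptions that mirror the fields named above, not a documented export format.

```python
from __future__ import annotations

# Assumed keys mirroring the fields compared in Data Explorer.
TRACKED = ("tool_source", "verdict", "failure_mode")


def diff_runs(baseline: dict, variant: dict) -> dict[str, tuple]:
    """Return {field: (baseline_value, variant_value)} for tracked fields that changed."""
    return {
        field: (baseline.get(field), variant.get(field))
        for field in TRACKED
        if baseline.get(field) != variant.get(field)
    }
```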

This baseline-vs-variant loop is the fastest way to track behavior changes as your agent evolves.