
Sandbox agent reference

Complete sandbox registration reference: code sources, entrypoints, environment, limits, errors, and API equivalents.

In the sidebar, click Agents → Register agent and pick the Sandbox Agents mode card. Visible only to Org Admins and Project Admin Owners. (For agents that run in your infrastructure, see Register an external HTTP agent.)

A sandbox agent's code runs inside a per-task platform sandbox. The How your agent runs field picks one of two execution paths:

| Path | What runs | Use it for |
|---|---|---|
| Python function | The platform imports your code and calls your entrypoint; tools go through the per-run proxy | Custom Python agents, framework agents (OpenAI Agents SDK, LangGraph, …) |
| Shell command (any CLI) | The platform runs your command inside a seeded repo workspace | Coding CLIs: Claude Code, Codex, Cursor, Aider. See Coding agents |

Form fields

Name, description

  • Name (required, ≤ 255 chars).
  • Description (optional, ≤ 5000 chars).

Code source

The Code source picker offers four tiles. An agent declares at most one: a Python-function agent must have exactly one, while a shell-command agent may ship none because its CLI is already in the image.

| Tile | What it is | Limits |
|---|---|---|
| Paste code | A single Python file pasted into the editor. | ≤ 200 KB. |
| Multiple files | A left-rail file tree (add files or drop a folder). | ≤ 50 files, 200 KB per file, 1 MB total. Relative POSIX paths, .py basenames, no .. or dotfile dirs. |
| Git repository | A repo cloned into the sandbox at dispatch. | URL + optional ref; private repos via a PAT (below). |
| Upload ZIP | An archive uploaded once and unzipped into the sandbox. | ≤ 100 MB compressed (500 MB uncompressed). Any files. |
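
The Multiple files limits can be checked locally before you open the form. The following is a minimal sketch, not the platform's validator: the function name check_source_files is hypothetical, and it assumes 1 KB = 1024 bytes and that sizes are measured on UTF-8 encoded text.

```python
from pathlib import PurePosixPath

MAX_FILES = 50
MAX_FILE_BYTES = 200 * 1024    # assumption: 1 KB = 1024 bytes
MAX_TOTAL_BYTES = 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_source_files(files: dict[str, str]) -> list[str]:
    """Return the problems found in a {relative_path: file_text} mapping."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    total = 0
    for path, text in files.items():
        size = len(text.encode("utf-8"))
        total += size
        parts = PurePosixPath(path).parts
        if path.startswith("/") or "\\" in path:
            problems.append(f"{path}: not a relative POSIX path")
        if ".." in parts:
            problems.append(f"{path}: contains '..'")
        # Every component except the last one is a directory.
        if any(p.startswith(".") and p != ".." for p in parts[:-1]):
            problems.append(f"{path}: inside a dotfile directory")
        if not path.endswith(".py"):
            problems.append(f"{path}: basename is not .py")
        if size > MAX_FILE_BYTES:
            problems.append(f"{path}: {size} bytes exceeds the per-file cap")
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes exceeds the 1 MB cap")
    return problems
```

An empty list means the file set fits the documented limits; the form remains the authority on what is accepted.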

Wherever the code comes from, it lands in the agent directory /home/user/agent. When a coding scenario is attached, the graded repo lives in a separate /home/user/workspace, so your code never collides with the diffed repo.

Fetched sources (ZIP and Git) are resolved before any sandbox is booted, so a fetch failure costs nothing. Materialization is idempotent (a worker retry re-converges), and on later turns of a multi-turn session the already-populated sandbox is reused without re-downloading.

Upload ZIP

Use a ZIP when your agent is bigger than the paste limits or needs non-.py files. The flow:

  1. Drop or browse for a .zip. The form uploads it directly to storage with a short-lived signed URL, then confirms it. The form blocks submit while bytes are still moving.
  2. Only the confirmed archive id is saved on the agent — never the bytes.
  3. At each run, the platform downloads the archive, validates it (size caps, zip-slip and symlink guards), unzips it into /home/user/agent flattening a single top-level folder, then deletes the staged archive.
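
The validation in step 3 can be approximated locally before uploading. This is a sketch of common zip-slip and symlink checks using the standard zipfile module; it is an assumption about what such guards look like, not the platform's implementation, and check_zip is a hypothetical name.

```python
import io
import stat
import zipfile
from pathlib import PurePosixPath

MAX_COMPRESSED = 100 * 1024 * 1024    # assumption: 1 MB = 1024 * 1024 bytes
MAX_UNCOMPRESSED = 500 * 1024 * 1024


def check_zip(data: bytes) -> list[str]:
    """Return the problems found in an in-memory ZIP archive."""
    problems = []
    if len(data) > MAX_COMPRESSED:
        problems.append("archive exceeds the compressed size cap")
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            total += info.file_size  # declared uncompressed size
            name = info.filename
            # Zip-slip: absolute paths or '..' would escape the target dir.
            if name.startswith("/") or ".." in PurePosixPath(name).parts:
                problems.append(f"{name}: escapes the extraction directory")
            # The upper 16 bits of external_attr carry the Unix mode, if set.
            if stat.S_ISLNK(info.external_attr >> 16):
                problems.append(f"{name}: is a symlink")
    if total > MAX_UNCOMPRESSED:
        problems.append("archive exceeds the uncompressed size cap")
    return problems
```

Running check_zip(open("agent.zip", "rb").read()) before the upload catches the most common causes of a failed fetch without spending a run.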

Uploading an archive requires Org Admin (or sys-admin) rights on the agent's org — project roles don't grant it. Archives are org-scoped and referenced only by the agent.

A ZIP's contents aren't known when you save the agent, so the Entrypoint file is shape-checked at save and its existence is verified inside the sandbox at dispatch. A wrong path fails the run, not the save.

Git repository

Provide a Repository URL and, optionally, a Ref (branch, tag, or commit). The clone is https-only with an SSRF guard that rejects private/loopback/reserved hosts; SSH URLs and credentials embedded in the URL are rejected.
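
The URL rules above can be pre-checked with the standard library. This sketch is an approximation under stated assumptions: check_repo_url is a hypothetical helper, it only inspects literal IP addresses and the name localhost, and it does not resolve DNS names the way a real SSRF guard would.

```python
import ipaddress
from urllib.parse import urlsplit


def check_repo_url(url: str) -> list[str]:
    """Return the problems found in a repository URL."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("scheme must be https")
    if parts.username or parts.password:
        problems.append("credentials must not be embedded in the URL")
    host = parts.hostname or ""
    if not host:
        problems.append("missing host")
    elif host == "localhost":
        problems.append("loopback host")
    else:
        try:
            address = ipaddress.ip_address(host)
        except ValueError:
            address = None  # a DNS name; this sketch does not resolve it
        if address is not None and (
            address.is_private
            or address.is_loopback
            or address.is_reserved
            or address.is_link_local
        ):
            problems.append("private, loopback, or reserved address")
    return problems
```

For example, check_repo_url("https://github.com/acme/my-agent.git") returns an empty list, while an ssh:// URL or a URL carrying a token in the userinfo part is reported.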

The Auth control has three modes:

  • None — public repo.
  • From credential — pick an existing org credential (a stored PAT). Resolved and decrypted server-side at dispatch.
  • Inline token — paste a PAT once. It's write-only: the platform stores it in a hidden, platform-managed credential and never writes the raw value to the agent config. In edit mode you see rotate-or-keep copy, never the token.

Strict checkout. A bad ref fails the run (agent_code_fetch_failed) — unlike the lenient scenario seed clone, agent-code git is strict. The token never appears in run output, errors, logs, or the stored config: it's injected only when the clone command is built, the remote is dropped after the clone, and the decrypted value is masked to *** everywhere — even if your agent echoes it back.

Entrypoint (Python function path)

Entrypoint (required, default run) is the name of a top-level callable in your code — a single Python identifier (dotted paths are rejected). For multi-file, ZIP, and Git sources, Entrypoint file selects which .py module holds it (default main.py). The platform calls it directly:

def run(task_input, *, proxy_url, run_token):
    # task_input is the dispatch input dict
    # proxy_url / run_token are also available as env vars (below)
    return {"final_response": "..."}        # or just a string

Return a {"final_response": ...} dict (optionally with messages / metadata), or a plain string the platform wraps as final_response. An unhandled exception is captured and graded, not treated as an infra failure.

Agent code in the sandbox cannot import the Pipelines SDK (it isn't installed there). Write SDK-free code: read proxy_url / run_token from the call kwargs or the PIPELINES_* env vars and call the per-run proxy over plain HTTP. See pipelines.odyssey for the SDK path when you control the runtime.
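
As an illustration of SDK-free code, the sketch below calls a tool through the per-run proxy with only the standard library. The /tools/{name} path and the bearer token come from this page; the POST method, the JSON request body, the JSON response, and the tool name search are assumptions made for the example.

```python
import json
import urllib.request


def build_tool_request(proxy_url, run_token, tool_name, arguments):
    """Build (but do not send) a tool call against the per-run proxy."""
    url = f"{proxy_url.rstrip('/')}/tools/{tool_name}"
    body = json.dumps(arguments).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        url,
        data=body,
        method="POST",  # assumed HTTP method
        headers={
            "Authorization": f"Bearer {run_token}",
            "Content-Type": "application/json",
        },
    )


def run(task_input, *, proxy_url, run_token):
    # "search" is a hypothetical tool; use a name from your tools_schema.
    request = build_tool_request(
        proxy_url, run_token, "search", {"query": task_input.get("query", "")}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)  # assumes the proxy answers with JSON
    return {"final_response": json.dumps(result)}
```

Keeping the request construction in its own function makes it possible to unit-test the agent without a live proxy.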

Run command (Shell command path)

A shell-command agent runs a Run command (a CLI harness or any program) inside the seeded workspace — no Python entrypoint. Preset chips fill it for Claude Code / Codex / Cursor / Aider. The platform writes the task as files and lets your command read them:

| File / env | What it points at |
|---|---|
| $PIPELINES_TASK_FILE | TASK.md, the task brief. |
| $PIPELINES_TASK_INPUT_FILE | task_input.json, the full task input. |
| $PIPELINES_RESULT_PATH | result.json, which your command may write; if it omits a final_response, the platform supplies one. |

So claude -p "$(cat $PIPELINES_TASK_FILE)" is a complete command. This path requires a coding scenario on the task — registration completes without one, but dispatch fails with in_sandbox_requires_workspace. Full flow: Coding agents; CLI wiring: Harness customization.
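
Because the Run command can be any program, the same contract can be exercised from a small script. The sketch below reads the two input files and writes a result file; the file name agent_cli.py and the summary text are hypothetical, and it assumes result.json is a JSON object whose final_response key holds a string.

```python
import json
import os
from pathlib import Path


def main(env=os.environ) -> None:
    """Read the task files the platform wrote and emit a result file."""
    brief = Path(env["PIPELINES_TASK_FILE"]).read_text(encoding="utf-8")
    task_input = json.loads(
        Path(env["PIPELINES_TASK_INPUT_FILE"]).read_text(encoding="utf-8")
    )
    # Placeholder work: a real agent would act on the brief here.
    summary = f"Read a {len(brief)}-character brief and {len(task_input)} input keys."
    Path(env["PIPELINES_RESULT_PATH"]).write_text(
        json.dumps({"final_response": summary}), encoding="utf-8"
    )


# Only run when the platform has injected the task paths.
if __name__ == "__main__" and "PIPELINES_TASK_FILE" in os.environ:
    main()
```

With that file shipped as the agent's code, a Run command such as python /home/user/agent/agent_cli.py would be the shell-command equivalent of the one-liner above.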

Environment variables the platform injects

Your code may read these. The PIPELINES_ prefix is reserved — you can't declare your own env var or credential under it, and platform values always win on collision.

| Variable | Path | Use |
|---|---|---|
| PIPELINES_ODYSSEY_PROXY_URL | both | Per-run proxy base; append /tools/{name}. |
| PIPELINES_RUN_TOKEN | both | Per-run bearer for proxy/tool calls. Secret (redacted from logged env). |
| PIPELINES_RUN_TOKEN_JTI | both | Non-secret correlation id; safe to log. |
| PIPELINES_API_URL | both | Platform API origin. |
| PIPELINES_AGENT_ID | both | This agent's id. |
| PIPELINES_TASK_ID / PIPELINES_RUN_ID | both | Non-secret task / run ids. |
| _PIPELINES_TASK_INPUT_JSON | Python function | JSON-encoded task input. |
| PIPELINES_TASK_FILE / PIPELINES_TASK_INPUT_FILE / PIPELINES_RESULT_PATH | shell command | Brief / input / optional result paths (above). |

For Python-function agents, proxy_url and run_token are also passed as keyword args, so SDK-free scripts can read either source.
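
A small helper can hide the choice between keyword args and environment variables. This is a sketch; load_run_context is a hypothetical name, and preferring the keyword args over the env vars is a design choice of the example, not documented platform behavior.

```python
import json
import os


def load_run_context(env=None, *, proxy_url=None, run_token=None) -> dict:
    """Collect per-run settings from kwargs first, then from the environment."""
    env = os.environ if env is None else env
    return {
        "proxy_url": proxy_url or env.get("PIPELINES_ODYSSEY_PROXY_URL"),
        "run_token": run_token or env.get("PIPELINES_RUN_TOKEN"),
        # Non-secret ids: these are the values to put in log lines.
        "run_id": env.get("PIPELINES_RUN_ID"),
        "token_jti": env.get("PIPELINES_RUN_TOKEN_JTI"),
        # Python-function path only; None when the variable is absent.
        "task_input": json.loads(env.get("_PIPELINES_TASK_INPUT_JSON", "null")),
    }
```

Inside an entrypoint this would be called as load_run_context(proxy_url=proxy_url, run_token=run_token). Log token_jti rather than run_token when you need to correlate a run.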

Tools (optional)

Declare a tools_schema exactly as for external HTTP agents — see Tools schema. Python-function agents call tools through the per-run proxy; CLI agents get them via an injected MCP server (see Harness customization). Leave it empty for a pure compute or coding agent.

Sandbox environment (advanced)

Every run boots a managed sandbox from the platform default image — Python 3.13 with git, ripgrep, unzip, uv, and pytest preinstalled. The defaults work for most agents; open Sandbox environment (advanced) only if you need more. (For coding CLIs and their image guidance, see Coding agents.)

Boot-time layering

Applied per run when the sandbox boots — no persistent build:

  • System packages (one per line) — apt packages (≤ 50), installed as root once at boot, before your agent.
  • Setup command — a shell command run once at sandbox start, after package installs. It receives your resolved env (so it can use a credential-backed token). A nonzero exit fails the run (environment_setup_failed).
  • Python requirements (one per line) and Python version (3.9–3.13, blank = the default 3.13) — Python-function agents only. Requirements are pip-installed in the sandbox before your agent runs.

Custom Dockerfile

For heavier tooling, switch Base image to Custom Dockerfile for a persistent, built image. The platform always prepends FROM pipelines-workspace-base, so you write only the body. Constraints:

  • Only RUN, ENV, and WORKDIR directives. No COPY/ADD (there's no build context), no second FROM (single-stage), no ENTRYPOINT/CMD/USER.
  • ≤ 32 KB of Dockerfile text.
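
The two constraints above are simple enough to lint before saving. The sketch below is a rough check, not the platform's parser: check_dockerfile_body is a hypothetical name, it handles backslash line continuations and comment lines, and it assumes the body uses no heredoc syntax.

```python
ALLOWED_DIRECTIVES = {"RUN", "ENV", "WORKDIR"}
MAX_DOCKERFILE_BYTES = 32 * 1024  # assumption: 1 KB = 1024 bytes


def check_dockerfile_body(text: str) -> list[str]:
    """Return the problems found in a custom Dockerfile body."""
    problems = []
    if len(text.encode("utf-8")) > MAX_DOCKERFILE_BYTES:
        problems.append("Dockerfile text exceeds 32 KB")
    continuing = False  # True while inside a backslash-continued instruction
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments do not end a continuation
        if not continuing:
            directive = line.split(None, 1)[0].upper()
            if directive not in ALLOWED_DIRECTIVES:
                problems.append(f"line {number}: {directive} is not allowed")
        continuing = line.endswith("\\")
    return problems
```

A body such as "RUN apt-get update && apt-get install -y jq" followed by "ENV FOO=bar" passes, while any FROM, COPY, ADD, ENTRYPOINT, CMD, or USER line is reported with its line number.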

Saving stores the text; building is an explicit action. On the agent detail page, the Custom image card shows a status chip and a build button:

| Chip | Meaning |
|---|---|
| Not built | No image yet. Click Build image. |
| Building… | Build in flight; a live log streams. |
| Ready | Built. The button becomes Rebuild (force-rebuild). |
| Build failed | Build errored; the failure log tail is available. Rebuild to retry. |

The button calls POST /api/agents/{id}/build-environment. A run with a custom Dockerfile is pinned to its built image: while the chip shows Building… or Build failed, the run fails with a hard error and never silently falls back to the default image. An identical Dockerfile already built in your org is reused without rebuilding; Rebuild forces a fresh build (the recovery hatch for a stuck or failed build). Only one build runs at a time per org; a second build request returns a 409.
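
The build can also be triggered from a script. In this sketch the endpoint path and the 409 response come from this page; the bearer-token authentication, the empty request body, and the helper names are assumptions, so check them against your API credentials before relying on it.

```python
import urllib.error
import urllib.request


def build_environment_request(api_url, agent_id, api_token):
    """Build (but do not send) the request that starts an image build."""
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/api/agents/{agent_id}/build-environment",
        data=b"",  # assumed: the endpoint needs no body
        method="POST",
        headers={"Authorization": f"Bearer {api_token}"},  # assumed auth scheme
    )


def trigger_build(api_url, agent_id, api_token) -> str:
    """Return 'started', or 'busy' when another build holds the org slot."""
    request = build_environment_request(api_url, agent_id, api_token)
    try:
        with urllib.request.urlopen(request, timeout=30):
            return "started"
    except urllib.error.HTTPError as error:
        if error.code == 409:
            return "busy"  # only one build at a time per org
        raise
```

A caller that receives "busy" can wait and retry, since the 409 only signals that the org's single build slot is taken.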

Environment variables and secrets

Sandbox agents declare Environment variables rows on the form. Each row is one of:

  • Value — a literal, stored in plaintext in the agent config. On-screen masking is cosmetic; treat these as non-secret config.
  • From credential — mapped to a stored org credential, decrypted only at dispatch and masked in all run output. This is the only encrypted path — put API keys and tokens here.

CLI provider keys (e.g. ANTHROPIC_API_KEY, OPENAI_API_KEY) go the same way: declare them as env-var rows backed by credentials. See Harness customization → Environment variables and secrets.

Env keys must match [A-Z][A-Z0-9_]* and must not use the reserved PIPELINES_ prefix. An agent may declare at most 50 entries, each with a value of at most 4096 chars. A missing or undecryptable credential fails the run as agent_secret_unresolved before any sandbox cost.
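
The key and size rules can be expressed in a few lines. This is a minimal sketch with a hypothetical helper name, check_env_rows, that mirrors the rules stated above for literal Value rows; it says nothing about how the platform reports violations.

```python
import re

ENV_KEY_PATTERN = re.compile(r"[A-Z][A-Z0-9_]*")
MAX_ROWS = 50
MAX_VALUE_CHARS = 4096
RESERVED_PREFIX = "PIPELINES_"


def check_env_rows(rows: dict[str, str]) -> list[str]:
    """Return the problems found in a {key: literal_value} mapping."""
    problems = []
    if len(rows) > MAX_ROWS:
        problems.append(f"too many rows: {len(rows)} > {MAX_ROWS}")
    for key, value in rows.items():
        if not ENV_KEY_PATTERN.fullmatch(key):
            problems.append(f"{key}: key must match [A-Z][A-Z0-9_]*")
        if key.startswith(RESERVED_PREFIX):
            problems.append(f"{key}: the {RESERVED_PREFIX} prefix is reserved")
        if len(value) > MAX_VALUE_CHARS:
            problems.append(f"{key}: value longer than {MAX_VALUE_CHARS} chars")
    return problems
```

For instance, LOG_LEVEL passes, while log_level, 1ST_KEY, and PIPELINES_FOO are each reported.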

Concurrency cap, run timeout

Same fields as external HTTP agents: cap default 5 (1–100), timeout default 300 s (max 1800 s). Coding CLI runs routinely need several minutes — raise the timeout.

Pick simulator + judge models

Identical to external HTTP agents: the models are set on each agent field in the Pipeline Builder; see Register an external HTTP agent → models.

Limits and errors

| Limit | Value |
|---|---|
| Paste code | 200 KB |
| Multiple files | 50 files / 200 KB per file / 1 MB total |
| ZIP archive | 100 MB compressed / 500 MB uncompressed |
| System packages | 50 apt packages |
| Python requirements | 100 specs |
| Python version | 3.9–3.13 (default 3.13) |
| Dockerfile | 32 KB, RUN/ENV/WORKDIR only, single-stage |
| Env-var rows | 50 rows, 4096 chars per value |
| Concurrent image builds | 1 per org |

These error chips surface in the run inspector when a code/env step fails:

| Error class | Case |
|---|---|
| agent_code_fetch_failed | A ZIP or Git source couldn't be fetched, validated, or checked out (e.g. a bad ref). Fails before a sandbox boots. |
| agent_secret_unresolved | A From credential env var (or git PAT credential) is missing or can't be decrypted. |
| environment_setup_failed | A Setup command exited nonzero (a half-built environment is a hard stop). |
| in_sandbox_requires_workspace | A shell-command agent was dispatched without a coding scenario. |

API equivalent

The form posts mode: "code" to POST /api/agents. Single-file:

{
  "name": "my-sandbox-agent",
  "mode": "code",
  "config": {
    "source": "def run(task_input, *, proxy_url, run_token):\n    ...",
    "entrypoint": "run",
    "requirements": ["httpx", "anthropic"],
    "python_version": "3.12"
  }
}

Multi-file (source_files + entrypoint_file), ZIP, and git sources are mutually exclusive with source:

{
  "config": {
    "source_git": {
      "url": "https://github.com/acme/my-agent.git",
      "ref": "v1.2.0",
      "credential_type": "GITHUB_PAT"
    },
    "entrypoint_file": "main.py",
    "entrypoint": "run"
  }
}

{
  "config": {
    "source_zip_file_id": "<file UUID from the org files upload endpoint>",
    "entrypoint_file": "main.py",
    "entrypoint": "run"
  }
}

A coding CLI agent uses the in_sandbox topology and a run_command instead of an entrypoint:

{
  "config": {
    "execution_topology": "in_sandbox",
    "run_command": "claude -p \"$(cat $PIPELINES_TASK_FILE)\"",
    "credential_refs": { "ANTHROPIC_API_KEY": "ANTHROPIC_API_KEY" }
  }
}

execution_topology is "proxy" (default — Python function) or "in_sandbox" (shell command).
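
The source and topology rules can be enforced client-side before posting. The sketch below assembles the request body for POST /api/agents using only the field names shown in the examples above; build_agent_payload is a hypothetical helper and the checks are a local convenience, not a description of the server's validation.

```python
SOURCE_KEYS = ("source", "source_files", "source_git", "source_zip_file_id")


def build_agent_payload(name: str, config: dict) -> dict:
    """Assemble a registration body, rejecting obviously invalid configs."""
    declared = [key for key in SOURCE_KEYS if key in config]
    topology = config.get("execution_topology", "proxy")  # proxy is the default
    if len(declared) > 1:
        raise ValueError(f"code sources are mutually exclusive: {declared}")
    if topology == "proxy":
        if not declared:
            raise ValueError("a Python-function agent needs a code source")
    elif topology == "in_sandbox":
        if "run_command" not in config:
            raise ValueError("a shell-command agent needs a run_command")
    else:
        raise ValueError(f"unknown execution_topology: {topology!r}")
    return {"name": name, "mode": "code", "config": config}
```

The returned dict can be serialized with json.dumps and sent with any HTTP client; a shell-command config may omit every source key, as in the in_sandbox example above.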

After registering

Wire the agent into a Pipeline Builder agent field, then seed tasks — plain task seeds for Python agents, coding scenarios for CLI agents — and read results in Inspecting runs. Or start from a runbook.