Coding scenarios & workspaces
Define a task and grading recipe, and run a sandboxed coding agent against a real git workspace.
A coding scenario is a reusable definition for sandboxed coding tasks. It specifies the starting repository (workspace seed), the task to perform (instruction and setup), and the grading policy (scorers). When a task is seeded from a scenario, the platform materializes a git-backed working tree inside the run's sandbox, commits a baseline, lets the agent edit files, then grades the diff.
Scenarios apply only to code agents. A workspace seed on an External HTTP agent is rejected because execution occurs on your own infrastructure, not in the graded sandbox. See Coding agents for the agent side, and Register a sandbox agent for custom images and agent code.
Frozen at seed time
When a task is created from a scenario, the scenario definition is deep-copied and frozen into that task seed. There is no scenario version history by design. Reproducibility is provided by the per-task snapshot.
Editing a scenario does not change already-seeded tasks. The frozen copy on each task is authoritative, so updates affect only tasks seeded after the edit. Re-seed to apply changes.
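The snapshot behavior above can be pictured with a minimal sketch. It assumes a scenario is held as a plain nested dictionary; the `seed_task` helper and the `frozen_scenario` key are hypothetical names used only for illustration, not the platform's data model.

```python
import copy

# Hypothetical in-memory scenario; the nested layout is an assumption.
scenario = {
    "user_instruction": "Fix the failing test",
    "workspace_seed": {"source": "git", "url": "https://example.com/org/repo.git"},
}


def seed_task(scenario_definition: dict) -> dict:
    """Return a task seed that holds its own deep copy of the definition."""
    return {"frozen_scenario": copy.deepcopy(scenario_definition)}


task = seed_task(scenario)

# Editing the scenario afterwards does not reach the already-seeded task.
scenario["workspace_seed"]["url"] = "https://example.com/org/other.git"

# Only a task seeded after the edit picks up the new value.
later_task = seed_task(scenario)
```

A shallow copy would not be enough here, because the nested `workspace_seed` dictionary would still be shared between the scenario and the task.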
What a scenario defines
Each scenario uses the same five-part definition.
| Field | Required | Purpose |
|---|---|---|
| user_instruction | optional | Default task prompt provided to the agent. A per-row CSV value overrides this default. |
| workspace_seed | required | Source of the working tree (git, archive, or empty). |
| setup | optional | Repository preparation method before execution (platform command or agent instructions). |
| scorers | optional | Grading checks evaluated against the agent diff. |
| e2b_template | optional, advanced | Sandbox template alias used at boot. No builder UI is available; see E2B template. |
Workspace seed
The Workspace section defines the source of the agent working tree:
- Clone a repository (source: "git"): requires a non-empty Git URL. The repository is cloned into the sandbox during seed.
- Upload a repo .zip (source: "archive"): mounts an uploaded ZIP archive. Archive upload is project-scoped and is available only when the scenario is configured from within a workflow (see Archive uploads).
- Start from an empty repo (source: "empty"): creates a blank working directory for from-scratch tasks where the agent creates all files.
For a git source you can also set:
- Ref (optional): branch, tag, or commit. Defaults to the repository default branch.
- Auth: None, or From org credential to clone a private repository. The org credential is decrypted at seed time and used for the clone only; the token is never stored in the scenario or in the workspace.
- Subdirectory (optional): scopes execution to a repository subpath.
A bad ref silently falls back to the default branch. During seeding, a failed checkout of Ref does not fail the task. The clone remains on the repository default branch and the run proceeds. Verify branch, tag, or commit spelling, because a missing ref does not raise an error.
Evaluation is always repository-wide, including when Subdirectory is set. Agent execution, setup, and tools are scoped to the subtree, but the baseline commit and final graded diff are computed at repository root. Writes outside the subtree are still captured and graded.
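Because a misspelled ref is not reported at seed time, it can help to check a seed locally before submitting it. The following is a rough sketch, assuming the seed is a dictionary with the `source`, `url`, `ref`, and `subdir` fields named on this page; `validate_workspace_seed` is a hypothetical helper, not a platform function.

```python
ALLOWED_SOURCES = {"git", "archive", "empty"}


def validate_workspace_seed(seed: dict) -> list:
    """Return a list of problems; an empty list means the seed looks well-formed."""
    problems = []
    source = seed.get("source")
    if source not in ALLOWED_SOURCES:
        problems.append(f"unknown source: {source!r}")
    if source == "git" and not str(seed.get("url") or "").strip():
        problems.append("git source requires a non-empty url")
    # "ref" is deliberately not checked: a bad ref falls back to the default
    # branch at seed time instead of raising, so it cannot be caught here
    # without contacting the repository.
    return problems


git_seed = {
    "source": "git",
    "url": "https://example.com/org/repo.git",
    "ref": "main",
    "subdir": "services/api",
}
```

Calling `validate_workspace_seed(git_seed)` returns an empty list, while a git seed with a blank `url` returns one problem string.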
Setup
The Setup section defines repository preparation before agent execution. Leave it blank to skip. The two setup modes differ in whether setup work appears in the graded diff:
| Mode | Field | Runs | In the agent's diff? |
|---|---|---|---|
| Platform runs a command | Setup command | A shell command the platform runs before the baseline commit. | No. Setup artifacts are included in baseline and excluded from the diff. |
| Agent sets up | Setup instructions | No platform command. Instructions are sent to the agent, which prepares the workspace. | Yes. Agent setup work is included in the graded diff. |
Use platform mode for environment preparation that should not be graded, such as dependency installation, build steps, or fixture generation. Use agent mode when setup behavior is part of the task objective. A non-zero platform setup exit is an infrastructure error; agent dispatch does not start and the run is not graded.
Scorers
The Scorers section lists checks that grade the agent diff. If a required scorer fails, the task fails. Full details are on Scorers & grading.
E2B template (advanced)
e2b_template is the sandbox template alias used at workspace boot. There is no builder field for this value. The template is normally resolved automatically from the agent's built custom image or from a platform default, so set it explicitly, via the API, only in advanced cases.
Creating and managing scenarios
In the UI
In the Pipeline Builder, open the Coding setup popover on the agent-mode field (code agents only). The popover contains Workspace, Setup, Scorers, and an optional Instruction, and it attaches the scenario inline to that field. Saving applies it as the field's default workspace seed for every task created from the workflow.
API
The form is backed by an org-scoped scenario library exposed at /api/coding-scenarios. Use this API directly for scripted or infrastructure-as-code seeding.
| Method | Path | Notes |
|---|---|---|
| POST | /api/coding-scenarios | Creates a scenario. Returns 201. Returns 409 if the name already exists in the org. |
| GET | /api/coding-scenarios | Lists scenarios in the org. Add ?include_archived=true to include archived entries. |
| GET | /api/coding-scenarios plus scenario id path parameter | Fetches one scenario by id. |
| PUT | /api/coding-scenarios plus scenario id path parameter | Updates name, description, or definition by id. Returns 409 on name collision. |
| DELETE | /api/coding-scenarios plus scenario id path parameter | Smart delete behavior by id, described below. |
Scenario names are unique within an org and must be 1 to 256 characters, starting with an alphanumeric character. Allowed characters are alphanumerics, space, underscore, period, and hyphen. The full definition is returned on every read and is not write-only.
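The naming rule can be checked on the client before a create call. The sketch below builds, but does not send, a POST request. The JSON body keys (`name`, `definition`) follow the fields listed for PUT, but their exact wire format is an assumption, as are the base URL, the ASCII-only reading of "alphanumeric", and the omission of authentication headers.

```python
import json
import re
import urllib.request

# 1 to 256 characters: first one alphanumeric, the rest alphanumeric,
# space, underscore, period, or hyphen. ASCII-only is an assumption.
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 _.\-]{0,255}")


def is_valid_scenario_name(name: str) -> bool:
    return NAME_RE.fullmatch(name) is not None


def build_create_request(base_url: str, name: str, definition: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/coding-scenarios."""
    if not is_valid_scenario_name(name):
        raise ValueError(f"invalid scenario name: {name!r}")
    # Body shape is assumed; authentication is intentionally left out.
    body = json.dumps({"name": name, "definition": definition}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/coding-scenarios",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` would additionally require whatever credentials your deployment expects.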
Delete archives when tasks still reference the scenario. DELETE performs a hard delete only if no task was seeded from that scenario. If at least one task still references it, the scenario is archived instead and hidden from default list responses. Use ?include_archived=true on GET to locate archived scenarios.
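The status codes and delete rules described above can be modeled with a toy in-memory library. This is only an illustration of the documented behavior, not the server implementation; the class, its method names, and the choice to key scenarios by name are all hypothetical.

```python
class ScenarioLibrary:
    """Toy in-memory model of the documented create and delete rules."""

    def __init__(self):
        self._scenarios = {}  # name -> {"definition": ..., "archived": bool}
        self._task_refs = {}  # name -> number of tasks seeded from it

    def create(self, name, definition):
        # Assumption: an archived scenario still occupies its name.
        if name in self._scenarios:
            return 409
        self._scenarios[name] = {"definition": definition, "archived": False}
        self._task_refs[name] = 0
        return 201

    def seed_task(self, name):
        self._task_refs[name] += 1

    def delete(self, name):
        # Archive instead of deleting when any task references the scenario.
        if self._task_refs[name] > 0:
            self._scenarios[name]["archived"] = True
            return "archived"
        del self._scenarios[name]
        del self._task_refs[name]
        return "deleted"

    def list_names(self, include_archived=False):
        return sorted(
            name
            for name, entry in self._scenarios.items()
            if include_archived or not entry["archived"]
        )
```

In this model an archived scenario disappears from `list_names()` but reappears with `include_archived=True`, mirroring the `?include_archived=true` query parameter.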
Archive uploads
When the workspace source is Upload a repo .zip, the archive is validated on the server before any bytes reach a sandbox. The archive is rejected when it:
- exceeds 100 MB compressed, or 500 MB uncompressed (zip bomb guard);
- contains absolute paths or .. path traversal;
- contains a symlink entry.
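The three rejection rules can be approximated locally with Python's standard `zipfile` module, which is useful for checking an archive before upload. This is a sketch under stated assumptions: "MB" is read as binary megabytes, the uncompressed total is taken from the sizes declared in the archive headers, and `archive_problems` and `make_zip` are hypothetical helper names. The server-side validator may differ in all of these details.

```python
import io
import stat
import zipfile

MB = 1024 * 1024  # assumption: binary megabytes; the text only says "MB"
MAX_COMPRESSED = 100 * MB
MAX_UNCOMPRESSED = 500 * MB


def archive_problems(data: bytes) -> list:
    """Return reasons to reject a ZIP archive, mirroring the three listed checks."""
    problems = []
    if len(data) > MAX_COMPRESSED:
        problems.append("compressed size exceeds 100 MB")
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
    # Declared sizes only; a stricter guard would count bytes while extracting.
    if sum(info.file_size for info in infos) > MAX_UNCOMPRESSED:
        problems.append("uncompressed size exceeds 500 MB")
    for info in infos:
        name = info.filename.replace("\\", "/")
        if name.startswith("/") or (len(name) > 1 and name[1] == ":"):
            problems.append(f"absolute path: {info.filename}")
        if ".." in name.split("/"):
            problems.append(f"path traversal: {info.filename}")
        # Unix file modes live in the upper 16 bits of external_attr.
        if stat.S_ISLNK(info.external_attr >> 16):
            problems.append(f"symlink entry: {info.filename}")
    return problems


def make_zip(entries) -> bytes:
    """Example helper: build a ZIP in memory from (name or ZipInfo, content) pairs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in entries:
            zf.writestr(name, content)
    return buf.getvalue()
```

For example, `archive_problems(make_zip([("../evil.txt", "x")]))` reports a path traversal, while an archive containing only `src/app.py` produces an empty list.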
Archive upload is project-scoped, so the Upload a repo .zip option is only enabled when you configure the scenario from inside a workflow.
Attaching scenarios to tasks
A scenario reaches execution through the agent field and per-row CSV axes, in the same way the task seed axes work:
- Default scenario on the field. The Coding setup popover attaches a scenario to the agent-mode field, so every task seeded from the workflow gets that workspace seed unless a row overrides it.
- Per-row workspace axes (API only). On top of the five ledger axes, the seeding service accepts per-row workspace CSV columns when the agent field wires them in agentConfig.odyssey_seed_columns via the workflow API. The builder UI doesn't expose toggles for these axes:
| Axis | CSV column | Purpose |
|---|---|---|
| scenario_ref | scenario_ref | Name of a saved scenario used to seed this row. |
| workspace_seed | workspace_seed | Per-row override of workspace source (git, archive, or empty), including url, ref, and subdir fields. |
| setup_command | setup_command | Per-row platform setup command. |
| eval | eval | Per-row evaluation configuration, including scorer lists. |
A per-row scenario_ref value overrides the field default scenario. Per-row axis cells override scenario defaults. The scenario definition fills in only the values that the row leaves unset.
Blank or malformed workspace cells fail the row. The workspace-critical axes (scenario_ref, workspace_seed, eval) are hard-error columns. A present-but-blank or malformed cell fails that row instead of silently downgrading a coding task to a non-workspace run. Omitting the column entirely is fine.
For the five ledger axes and how all of this is frozen into the seed, see Task seeding.
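The precedence and hard-error rules for per-row workspace axes can be sketched as a small resolver. Several details here are assumptions made for illustration: structured cells (`workspace_seed`, `eval`) are treated as JSON objects, scenario defaults are keyed by the axis names, an unknown `scenario_ref` is treated as malformed, and a blank `setup_command` is treated as absent. `resolve_row` and `RowError` are hypothetical names.

```python
import copy
import json
from typing import Optional

HARD_ERROR_AXES = ("scenario_ref", "workspace_seed", "eval")


class RowError(ValueError):
    """A workspace-critical cell is present but blank or malformed."""


def resolve_row(row: dict, field_default: Optional[dict], library: dict) -> dict:
    """Merge one CSV row with scenario defaults; row cells take precedence."""
    # Present-but-blank cells fail the row. Omitted columns are fine.
    for axis in HARD_ERROR_AXES:
        if axis in row and not str(row[axis]).strip():
            raise RowError(f"{axis}: blank cell")

    # A per-row scenario_ref replaces the field default scenario.
    if "scenario_ref" in row:
        name = row["scenario_ref"].strip()
        if name not in library:
            raise RowError(f"scenario_ref: unknown scenario {name!r}")
        resolved = copy.deepcopy(library[name])
    else:
        resolved = copy.deepcopy(field_default or {})

    # Structured cells are assumed to be JSON objects.
    for axis in ("workspace_seed", "eval"):
        if axis in row:
            try:
                value = json.loads(row[axis])
            except json.JSONDecodeError:
                raise RowError(f"{axis}: malformed cell") from None
            if not isinstance(value, dict):
                raise RowError(f"{axis}: malformed cell")
            resolved[axis] = value

    # setup_command is not a hard-error axis; blank is treated as absent here.
    if str(row.get("setup_command") or "").strip():
        resolved["setup_command"] = row["setup_command"]
    return resolved
```

Raising on a blank cell, rather than falling back to the default, matches the stated goal of never silently downgrading a coding task to a non-workspace run.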
The workspace at run time
When a workspace-mode task runs, the platform:
- Boots a sandbox and clones, unpacks, or initializes the repository at a fixed path (default /home/user/workspace, or a subdirectory path when set).
- Runs any platform setup command.
- Commits a baseline (a diffable tag named odyssey-baseline) so the final diff captures only agent changes. Platform setup artifacts are already inside the baseline and excluded.
- Dispatches the agent, which edits files in the working tree.
- Captures the cumulative diff against the baseline and runs the scorers.
The platform owns every commit; the agent only edits files. For multi-turn coding sessions the diff is cumulative across all turns against the same baseline.
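The baseline-then-diff sequence can be reproduced locally with plain git to see why setup artifacts stay out of the graded diff. The sketch below requires a `git` executable on the PATH and uses an empty workspace source. The file names stand in for a setup command and an agent edit, and staging the working tree before diffing is one way to include new files; the platform's own capture mechanism is not specified here and may differ.

```python
import subprocess
import tempfile
from pathlib import Path

# Inline identity and signing settings so the sketch does not depend on global git config.
GIT = [
    "git",
    "-c", "user.name=platform",
    "-c", "user.email=platform@example.com",
    "-c", "commit.gpgsign=false",
    "-c", "tag.gpgsign=false",
]


def git(repo: Path, *args: str) -> str:
    """Run a git command inside repo and return its stdout."""
    result = subprocess.run(
        GIT + list(args), cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def run_workspace(repo: Path) -> str:
    """Return the names of files changed relative to the baseline."""
    git(repo, "init", "-q")  # "empty" workspace source
    (repo / "deps.lock").write_text("installed\n")  # stand-in for the platform setup command
    git(repo, "add", "-A")
    git(repo, "commit", "-q", "--allow-empty", "-m", "baseline")
    git(repo, "tag", "odyssey-baseline")
    (repo / "fix.py").write_text("print('fixed')\n")  # stand-in for an agent edit
    git(repo, "add", "-A")  # stage so that new files appear in the diff
    return git(repo, "diff", "--cached", "--name-only", "odyssey-baseline")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_workspace(Path(tmp)))
```

The setup artifact `deps.lock` is part of the baseline commit, so only the agent's `fix.py` appears in the diff against `odyssey-baseline`.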
Secrets must be provided as environment variables, not as files in the workspace. The baseline commit includes every file under the workspace root, so a secret-bearing file there would land in git history (and in the diff). Put secrets in org credentials referenced as environment variables.
The workspace-eval error banner
Workspace evaluation, which captures the diff and runs the scorers, is a separate phase from agent execution. If this phase fails, the run still completes and the mechanical trajectory survives. The run inspector shows a banner reading Eval phase failed, diff/scorers may be missing, with the redacted cause in a collapsible section. Treat the banner as a signal that diff and scorer outputs may be partial or absent, not as a failed run.