Troubleshooting
This page lists each failure-mode chip with its `error_class`, severity, symptom, likely cause, and recommended fix.
Error class reference
- auth_failed
  - UI chip: auth failed
  - Severity: danger
  - Symptom: the endpoint returns 401 or 403.
  - Likely cause: the auth header value in agent.config does not match what the wrapper expects.
  - Recommended fix: rotate the token, update it in Edit agent → Auth header value, and re-dispatch.
- agent_timeout
  - UI chip: timeout
  - Severity: warning
  - Symptom: the endpoint returns 408 or 504, or no response is read before run_timeout_s elapses.
  - Likely cause: the wrapper or upstream model takes longer to respond than the configured timeout.
  - Recommended fix: increase run_timeout_s, or profile wrapper latency.
- agent_5xx
  - UI chip: 5xx
  - Severity: danger
  - Symptom: the endpoint returns a status from 500 to 599, excluding 504.
  - Likely cause: an unhandled wrapper exception or an upstream model 5xx.
  - Recommended fix: inspect the wrapper logs. The first 500 characters of the upstream response appear in error_json.detail.
- agent_4xx
  - UI chip: 4xx
  - Severity: amber
  - Symptom: the endpoint returns a 4xx status other than 401, 403, or 408.
  - Likely cause: a wrong path, an invalid request envelope, or an upstream rate limit.
  - Recommended fix: verify the dispatch path in endpoint_url, and validate the envelope against the request envelope reference.
- agent_bad_response
  - UI chip: bad response
  - Severity: warning
  - Symptom: a 2xx response with a non-JSON body.
  - Likely cause: the wrapper failed mid-flight or returned an incomplete envelope.
  - Recommended fix: confirm that the response uses Content-Type: application/json and contains a non-empty final_response.
- agent_unreachable
  - UI chip: unreachable
  - Severity: danger
  - Symptom: a connection error or connection timeout.
  - Likely cause: a DNS failure, closed port, tunnel outage, or egress block.
  - Recommended fix: run curl -v against the endpoint from an external host. For local development, use pipelines odyssey dev.
- proxy_misconfigured
  - UI chip: proxy misconfigured
  - Severity: amber
  - Symptom: the per-run proxy URL is missing or malformed.
  - Likely cause: a platform-side proxy configuration issue.
  - Recommended fix: open a support ticket that includes the run_token_jti.
- transport_error
  - UI chip: transport error
  - Severity: warning
  - Symptom: a transport failure that fits no more specific class (the generic bucket).
  - Likely cause: the failure mode does not have a typed subtype yet.
  - Recommended fix: inspect error_json.detail and file an issue.
- contract_error
  - UI chip: contract error
  - Severity: danger
  - Symptom: a 2xx JSON response fails v1 contract validation.
  - Likely cause: the wrapper's response shape does not conform to the contract.
  - Recommended fix: validate against /schemas/agent-response.json. The parser's reason appears in error_json.detail.
- internal_error
  - UI chip: internal error
  - Severity: danger
  - Symptom: the platform failed before dispatch completed.
  - Likely cause: a rare platform-side dispatch failure.
  - Recommended fix: open a support ticket that includes the run ID and run_token_jti.
- agent_model_unresolved
  - UI chip: model unresolved
  - Severity: amber
  - Symptom: no simulator or judge model can be resolved for the run's org.
  - Likely cause: the field-level model selection is set to org default, and no org default model is configured.
  - Recommended fix: select a model in the field's Models popover, or configure an org default in Settings → Models, then re-dispatch.
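To make the status-code boundaries in the reference above concrete, here is a minimal sketch of how an HTTP outcome maps to an error_class. It illustrates the documented symptoms only and is not the platform's dispatcher. The function name and its parameters are hypothetical, and falling back to transport_error for outcomes the reference does not cover (for example a 3xx status) is an assumption.

```python
import json
from typing import Optional


def classify_http_outcome(
    status: Optional[int] = None,
    body: str = "",
    connection_error: bool = False,
    read_timeout: bool = False,
) -> Optional[str]:
    """Return the error_class the reference assigns, or None.

    None means a 2xx JSON response that still has to pass v1 contract
    validation; a failure at that step would be contract_error.
    Hypothetical helper for illustration only.
    """
    if connection_error:  # includes connection timeouts
        return "agent_unreachable"
    if read_timeout:  # no response read before run_timeout_s
        return "agent_timeout"
    if status is None:
        return "transport_error"
    if status in (401, 403):
        return "auth_failed"
    if status in (408, 504):
        return "agent_timeout"
    if 500 <= status <= 599:  # 504 was already handled above
        return "agent_5xx"
    if 400 <= status <= 499:  # 401, 403, 408 were handled above
        return "agent_4xx"
    if 200 <= status <= 299:
        try:
            json.loads(body)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return "agent_bad_response"
        return None
    # Assumption: anything else lands in the generic bucket.
    return "transport_error"
```

Checking 401/403 and 408/504 before the range tests is what keeps those codes out of agent_4xx and agent_5xx, matching the exclusions listed in the reference.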
Severity semantics:
- danger: direct intervention is required on the agent or integration path.
- warning: retry is usually appropriate; the failure may be transient.
- amber: integration drift or a configuration mismatch.
Automatic re-dispatch on transient failures
Transport-class failures (agent_unreachable, agent_timeout,
agent_5xx, and the generic transport_error) are often a momentary
blip, such as a tunnel that dropped its session mid-batch, a brief
upstream 5xx, or a DNS hiccup, rather than a real defect. The platform
automatically re-dispatches a fresh run for these classes up to twice
(at most three attempts in total), with exponential backoff, before
surfacing the failure. Each attempt is its own run row, so the trace
history shows the retries. Classes that won't fix themselves on retry
(auth_failed, agent_4xx, contract_error, proxy_misconfigured, and the
rest) fail immediately with no retry.
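The retry policy above amounts to a small loop. The sketch below assumes a hypothetical dispatch_once callable that performs one run and returns "ok" or an error_class. The 1-second base delay and the doubling factor are assumptions, since only "exponential backoff" is documented.

```python
import time
from typing import Callable, List

# Classes the platform re-dispatches automatically, per the text above.
RETRYABLE = {"agent_unreachable", "agent_timeout", "agent_5xx", "transport_error"}
MAX_RETRIES = 2  # "up to twice": at most three attempts in total


def dispatch_with_retries(
    dispatch_once: Callable[[], str],
    base_delay_s: float = 1.0,  # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Return the outcome of every attempt; each would be its own run row."""
    attempts: List[str] = []
    for attempt in range(MAX_RETRIES + 1):
        outcome = dispatch_once()  # hypothetical: "ok" or an error_class
        attempts.append(outcome)
        if outcome not in RETRYABLE:  # success, or a class that won't self-heal
            break
        if attempt < MAX_RETRIES:
            sleep(base_delay_s * 2 ** attempt)  # 1s, then 2s
    return attempts
```

Returning every attempt mirrors the run-row behavior: a trace showing agent_5xx, agent_timeout, ok is one logical dispatch that needed two retries, while auth_failed produces a single row.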
Coding-layer error class reference
These classes occur only on code agents, including sandboxed code and CLI agents, and most appear on coding runs with a seeded repository workspace. They do not currently have dedicated failure-mode chips; failed rows show the raw error_class value with generic palette styling.
- agent_code_fetch_failed
  - Symptom: the run fails while fetching the code source.
  - Likely cause: the git or zip code source cannot be resolved or materialized, for example because of a missing file, an invalid archive, a clone failure, or a missing entrypoint.
  - Recommended fix: reconfirm the uploaded zip, or verify the git URL and branch and check that the code PAT is valid. See Sandbox agent reference.
- agent_secret_unresolved
  - Symptom: the run fails before dispatch.
  - Likely cause: config.credential_refs points to a missing org credential.
  - Recommended fix: re-create the credential, or remove the stale env and credential_refs entries.
- in_sandbox_requires_workspace
  - Symptom: an in_sandbox CLI agent fails immediately.
  - Likely cause: the seed is not a coding scenario, so no graded workspace exists.
  - Recommended fix: seed from a coding scenario in workspace mode, or switch to the proxy topology.
- workspace_requires_code_agent
  - Symptom: a coding workspace scenario fails on an external HTTP agent.
  - Likely cause: external HTTP agents cannot execute coding workspace scenarios in the graded sandbox.
  - Recommended fix: register the agent as a code agent and re-seed.
- workspace_seed_failed
  - Symptom: a failure during repository workspace seeding.
  - Likely cause: the repository clone or unzip fails because the URL, host, scheme, subdir, or archive does not pass validation.
  - Recommended fix: validate the workspace seed configuration. See Coding scenarios.
- workspace_setup_failed
  - Symptom: a failure after seeding, during the setup phase.
  - Likely cause: the scenario's platform-mode setup commands fail.
  - Recommended fix: reproduce the setup commands locally and correct the failures.
- environment_setup_failed
  - Symptom: a failure while preparing the runtime environment.
  - Likely cause: the per-agent environment configuration is invalid or only partially built.
  - Recommended fix: inspect the error detail and correct the environment definition. See Sandbox agent reference.
- image_build_failed
  - Symptom: a failure resolving the custom runtime image.
  - Likely cause: the custom image build failed.
  - Recommended fix: fix the Dockerfile issues and rebuild the image before dispatching.
- image_not_ready
  - Symptom: a failure resolving the custom runtime image.
  - Likely cause: the custom image is declared but has not been built or is not ready.
  - Recommended fix: build the image to a ready state, then re-dispatch.
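For agent_code_fetch_failed, an invalid archive or a missing entrypoint can often be ruled out locally before re-uploading a zip. The sketch below uses Python's standard zipfile module. The function name is hypothetical, and so is the entrypoint parameter: pass whatever path your agent registration expects. A clean result does not guarantee the platform will accept the source.

```python
import io
import zipfile
from typing import List


def check_code_zip(data: bytes, entrypoint: str) -> List[str]:
    """Return problems likely to surface as agent_code_fetch_failed (hypothetical helper)."""
    problems: List[str] = []
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            bad_member = zf.testzip()  # name of first corrupt member, or None
            if bad_member is not None:
                problems.append(f"corrupt member: {bad_member}")
            if entrypoint not in zf.namelist():
                problems.append(f"missing entrypoint: {entrypoint}")
    except zipfile.BadZipFile:
        problems.append("not a valid zip archive")
    return problems
```

An empty list means the archive opened, every member passed its CRC check, and the named entrypoint exists at that exact path inside the archive.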
An in_sandbox CLI run that exits non-zero and leaves no workspace changes is treated as a failure and filed under transport_error, with an agent_command_failed prefix in the error detail. A non-zero exit that produces a real diff does not automatically fail the run; the scorers and judge determine the final outcome.
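The exit-code rule for in_sandbox CLI runs reduces to a two-input decision, sketched below. The function name is hypothetical, and workspace_changed stands in for however the platform detects a diff; that input is an assumption.

```python
from typing import Optional


def cli_exit_failure(exit_code: int, workspace_changed: bool) -> Optional[str]:
    """Return the error_class for an immediate failure, or None.

    None means the run is not failed at this step; scorers and the judge
    decide the final outcome. Hypothetical helper for illustration only.
    """
    # Only a non-zero exit that leaves the workspace untouched fails outright.
    if exit_code != 0 and not workspace_changed:
        return "transport_error"  # error detail starts with agent_command_failed
    return None
```

A crashing CLI that still wrote a useful patch is therefore graded on that patch rather than discarded.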
Reading the evidence
The Debug section of the trace tab exposes:
- The raw response body (or the platform's observed error string).
- The error_json object, which is the source of error_class.
- For agent_5xx and agent_4xx, up to 500 characters of the upstream response.
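When triaging many failed rows, it can help to reduce each error_json object copied from the Debug section to one line. The sketch below assumes the object is a JSON mapping with top-level error_class and detail keys. error_json.detail is documented; treating error_class as a top-level key is an assumption. It also flags the agent_command_failed prefix used for in_sandbox CLI failures. The function name is hypothetical.

```python
import json
from typing import Any, Dict


def summarize_error_json(raw: str, limit: int = 500) -> str:
    """Return a one-line summary of an error_json payload (hypothetical helper)."""
    err: Dict[str, Any] = json.loads(raw)
    # Assumed top-level keys; only error_json.detail is named on this page.
    error_class = err.get("error_class", "<missing>")
    detail = str(err.get("detail", ""))
    if error_class == "transport_error" and detail.startswith("agent_command_failed"):
        # in_sandbox CLI run exited non-zero with no workspace changes.
        return "CLI command failed: " + detail[:limit]
    return f"{error_class}: {detail[:limit]}"
```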
For agent_unreachable where wrapper health appears normal, run the outbound curl described in Register an agent → Network reachability from inside the agent runtime. An HTTP 401 from the proxy indicates that egress is working; connection errors indicate that the runtime cannot reach the proxy.
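If curl is not available inside the runtime, the same check can be approximated with Python's standard urllib, as sketched below. The proxy URL to probe is whatever Register an agent → Network reachability specifies; the function names here are hypothetical, and the wording for statuses other than 401 is an assumption, since only the 401 and connection-error cases are documented.

```python
import urllib.error
import urllib.request
from typing import Optional


def interpret_probe(status: Optional[int]) -> str:
    """Map a probe result to the diagnosis above; None means a connection error."""
    if status is None:
        return "runtime cannot reach proxy"
    if status == 401:
        return "egress is working"
    # Undocumented case: the proxy answered, but not with the expected 401.
    return f"proxy answered HTTP {status}; compare with the documented curl output"


def probe_proxy(url: str, timeout_s: float = 10.0) -> str:
    """Request the proxy URL from inside the agent runtime and interpret the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return interpret_probe(resp.status)
    except urllib.error.HTTPError as exc:  # non-2xx replies, such as 401
        return interpret_probe(exc.code)
    except OSError:  # URLError, refused connections, and timeouts
        return interpret_probe(None)
```

HTTPError is caught before OSError because it subclasses URLError, which subclasses OSError. Reversing the order would misreport a 401 as unreachable.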