Sandbox agent reference
Complete sandbox registration reference: code sources, entrypoints, environment, limits, errors, and API equivalents.
In the sidebar, click Agents → Register agent and pick the Sandbox Agents mode card. Visible only to Org Admins and Project Admin Owners. (For agents that run in your infrastructure, see Register an external HTTP agent.)
A sandbox agent's code runs inside a per-task platform sandbox. The How your agent runs field picks one of two execution paths:
| Path | What runs | Use it for |
|---|---|---|
| Python function | The platform imports your code and calls your entrypoint; tools go through the per-run proxy | Custom Python agents, framework agents (OpenAI Agents SDK, LangGraph, …) |
| Shell command (any CLI) | The platform runs your command inside a seeded repo workspace | Coding CLIs — Claude Code, Codex, Cursor, Aider. See Coding agents |
Form fields
Name, description
- Name (required, ≤ 255 chars).
- Description (optional, ≤ 5000 chars).
Code source
The Code source picker offers four tiles. An agent declares at most one: a Python-function agent must have exactly one, while a shell-command agent may ship none because its CLI is already in the image.
| Tile | What it is | Limits |
|---|---|---|
| Paste code | A single Python file pasted into the editor. | ≤ 200 KB. |
| Multiple files | A left-rail file tree (add files or drop a folder). | ≤ 50 files, 200 KB per file, 1 MB total. Relative POSIX paths, .py basenames, no .. or dotfile dirs. |
| Git repository | A repo cloned into the sandbox at dispatch. | URL + optional ref; private repos via a PAT (below). |
| Upload ZIP | An archive uploaded once and unzipped into the sandbox. | ≤ 100 MB compressed (500 MB uncompressed). Any files. |
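The Multiple files limits can be checked locally before you fill in the form. The following is a minimal sketch, not the platform's validator: `check_multi_file_source` is a hypothetical helper, and it assumes 1 KB = 1024 bytes and 1 MB = 1024 × 1024 bytes, which the table does not state.

```python
from pathlib import PurePosixPath

MAX_FILES = 50
MAX_FILE_BYTES = 200 * 1024    # assumption: 1 KB = 1024 bytes
MAX_TOTAL_BYTES = 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_multi_file_source(files: dict[str, str]) -> list[str]:
    """Return a list of problems for a {relative_path: source_text} mapping."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    total = 0
    for path, text in files.items():
        size = len(text.encode("utf-8"))
        total += size
        parts = PurePosixPath(path).parts
        if path.startswith("/") or "\\" in path:
            problems.append(f"{path}: not a relative POSIX path")
        if ".." in parts:
            problems.append(f"{path}: contains '..'")
        # Any directory component starting with a dot counts as a dotfile dir.
        if any(p.startswith(".") and p != ".." for p in parts[:-1]):
            problems.append(f"{path}: dotfile directory")
        if not path.endswith(".py"):
            problems.append(f"{path}: basename is not .py")
        if size > MAX_FILE_BYTES:
            problems.append(f"{path}: {size} bytes exceeds the per-file cap")
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes exceeds the total cap")
    return problems
```

An empty list means the file set fits the documented limits; the platform remains the final authority.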
Wherever the code comes from, it lands in the agent directory
/home/user/agent. When a coding scenario is attached, the graded repo
lives in a separate /home/user/workspace, so your code never collides
with the diffed repo.
Fetched sources (ZIP and Git) are resolved before any sandbox is booted, so a fetch failure costs nothing. Materialization is idempotent (a worker retry re-converges), and on later turns of a multi-turn session the already-populated sandbox is reused without re-downloading.
Upload ZIP
Use a ZIP when your agent is bigger than the paste limits or needs
non-.py files. The flow:
- Drop or browse for a .zip. The form uploads it directly to storage with a short-lived signed URL, then confirms it. The form blocks submit while bytes are still moving.
- Only the confirmed archive id is saved on the agent, never the bytes.
- At each run, the platform downloads the archive, validates it (size caps, zip-slip and symlink guards), unzips it into /home/user/agent (flattening a single top-level folder), then deletes the staged archive.
Uploading an archive requires Org Admin (or sys-admin) rights on the agent's org — project roles don't grant it. Archives are org-scoped and referenced only by the agent.
A ZIP's contents aren't known when you save the agent, so the Entrypoint file is shape-checked at save and its existence is verified inside the sandbox at dispatch. A wrong path fails the run, not the save.
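Because a wrong entrypoint path only surfaces at dispatch, it can help to inspect the archive locally first. The sketch below approximates the checks described above (size caps, zip-slip, symlinks, single top-level folder flattening); `preflight_zip` is a hypothetical helper, the exact platform rules are assumptions, and the byte caps assume 1 MB = 1024 × 1024 bytes.

```python
import io
import stat
import zipfile
from pathlib import PurePosixPath

MAX_COMPRESSED = 100 * 1024 * 1024    # assumption: 1 MB = 1024 * 1024 bytes
MAX_UNCOMPRESSED = 500 * 1024 * 1024


def preflight_zip(data: bytes, entrypoint_file: str = "main.py") -> list[str]:
    """Return a list of problems found in an in-memory ZIP archive."""
    problems = []
    if len(data) > MAX_COMPRESSED:
        problems.append("archive exceeds the compressed cap")
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = [i for i in zf.infolist() if not i.is_dir()]
    if sum(i.file_size for i in infos) > MAX_UNCOMPRESSED:
        problems.append("archive exceeds the uncompressed cap")
    names = []
    for info in infos:
        parts = PurePosixPath(info.filename).parts
        if info.filename.startswith("/") or ".." in parts:
            problems.append(f"{info.filename}: path escapes the target (zip-slip)")
        # The Unix file mode lives in the high 16 bits of external_attr.
        if stat.S_ISLNK(info.external_attr >> 16):
            problems.append(f"{info.filename}: symlink entry")
        names.append(parts)
    # Mimic flattening: if every file sits under one top-level folder, drop it.
    tops = {p[0] for p in names if p}
    if len(tops) == 1 and all(len(p) > 1 for p in names):
        names = [p[1:] for p in names]
    if PurePosixPath(entrypoint_file).parts not in names:
        problems.append(f"{entrypoint_file}: entrypoint file not found after unzip")
    return problems
```

Run it on the archive bytes with the same Entrypoint file value you plan to save on the agent.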
Git repository
Provide a Repository URL and an optional Ref (branch, tag, or commit). The clone is https-only with an SSRF guard that rejects private/loopback/reserved hosts; SSH URLs and credentials embedded in the URL are rejected.
The Auth control has three modes:
- None — public repo.
- From credential — pick an existing org credential (a stored PAT). Resolved and decrypted server-side at dispatch.
- Inline token — paste a PAT once. It's write-only: the platform stores it in a hidden, platform-managed credential and never writes the raw value to the agent config. In edit mode you see rotate-or-keep copy, never the token.
Strict checkout. A bad ref fails the run (agent_code_fetch_failed) —
unlike the lenient scenario seed clone, agent-code git is strict. The token
never appears in run output, errors, logs, or the stored config: it's
injected only when the clone command is built, the remote is dropped after
the clone, and the decrypted value is masked to *** everywhere — even if
your agent echoes it back.
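The URL rules for agent-code git (https-only, no embedded credentials, no private/loopback/reserved hosts) can be approximated client-side. This is a sketch under assumptions, not the platform's SSRF guard: `check_repo_url` is a hypothetical helper, and it only inspects literal IP addresses and `localhost`, whereas a real guard would also check what a DNS name resolves to.

```python
import ipaddress
from urllib.parse import urlsplit


def check_repo_url(url: str) -> list[str]:
    """Return a list of reasons the URL would likely be rejected."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("only https:// URLs are accepted")
    if parts.username or parts.password:
        problems.append("credentials must not be embedded in the URL")
    host = parts.hostname or ""
    if not host:
        problems.append("URL has no host")
    elif host == "localhost":
        problems.append("host is a private, loopback, or reserved address")
    else:
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            ip = None  # a DNS name; resolution is out of scope for this sketch
        if ip is not None and (
            ip.is_private or ip.is_loopback or ip.is_reserved or ip.is_link_local
        ):
            problems.append("host is a private, loopback, or reserved address")
    return problems
```

For a private repository, pass the PAT through the Auth control rather than the URL; the second check above flags the embedded form.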
Entrypoint (Python function path)
Entrypoint (required, default run) is the name of a top-level
callable in your code — a single Python identifier (dotted paths are
rejected). For multi-file, ZIP, and Git sources, Entrypoint file
selects which .py module holds it (default main.py). The platform
calls it directly:

    def run(task_input, *, proxy_url, run_token):
        # task_input is the dispatch input dict
        # proxy_url / run_token are also available as env vars (below)
        return {"final_response": "..."}  # or just a string

Return a {"final_response": ...} dict (optionally with messages /
metadata), or a plain string the platform wraps as final_response. An
unhandled exception is captured and graded, not treated as an infra
failure.
Agent code in the sandbox cannot import the Pipelines SDK (it isn't
installed there). Write SDK-free code: read proxy_url / run_token from
the call kwargs or the PIPELINES_* env vars and call the per-run proxy
over plain HTTP. See pipelines.odyssey
for the SDK path when you control the runtime.
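An SDK-free entrypoint that reaches a tool over plain HTTP might look like the sketch below, using only the standard library. The route (`/tools/{name}` appended to the proxy base) and the bearer token come from this page; the POST method, the JSON request body, the JSON response, and the tool name `search` are assumptions for illustration.

```python
import json
import os
import urllib.request


def build_tool_request(name, arguments, *, proxy_url=None, run_token=None):
    """Build (but do not send) a request to the per-run proxy's /tools/{name} route."""
    proxy_url = proxy_url or os.environ["PIPELINES_ODYSSEY_PROXY_URL"]
    run_token = run_token or os.environ["PIPELINES_RUN_TOKEN"]
    return urllib.request.Request(
        f"{proxy_url.rstrip('/')}/tools/{name}",
        data=json.dumps(arguments).encode("utf-8"),  # assumed body shape
        headers={
            "Authorization": f"Bearer {run_token}",
            "Content-Type": "application/json",
        },
        method="POST",  # assumed method
    )


def run(task_input, *, proxy_url, run_token):
    # "search" is a hypothetical tool; use a name declared in your tools_schema.
    req = build_tool_request(
        "search",
        {"query": task_input.get("query", "")},
        proxy_url=proxy_url,
        run_token=run_token,
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        result = json.loads(resp.read())  # assumes a JSON response
    return {"final_response": json.dumps(result)}
```

Keeping request construction in its own function makes the agent testable outside the sandbox, where no proxy is reachable.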
Run command (Shell command path)
A shell-command agent runs a Run command (a CLI harness or any program) inside the seeded workspace — no Python entrypoint. Preset chips fill it for Claude Code / Codex / Cursor / Aider. The platform writes the task as files and lets your command read them:
| File / env | What it points at |
|---|---|
| $PIPELINES_TASK_FILE | TASK.md, the task brief. |
| $PIPELINES_TASK_INPUT_FILE | task_input.json, the full task input. |
| $PIPELINES_RESULT_PATH | result.json your command may write; if it omits a final_response, the platform supplies one. |
So claude -p "$(cat $PIPELINES_TASK_FILE)" is a complete command. This
path requires a coding scenario on the task — registration completes
without one, but dispatch fails with in_sandbox_requires_workspace.
Full flow: Coding agents; CLI
wiring: Harness customization.
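The Run command does not have to be a coding CLI. As a rough illustration, a small Python program can read the task files and write the optional result. This sketch assumes that task_input.json holds a JSON object and that result.json accepts a `final_response` key as described above; the "work" it does is a placeholder.

```python
import json
import os
from pathlib import Path


def main(env=os.environ):
    """Read the task brief and input, then write an optional result.json."""
    brief = Path(env["PIPELINES_TASK_FILE"]).read_text(encoding="utf-8")
    task_input = json.loads(
        Path(env["PIPELINES_TASK_INPUT_FILE"]).read_text(encoding="utf-8")
    )
    # Placeholder work: a real harness would edit files in the workspace here.
    summary = f"Read a {len(brief)}-character brief with {len(task_input)} input keys."
    result_path = env.get("PIPELINES_RESULT_PATH")
    if result_path:
        Path(result_path).write_text(
            json.dumps({"final_response": summary}), encoding="utf-8"
        )
    return summary
```

Saved as a script that ends by calling `main()`, it could be used as the Run command through the Python interpreter in the image. Passing `env` explicitly keeps it easy to exercise locally with temporary files.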
Environment variables the platform injects
Your code may read these. The PIPELINES_ prefix is reserved — you
can't declare your own env var or credential under it, and platform values
always win on collision.
| Variable | Path | Use |
|---|---|---|
| PIPELINES_ODYSSEY_PROXY_URL | both | Per-run proxy base; append /tools/{name}. |
| PIPELINES_RUN_TOKEN | both | Per-run bearer for proxy/tool calls. Secret (redacted from logged env). |
| PIPELINES_RUN_TOKEN_JTI | both | Non-secret correlation id; safe to log. |
| PIPELINES_API_URL | both | Platform API origin. |
| PIPELINES_AGENT_ID | both | This agent's id. |
| PIPELINES_TASK_ID / PIPELINES_RUN_ID | both | Non-secret task / run ids. |
| _PIPELINES_TASK_INPUT_JSON | Python function | JSON-encoded task input. |
| PIPELINES_TASK_FILE / PIPELINES_TASK_INPUT_FILE / PIPELINES_RESULT_PATH | shell command | Brief / input / optional result paths (above). |
For Python-function agents, proxy_url and run_token are also passed as
keyword args, so SDK-free scripts can read either source.
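One way to handle both sources is to prefer the keyword args and fall back to the environment. The sketch below is illustrative only: `RunContext` and `load_context` are hypothetical names, and treating a missing `_PIPELINES_TASK_INPUT_JSON` as an empty object is an assumption.

```python
import json
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RunContext:
    proxy_url: str
    run_token: str
    token_jti: str
    task_input: dict

    def log_line(self) -> str:
        # Log the non-secret JTI, never the bearer token itself.
        return f"run jti={self.token_jti}"


def load_context(env=None, *, proxy_url=None, run_token=None) -> RunContext:
    """Build a RunContext from kwargs first, then from PIPELINES_* env vars."""
    env = os.environ if env is None else env
    return RunContext(
        proxy_url=proxy_url or env["PIPELINES_ODYSSEY_PROXY_URL"],
        run_token=run_token or env["PIPELINES_RUN_TOKEN"],
        token_jti=env.get("PIPELINES_RUN_TOKEN_JTI", ""),
        # Assumption: fall back to an empty object when the variable is absent.
        task_input=json.loads(env.get("_PIPELINES_TASK_INPUT_JSON", "{}")),
    )
```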
Tools (optional)
Declare a tools_schema exactly as for external HTTP agents — see
Tools schema. Python-function agents
call tools through the per-run proxy; CLI agents get them via an injected
MCP server (see
Harness customization). Leave it
empty for a pure compute or coding agent.
Sandbox environment (advanced)
Every run boots a managed sandbox from the platform default image —
Python 3.13 with git, ripgrep, unzip, uv, and pytest
preinstalled. The defaults work for most agents; open Sandbox
environment (advanced) only if you need more. (For coding CLIs and their
image guidance, see Coding agents.)
Boot-time layering
Applied per run when the sandbox boots — no persistent build:
- System packages (one per line): apt packages (≤ 50), installed as root once at boot, before your agent.
- Setup command: a shell command run once at sandbox start, after package installs. It receives your resolved env (so it can use a credential-backed token). A nonzero exit fails the run (environment_setup_failed).
- Python requirements (one per line) and Python version (3.9–3.13, blank = the default 3.13): Python-function agents only. Requirements are pip-installed in the sandbox before your agent runs.
Custom Dockerfile
For heavier tooling, switch Base image to Custom Dockerfile for a
persistent, built image. The platform always prepends
FROM pipelines-workspace-base, so you write only the body. Constraints:
- Only RUN, ENV, and WORKDIR directives. No COPY/ADD (there's no build context), no second FROM (single-stage), no ENTRYPOINT/CMD/USER.
- ≤ 32 KB of Dockerfile text.
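These constraints are simple enough to lint before saving. The following is a rough sketch rather than the platform's parser: `lint_dockerfile_body` is a hypothetical helper, it assumes 32 KB = 32 × 1024 bytes, and it handles backslash line continuations but not heredocs or custom escape characters.

```python
ALLOWED = {"RUN", "ENV", "WORKDIR"}
MAX_BYTES = 32 * 1024  # assumption: 1 KB = 1024 bytes


def lint_dockerfile_body(text: str) -> list[str]:
    """Return a list of problems for a Dockerfile body (no FROM line)."""
    problems = []
    if len(text.encode("utf-8")) > MAX_BYTES:
        problems.append("Dockerfile text exceeds 32 KB")
    continued = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if continued:
            # Still inside the previous instruction; blank and comment
            # lines do not end a continuation.
            if line and not line.startswith("#"):
                continued = line.endswith("\\")
            continue
        if not line or line.startswith("#"):
            continue
        directive = line.split(None, 1)[0].upper()
        if directive not in ALLOWED:
            problems.append(f"line {lineno}: {directive} is not allowed")
        continued = line.endswith("\\")
    return problems
```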
Saving stores the text; building is an explicit action. On the agent detail page, the Custom image card shows a status chip and a build button:
| Chip | Meaning |
|---|---|
| Not built | No image yet. Click Build image. |
| Building… | Build in flight; a live log streams. |
| Ready | Built. The button becomes Rebuild (force-rebuild). |
| Build failed | Build errored; the failure log tail is available. Rebuild to retry. |
The button calls POST /api/agents/{id}/build-environment. A run with a
custom Dockerfile is pinned to its built image: while Building… or
Build failed, the run is a hard error — never a silent fall back to
the default image. An identical Dockerfile already built in your org is
reused without rebuilding; Rebuild forces a fresh build (the recovery
hatch for a stuck or failed build). Only one build at a time per org —
a second build returns a 409.
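A script that triggers builds should treat the 409 as "try again later" rather than as a failure. The sketch below shows one way to do that with the standard library. The endpoint path is from this page; the bearer-token auth header, the `api_token` parameter, and the shape of a successful response are assumptions.

```python
import urllib.error
import urllib.request


def trigger_build(api_url, agent_id, api_token, opener=urllib.request.urlopen):
    """POST to the build-environment endpoint and summarize the outcome."""
    req = urllib.request.Request(
        f"{api_url.rstrip('/')}/api/agents/{agent_id}/build-environment",
        method="POST",
        headers={"Authorization": f"Bearer {api_token}"},  # assumed auth scheme
    )
    try:
        with opener(req, timeout=30) as resp:
            return f"accepted ({resp.status})"
    except urllib.error.HTTPError as err:
        if err.code == 409:
            # Only one build may run per org at a time.
            return "busy: another build is already running in this org"
        raise
```

The `opener` parameter exists only so the function can be exercised without a network; in normal use, leave it at its default.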
Environment variables and secrets
Sandbox agents declare Environment variables rows on the form. Each row is one of:
- Value — a literal, stored in plaintext in the agent config. On-screen masking is cosmetic; treat these as non-secret config.
- From credential — mapped to a stored org credential, decrypted only at dispatch and masked in all run output. This is the only encrypted path — put API keys and tokens here.
CLI provider keys (e.g. ANTHROPIC_API_KEY, OPENAI_API_KEY) go the same
way: declare them as env-var rows backed by credentials. See
Harness customization → Environment variables and secrets.
Env keys must be [A-Z][A-Z0-9_]*, ≤ 50 entries, ≤ 4096 chars per value,
and must not use the reserved PIPELINES_ prefix. A missing or
undecryptable credential fails the run as agent_secret_unresolved before
any sandbox cost.
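The key and size rules above translate directly into a small check. This is a minimal sketch of those rules as stated, with `check_env_rows` as a hypothetical helper; it does not distinguish Value rows from From credential rows.

```python
import re

ENV_KEY = re.compile(r"[A-Z][A-Z0-9_]*")
MAX_ROWS = 50
MAX_VALUE_CHARS = 4096


def check_env_rows(rows: dict[str, str]) -> list[str]:
    """Return a list of problems for a {key: value} mapping of env-var rows."""
    problems = []
    if len(rows) > MAX_ROWS:
        problems.append(f"too many rows: {len(rows)} > {MAX_ROWS}")
    for key, value in rows.items():
        if not ENV_KEY.fullmatch(key):
            problems.append(f"{key}: key must match [A-Z][A-Z0-9_]*")
        elif key.startswith("PIPELINES_"):
            problems.append(f"{key}: PIPELINES_ prefix is reserved")
        if len(value) > MAX_VALUE_CHARS:
            problems.append(f"{key}: value longer than {MAX_VALUE_CHARS} chars")
    return problems
```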
Concurrency cap, run timeout
Same fields as external HTTP agents: cap default 5 (1–100), timeout default 300 s (max 1800 s). Coding CLI runs routinely need several minutes — raise the timeout.
Pick simulator + judge models
Identical to external HTTP agents — set per agent field in the Pipeline Builder; see Register an external HTTP agent → models.
Limits and errors
| Limit | Value |
|---|---|
| Paste code | 200 KB |
| Multiple files | 50 files / 200 KB per file / 1 MB total |
| ZIP archive | 100 MB compressed / 500 MB uncompressed |
| System packages | 50 apt packages |
| Python requirements | 100 specs |
| Python version | 3.9–3.13 (default 3.13) |
| Dockerfile | 32 KB, RUN/ENV/WORKDIR only, single-stage |
| Env-var rows | 50 rows, 4096 chars per value |
| Concurrent image builds | 1 per org |
These error chips surface in the run inspector when a code/env step fails:
| Error Class | Case |
|---|---|
| agent_code_fetch_failed | A ZIP or Git source couldn't be fetched, validated, or checked out (e.g. a bad ref). Fails before a sandbox boots. |
| agent_secret_unresolved | A From credential env var (or git PAT credential) is missing or can't be decrypted. |
| environment_setup_failed | A Setup command exited nonzero (a half-built environment is a hard stop). |
| in_sandbox_requires_workspace | A shell-command agent was dispatched without a coding scenario. |
API equivalent
The form posts mode: "code" to POST /api/agents. Single-file:

    {
      "name": "my-sandbox-agent",
      "mode": "code",
      "config": {
        "source": "def run(task_input, *, proxy_url, run_token):\n ...",
        "entrypoint": "run",
        "requirements": ["httpx", "anthropic"],
        "python_version": "3.12"
      }
    }

Multi-file (source_files + entrypoint_file), ZIP, and git sources are
mutually exclusive with source:

    {
      "config": {
        "source_git": {
          "url": "https://github.com/acme/my-agent.git",
          "ref": "v1.2.0",
          "credential_type": "GITHUB_PAT"
        },
        "entrypoint_file": "main.py",
        "entrypoint": "run"
      }
    }

A ZIP source references the uploaded archive by file id:

    {
      "config": {
        "source_zip_file_id": "<file UUID from the org files upload endpoint>",
        "entrypoint_file": "main.py",
        "entrypoint": "run"
      }
    }

A coding CLI agent uses the in_sandbox topology and a run_command
instead of an entrypoint:

    {
      "config": {
        "execution_topology": "in_sandbox",
        "run_command": "claude -p \"$(cat $PIPELINES_TASK_FILE)\"",
        "credential_refs": { "ANTHROPIC_API_KEY": "ANTHROPIC_API_KEY" }
      }
    }

execution_topology is "proxy" (default, Python function) or
"in_sandbox" (shell command).
After registering
Wire the agent into a Pipeline Builder agent field, then seed tasks — plain task seeds for Python agents, coding scenarios for CLI agents — and read results in Inspecting runs. Or start from a runbook.
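If you register through the API instead of the form, the payload rules from API equivalent can be checked before posting. The sketch below only builds and validates the JSON body for POST /api/agents; `build_agent_payload` is a hypothetical helper, and the checks reflect this page's stated rules (code sources are mutually exclusive, a Python-function agent needs one, a shell-command agent needs a run_command), not the server's full validation.

```python
import json

SOURCE_KEYS = ("source", "source_files", "source_git", "source_zip_file_id")


def build_agent_payload(name: str, config: dict) -> str:
    """Return the JSON body for registering a sandbox agent, or raise ValueError."""
    declared = [k for k in SOURCE_KEYS if k in config]
    shell = config.get("execution_topology") == "in_sandbox"
    if len(declared) > 1:
        raise ValueError(f"code sources are mutually exclusive: {declared}")
    if not shell and not declared:
        raise ValueError("a Python-function agent needs one code source")
    if shell and "run_command" not in config:
        raise ValueError("a shell-command agent needs a run_command")
    return json.dumps({"name": name, "mode": "code", "config": config})
```

Sending the body is left out on purpose, since this page does not describe how API requests are authenticated.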