Task Lifecycle
Where human evaluation work on agent runs shows up, and how it flows through a pipeline.
A task is a single unit of work that moves through a pipeline. In the agent-testing model, each dataset row becomes one (agent, task) run: the agent executes, its tool calls route through the Odyssey proxy, and an LLM judge grades the trajectory. A task becomes visible work for people when the pipeline asks a human to evaluate or review an agent's output, for example through human-eval fields (an agent-mode field paired with a human grade) or a review step. Those steps are what contributors and reviewers see as tasks to pick up.
Each task creates node instances at every step it reaches, and each node instance has its own status.
Task states
| Status | Description |
|---|---|
| Pending | Created but not yet available for work. |
| In Progress | At least one node instance is actively being worked on. |
| Paused | The pipeline was paused. The task resumes when the pipeline is reactivated. |
| Finished | All nodes completed successfully — the task has reached an End node. |
| Escalated | Flagged for admin attention due to review failures exceeding limits. |
| Quarantined | Held because the task exceeded the maximum number of re-reviews. |
| Failed | An unrecoverable error occurred during processing. |
Node instance states
| Status | Description |
|---|---|
| Pending | The task has not yet reached this node. |
| Available | Ready to be claimed by a contributor or processed by an LLM. |
| Claimed | A contributor has claimed this node, or an LLM is generating a response. |
| Submitted | Work has been submitted, awaiting the next step. |
| Finished | The node is complete. |
| Escalated | The node has been escalated to an admin. |
| Quarantined | The node is held due to re-review limits. |
| N/A | The node was skipped (e.g., a logic gate routed the task down a different path). |
| Deferred | The node is waiting for an upstream dependency. |
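The two status tables can be read as a pair of enumerations. The sketch below is illustrative only: the class names, the `claimable_nodes` helper, and the node names are hypothetical and do not describe a published API. Only the status labels are taken from the tables above.

```python
from enum import Enum


class TaskStatus(Enum):
    """Task-level statuses, as listed in the Task states table."""
    PENDING = "Pending"
    IN_PROGRESS = "In Progress"
    PAUSED = "Paused"
    FINISHED = "Finished"
    ESCALATED = "Escalated"
    QUARANTINED = "Quarantined"
    FAILED = "Failed"


class NodeStatus(Enum):
    """Per-node statuses, as listed in the Node instance states table."""
    PENDING = "Pending"
    AVAILABLE = "Available"
    CLAIMED = "Claimed"
    SUBMITTED = "Submitted"
    FINISHED = "Finished"
    ESCALATED = "Escalated"
    QUARANTINED = "Quarantined"
    NOT_APPLICABLE = "N/A"
    DEFERRED = "Deferred"


def claimable_nodes(node_statuses: dict[str, NodeStatus]) -> list[str]:
    """Return the nodes that are ready to be claimed or processed right now."""
    return [
        name
        for name, status in node_statuses.items()
        if status is NodeStatus.AVAILABLE
    ]


# Hypothetical task with three node instances (node names are made up).
nodes = {
    "agent_run": NodeStatus.FINISHED,
    "human_eval": NodeStatus.AVAILABLE,
    "review": NodeStatus.PENDING,
}
print(claimable_nodes(nodes))  # ['human_eval']
```

Keeping task status and node status as separate types mirrors the point that one task carries several node instances, each with its own status.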
Task flow
A typical task lifecycle:
- Created — a task is created from a dataset row and enters the pipeline at the Start node, pairing the row with the agent under test.
- Available — the first node becomes available. An agent run or LLM step is processed automatically; a human-eval or review step becomes available for a contributor to claim.
- Claimed — a contributor claims the human-eval step from their work queue, or the agent/LLM begins executing.
- Submitted — the contributor records their grade and submits, or the agent run / LLM generation finishes.
- Reviewed (if applicable) — the task moves to a review node where a reviewer passes or fails the submission. A failed review sends the task back for rework.
- Finished — the task reaches an End node and is marked as Finished.
Contributors and reviewers pick up the human-eval steps from their work queue — claiming a step to evaluate or review an agent's output — but most of the lifecycle runs on its own as the agent executes and the judge grades the trajectory.
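The typical flow above can be sketched as a small transition table for one human-eval step. This is a simplified model under stated assumptions, not the product's implementation: the event names (`claim`, `submit`, `release`, `review_pass`, `review_fail`) are hypothetical, the review is folded into the same step rather than modeled as a separate review node, and a failed review is assumed to return the step to Available for rework.

```python
# Allowed moves for one step in the typical flow.
# Keys are (current status, event); values are the resulting status.
TRANSITIONS = {
    ("Available", "claim"): "Claimed",
    ("Claimed", "submit"): "Submitted",
    ("Claimed", "release"): "Available",       # admin release returns it to the queue
    ("Submitted", "review_pass"): "Finished",
    ("Submitted", "review_fail"): "Available",  # assumption: rework starts over
}


def advance(status: str, event: str) -> str:
    """Apply one event to a status, rejecting moves the table does not list."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(
            f"cannot apply {event!r} to a step in status {status!r}"
        ) from None


def replay(events: list[str], start: str = "Available") -> list[str]:
    """Return every status the step passes through, including the start."""
    history = [start]
    for event in events:
        history.append(advance(history[-1], event))
    return history


# One failed review followed by a successful rework.
print(replay(["claim", "submit", "review_fail", "claim", "submit", "review_pass"]))
# ['Available', 'Claimed', 'Submitted', 'Available', 'Claimed', 'Submitted', 'Finished']
```

An explicit table makes illegal moves fail loudly, for example trying to claim a step that is already Finished.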
Viewing tasks
Data Explorer
The primary interface for viewing and managing tasks is the Data Explorer, where you inspect agent traces in depth. Open it from the pipeline page. Its header shows the task completion count and provides access to task creation, export, and derived-column creation. To drill into a single agent's trajectory and tool calls, see inspecting runs.
The Data Explorer has four tabs:
- Task Metadata — task-level information: status, current node, assignee, timestamps, and per-node status columns.
- Task Data — all form field values across nodes, including LLM responses and evaluations.
- Evaluation Analytics — scorecard and per-field analytics for evaluation results (shown when the pipeline has evaluation fields).
- LLM Analytics — LLM generation metrics (shown when the pipeline has LLM nodes).
Task detail
Click View on any task row to open a detail modal with two views:
- Nodes view — per-node breakdown of form fields, submitted values, review data, and LLM output.
- Timeline view — chronological history of all submissions, reviews, and state changes for the task.
Admin operations
Admins can perform the following operations on tasks:
- View any task — open the full submission and review data.
- Edit submissions — modify submitted field values.
- Release — unclaim a task, returning it to the available queue.
- Escalate/De-escalate — manually change escalation status.
- Tag — apply tags to tasks for filtering and organization.
Bulk actions
Select multiple tasks in the Data Explorer to:
- Export — download selected tasks as CSV, JSON, or ZIP.
- Evaluate — run evaluations on selected tasks.
- Delete — permanently remove selected tasks.