Task Lifecycle

A task is a single unit of work that moves through a pipeline. In the agent-testing model, each dataset row becomes one (agent, task) run: the agent executes, its tool calls route through the Odyssey proxy, and an LLM judge grades the trajectory. A task surfaces as work for people only when the pipeline asks a human to evaluate or review an agent's output, for example through a human-eval field (an agent-mode field paired with a human grade) or a review step. Contributors and reviewers pick up tasks at those points.

Each task creates node instances at every step it reaches, and each node instance has its own status.

Task states

Status	Description
Pending	Created but not yet available for work.
In Progress	At least one node instance is actively being worked on.
Paused	The pipeline was paused. The task resumes when the pipeline is reactivated.
Finished	All nodes completed successfully — the task has reached an End node.
Escalated	Flagged for admin attention due to review failures exceeding limits.
Quarantined	Held because the task exceeded the maximum number of re-reviews.
Failed	An unrecoverable error occurred during processing.

Node instance states

Status	Description
Pending	The task has not yet reached this node.
Available	Ready to be claimed by a contributor or processed by an LLM.
Claimed	A contributor has claimed this node, or an LLM is generating a response.
Submitted	Work has been submitted, awaiting the next step.
Finished	The node is complete.
Escalated	The node has been escalated to an admin.
Quarantined	The node is held due to re-review limits.
N/A	The node was skipped (e.g., a logic gate routed the task down a different path).
Deferred	The node is waiting for an upstream dependency.
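
To make the relationship between the two tables concrete, here is a minimal Python sketch that models both status sets as enums and rolls a task's node instance statuses up into a task status. The enum values come from the tables above; the `summarize_task` function, the `NOT_STARTED` grouping, and the roll-up rules themselves are assumptions made for illustration, not the product's actual logic. Paused and Failed are left out of the roll-up because they depend on the pipeline and on processing errors rather than on node states.

```python
from enum import Enum


class TaskStatus(Enum):
    PENDING = "Pending"
    IN_PROGRESS = "In Progress"
    PAUSED = "Paused"
    FINISHED = "Finished"
    ESCALATED = "Escalated"
    QUARANTINED = "Quarantined"
    FAILED = "Failed"


class NodeStatus(Enum):
    PENDING = "Pending"
    AVAILABLE = "Available"
    CLAIMED = "Claimed"
    SUBMITTED = "Submitted"
    FINISHED = "Finished"
    ESCALATED = "Escalated"
    QUARANTINED = "Quarantined"
    NOT_APPLICABLE = "N/A"
    DEFERRED = "Deferred"


# Assumption: nodes in these states have not started any work yet.
NOT_STARTED = {NodeStatus.PENDING, NodeStatus.DEFERRED}


def summarize_task(node_statuses):
    """Hypothetical roll-up of node instance statuses into one task status."""
    # Assumption: skipped (N/A) nodes say nothing about progress, so ignore them.
    relevant = [s for s in node_statuses if s is not NodeStatus.NOT_APPLICABLE]
    if NodeStatus.ESCALATED in relevant:
        return TaskStatus.ESCALATED
    if NodeStatus.QUARANTINED in relevant:
        return TaskStatus.QUARANTINED
    if all(s in NOT_STARTED for s in relevant):  # also true for an empty list
        return TaskStatus.PENDING
    if all(s is NodeStatus.FINISHED for s in relevant):
        return TaskStatus.FINISHED
    return TaskStatus.IN_PROGRESS
```

Under these assumed rules, a task whose nodes are Finished, Claimed, and Pending rolls up to In Progress, while a task whose nodes are all Finished or N/A rolls up to Finished.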

Task flow

A typical task lifecycle:

Created — a task is created from a dataset row and enters the pipeline at the Start node, pairing the row with the agent under test.
Available — the first node becomes available. An agent run or LLM step is processed automatically; a human-eval or review step becomes available for a contributor to claim.
Claimed — a contributor claims the human-eval step from their work queue, or the agent/LLM begins executing.
Submitted — the contributor records their grade and submits, or the agent run / LLM generation finishes.
Reviewed (if applicable) — the task moves to a review node where a reviewer passes or fails the submission. A failed review sends the task back for rework.
Finished — the task reaches an End node and is marked as Finished.

Contributors and reviewers claim human-eval and review steps from their work queue to evaluate or review an agent's output. The rest of the lifecycle runs automatically as the agent executes and the judge grades the trajectory.
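
The flow above can be read as a small state machine over a single node instance. The sketch below is illustrative only: the `ALLOWED` table, `advance`, and `replay` are hypothetical names, and two of the transitions are assumptions rather than documented behavior (a failed review returning the node to Available for rework, and an admin release returning a claimed node to Available).

```python
# Hypothetical transition table for one node instance.
# Keys are current states; values are the states reachable from them.
ALLOWED = {
    "Pending": {"Available"},
    "Available": {"Claimed"},
    # "Available" here models an admin releasing the claim (assumed mapping).
    "Claimed": {"Submitted", "Available"},
    # "Available" here models a failed review sending work back (assumed mapping).
    "Submitted": {"Finished", "Available"},
    "Finished": set(),
}


def advance(current, target):
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


def replay(path, start="Pending"):
    """Apply a sequence of target states in order and return the final state."""
    state = start
    for target in path:
        state = advance(state, target)
    return state


happy_path = ["Available", "Claimed", "Submitted", "Finished"]
rework_path = ["Available", "Claimed", "Submitted",
               "Available", "Claimed", "Submitted", "Finished"]

print(replay(happy_path))
print(replay(rework_path))
```

Both paths end in Finished. Skipping a step, for example moving straight from Pending to Finished, raises a `ValueError` in this sketch.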

Viewing tasks

Data Explorer

The Data Explorer is the primary interface for viewing and managing tasks and for inspecting agent traces in depth. It is accessible from the pipeline page. Its header shows the task completion count and provides access to task creation, export, and derived-column creation. To drill into a single agent's trajectory and tool calls, see inspecting runs.

The Data Explorer has up to four tabs; the two analytics tabs appear only when the pipeline has the relevant fields or nodes:

Task Metadata — task-level information: status, current node, assignee, timestamps, and per-node status columns.
Task Data — all form field values across nodes, including LLM responses and evaluations.
Evaluation Analytics — scorecard and per-field analytics for evaluation results (shown when the pipeline has evaluation fields).
LLM Analytics — LLM generation metrics (shown when the pipeline has LLM nodes).
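
The conditional tabs can be summarized in a few lines of Python. This is a sketch of the visibility rule as described in the list above, not the application's code; the function name `visible_tabs` and its two boolean parameters are hypothetical.

```python
def visible_tabs(has_evaluation_fields, has_llm_nodes):
    """Return the Data Explorer tabs shown for a pipeline (illustrative only)."""
    # These two tabs are described without any condition.
    tabs = ["Task Metadata", "Task Data"]
    # Shown only when the pipeline has evaluation fields.
    if has_evaluation_fields:
        tabs.append("Evaluation Analytics")
    # Shown only when the pipeline has LLM nodes.
    if has_llm_nodes:
        tabs.append("LLM Analytics")
    return tabs
```

For a pipeline with evaluation fields but no LLM nodes, `visible_tabs(True, False)` returns the first three tabs.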

Task detail

Click View on any task row to open a detail modal with two views:

Nodes view — per-node breakdown of form fields, submitted values, review data, and LLM output.
Timeline view — chronological history of all submissions, reviews, and state changes for the task.

Admin operations

Admins can perform the following operations on tasks:

View any task — open the full submission and review data.
Edit submissions — modify submitted field values.
Release — unclaim a task, returning it to the available queue.
Escalate/De-escalate — manually change escalation status.
Tag — apply tags to tasks for filtering and organization.

Bulk actions

Select multiple tasks in the Data Explorer to:

Export — download selected tasks as CSV, JSON, or ZIP.
Evaluate — run evaluations on selected tasks.
Delete — permanently remove selected tasks.
