
Task Lifecycle

Where human evaluation work on agent runs shows up, and how it flows through a pipeline.

A task is a single unit of work that moves through a pipeline. In the agent-testing model, each dataset row becomes one (agent, task) run: the agent executes, its tool calls route through the Odyssey proxy, and an LLM judge grades the trajectory. A task becomes visible work for people when the pipeline asks a human to evaluate or review an agent's output, for example when it has human-eval fields (an agent-mode field paired with a human grade) or a review step. Those steps are what contributors and reviewers see and pick up.

Each task creates node instances at every step it reaches, and each node instance has its own status.

Task states

| Status | Description |
| --- | --- |
| Pending | Created but not yet available for work. |
| In Progress | At least one node instance is actively being worked on. |
| Paused | The pipeline was paused. The task resumes when the pipeline is reactivated. |
| Finished | All nodes completed successfully; the task has reached an End node. |
| Escalated | Flagged for admin attention due to review failures exceeding limits. |
| Quarantined | Held because the task exceeded the maximum number of re-reviews. |
| Failed | An unrecoverable error occurred during processing. |
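
When working with exported task data, it can help to treat the task states above as a closed set. The following is a minimal sketch, not the platform's own data model: the `TaskStatus` enum and the `summarize` helper are hypothetical names introduced here for illustration, and only the status labels come from the table.

```python
from collections import Counter
from enum import Enum


class TaskStatus(Enum):
    """Task states from the table above. The enum itself is illustrative."""

    PENDING = "Pending"
    IN_PROGRESS = "In Progress"
    PAUSED = "Paused"
    FINISHED = "Finished"
    ESCALATED = "Escalated"
    QUARANTINED = "Quarantined"
    FAILED = "Failed"


def summarize(statuses):
    """Count tasks per status, keeping zero counts so every state is listed."""
    counts = Counter(statuses)
    return {status.value: counts.get(status, 0) for status in TaskStatus}


# Hypothetical sample: three tasks, two of which reached an End node.
sample = [TaskStatus.FINISHED, TaskStatus.FINISHED, TaskStatus.ESCALATED]
summary = summarize(sample)
print(summary["Finished"], "of", len(sample), "tasks finished")
```

Keeping zero counts in the summary makes it easy to spot states that never occur in a batch, such as no Failed tasks.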

Node instance states

| Status | Description |
| --- | --- |
| Pending | The task has not yet reached this node. |
| Available | Ready to be claimed by a contributor or processed by an LLM. |
| Claimed | A contributor has claimed this node, or an LLM is generating a response. |
| Submitted | Work has been submitted, awaiting the next step. |
| Finished | The node is complete. |
| Escalated | The node has been escalated to an admin. |
| Quarantined | The node is held due to re-review limits. |
| N/A | The node was skipped (e.g., a logic gate routed the task down a different path). |
| Deferred | The node is waiting for an upstream dependency. |
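
Because each node instance carries its own status, a task-level view has to be derived from several node statuses at once. The sketch below shows one way to reason about that. The `NodeStatus` enum and both helpers are hypothetical, and the rule that "actively being worked on" means a Claimed node is an assumption made for this example, not a documented definition.

```python
from enum import Enum


class NodeStatus(Enum):
    """Node instance states from the table above. The enum is illustrative."""

    PENDING = "Pending"
    AVAILABLE = "Available"
    CLAIMED = "Claimed"
    SUBMITTED = "Submitted"
    FINISHED = "Finished"
    ESCALATED = "Escalated"
    QUARANTINED = "Quarantined"
    NOT_APPLICABLE = "N/A"
    DEFERRED = "Deferred"


def nodes_on_path(node_statuses):
    """Drop skipped nodes (N/A), e.g. branches a logic gate routed around."""
    return {
        name: status
        for name, status in node_statuses.items()
        if status is not NodeStatus.NOT_APPLICABLE
    }


def has_active_work(node_statuses):
    """Assumption: a node counts as actively worked on when it is Claimed."""
    return any(s is NodeStatus.CLAIMED for s in node_statuses.values())


# Hypothetical task with three nodes; the node names are made up.
task_nodes = {
    "agent_run": NodeStatus.FINISHED,
    "human_eval": NodeStatus.CLAIMED,
    "fallback_branch": NodeStatus.NOT_APPLICABLE,
}
print(sorted(nodes_on_path(task_nodes)))
print(has_active_work(task_nodes))
```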

Task flow

A typical task lifecycle:

  1. Created — a task is created from a dataset row and enters the pipeline at the Start node, pairing the row with the agent under test.
  2. Available — the first node becomes available. An agent run or LLM step is processed automatically; a human-eval or review step becomes available for a contributor to claim.
  3. Claimed — a contributor claims the human-eval step from their work queue, or the agent/LLM begins executing.
  4. Submitted — the contributor records their grade and submits, or the agent run / LLM generation finishes.
  5. Reviewed (if applicable) — the task moves to a review node where a reviewer passes or fails the submission. A failed review sends the task back for rework.
  6. Finished — the task reaches an End node and is marked as Finished.
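
The steps above can be read as a small state machine for a single node instance. The following is a sketch under stated assumptions: the `ALLOWED` table and the `advance` and `walk` helpers are hypothetical, and modelling a failed review as a move from Submitted back to Available is a guess at how rework is represented, not a confirmed platform behavior.

```python
# Hypothetical transition table for one node instance, read off the steps above.
ALLOWED = {
    "Pending": {"Available"},
    "Available": {"Claimed"},
    "Claimed": {"Submitted"},
    # Assumption: a failed review returns the node to Available for rework.
    "Submitted": {"Finished", "Available"},
    "Finished": set(),
}


def advance(current, target):
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {target!r}")
    return target


def walk(path):
    """Apply a sequence of states in order and return the final state."""
    state = path[0]
    for target in path[1:]:
        state = advance(state, target)
    return state


# Happy path: created, made available, claimed, submitted, finished.
happy = ["Pending", "Available", "Claimed", "Submitted", "Finished"]
print(walk(happy))

# Rework path: one failed review before the submission passes.
rework = ["Pending", "Available", "Claimed", "Submitted",
          "Available", "Claimed", "Submitted", "Finished"]
print(walk(rework))
```

An explicit table like this makes unexpected jumps, such as Pending straight to Finished, fail loudly when auditing a timeline.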

Contributors and reviewers pick up the human-eval steps from their work queue — claiming a step to evaluate or review an agent's output — but most of the lifecycle runs on its own as the agent executes and the judge grades the trajectory.

Viewing tasks

Data Explorer

The primary interface for viewing and managing tasks is the Data Explorer, where you inspect agent traces in depth. It is accessible from the pipeline page; the header shows the task completion count and provides access to task creation, export, and derived-column creation. To drill into a single agent's trajectory and tool calls, see inspecting runs.

The Data Explorer has four tabs:

  • Task Metadata — task-level information: status, current node, assignee, timestamps, and per-node status columns.
  • Task Data — all form field values across nodes, including LLM responses and evaluations.
  • Evaluation Analytics — scorecard and per-field analytics for evaluation results (shown when the pipeline has evaluation fields).
  • LLM Analytics — LLM generation metrics (shown when the pipeline has LLM nodes).
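
Two of the tabs above appear only for certain pipelines. The rule can be sketched as follows; `PipelineInfo` and `visible_tabs` are hypothetical names used only to illustrate the conditions in the list, not part of any product API.

```python
from dataclasses import dataclass


@dataclass
class PipelineInfo:
    """Hypothetical summary of the two pipeline properties that matter here."""

    has_evaluation_fields: bool
    has_llm_nodes: bool


def visible_tabs(pipeline):
    """Return the Data Explorer tabs shown for a pipeline, in list order."""
    tabs = ["Task Metadata", "Task Data"]  # always shown
    if pipeline.has_evaluation_fields:
        tabs.append("Evaluation Analytics")
    if pipeline.has_llm_nodes:
        tabs.append("LLM Analytics")
    return tabs


# A pipeline with LLM nodes but no evaluation fields.
print(visible_tabs(PipelineInfo(has_evaluation_fields=False, has_llm_nodes=True)))
```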

Task detail

Click View on any task row to open a detail modal with two views:

  • Nodes view — per-node breakdown of form fields, submitted values, review data, and LLM output.
  • Timeline view — chronological history of all submissions, reviews, and state changes for the task.

Admin operations

Admins can perform the following operations on tasks:

  • View any task — open the full submission and review data.
  • Edit submissions — modify submitted field values.
  • Release — unclaim a task, returning it to the available queue.
  • Escalate/De-escalate — manually change escalation status.
  • Tag — apply tags to tasks for filtering and organization.

Bulk actions

Select multiple tasks in the Data Explorer to:

  • Export — download selected tasks as CSV, JSON, or ZIP.
  • Evaluate — run evaluations on selected tasks.
  • Delete — permanently remove selected tasks.