Pipelines Docs is in beta — content is actively being added.

Multi-turn testing setup

Configure model-as-user conversations with task seed axes, simulator modes, and session-level inspection.

Multi-turn testing evaluates an agent across a conversation rather than a single request-response cycle. Each task row can create an agent session with ordered turns, shared world state, and a session-level judge verdict.

Use this mode when testing memory, instruction consistency across turns, or policy regressions that emerge only after follow-up prompts.

Prerequisites

Before enabling multi-turn behavior on dataset rows:

Enable multi-turn by row

In Pipeline Builder, on the agent field:

  1. Open Odyssey Seed Columns.
  2. Map a dataset column to turn_mode.
  3. Set row values to model_as_user for rows that should run as multi-turn sessions. Leave blank, or set single_shot, for one-shot rows.

This allows one-shot and multi-turn scenarios to coexist in the same dataset.

Multi-turn seed axes

All multi-turn controls are optional. If omitted, defaults are applied.

| Axis | Accepted values / shape | Default | Notes |
|---|---|---|---|
| turn_mode | single_shot or model_as_user | single_shot | model_as_user enables multi-turn session orchestration. |
| max_turns | Integer 1..50 | 10 | Hard cap on conversation length. |
| memory_mode | replay or stateful | replay | replay resends the transcript each turn. stateful relies on agent-side memory. |
| simulator_mode | persona or scripted | persona | persona generates the next user turn with an LLM. scripted replays fixed turns. |
| user_simulator_persona | Text | none | Used by persona mode to shape user behavior. |
| scripted_user_turns | JSON array of strings | [] | Used by scripted mode. Each string represents one user turn. |
| tracked_constraints | JSON array | [] | Session-level constraints checked in judge output, including the fact-checking matrix. |
| termination_keyword | Text substring | none | Session ends early when this substring appears in an agent reply. |
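To make the defaults concrete, here is a minimal sketch of how one row's raw CSV strings might be resolved into seed axes. normalize_axes is a hypothetical helper, not the platform's parser. It assumes that unrecognized memory_mode, simulator_mode, and tracked_constraints values fall back to their defaults in the same way the common setup mistakes section describes for turn_mode, max_turns, and scripted_user_turns.

```python
from __future__ import annotations

import copy
import json
from typing import Any

# Defaults taken from the seed-axis table.
DEFAULTS: dict[str, Any] = {
    "turn_mode": "single_shot",
    "max_turns": 10,
    "memory_mode": "replay",
    "simulator_mode": "persona",
    "user_simulator_persona": None,
    "scripted_user_turns": [],
    "tracked_constraints": [],
    "termination_keyword": None,
}

# Enumerated axes and their accepted values.
ALLOWED: dict[str, set[str]] = {
    "turn_mode": {"single_shot", "model_as_user"},
    "memory_mode": {"replay", "stateful"},
    "simulator_mode": {"persona", "scripted"},
}


def _json_list(raw: str) -> list | None:
    """Parse raw as JSON and return it only if it is an array."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, list) else None


def normalize_axes(row: dict[str, str]) -> dict[str, Any]:
    """Resolve one dataset row's seed-axis columns (hypothetical), falling back to defaults."""
    axes = copy.deepcopy(DEFAULTS)  # fresh lists for every row
    for name, accepted in ALLOWED.items():
        value = (row.get(name) or "").strip()
        if value in accepted:  # blank or unrecognized values keep the default
            axes[name] = value
    raw_max = (row.get("max_turns") or "").strip()
    if raw_max.isdecimal() and 1 <= int(raw_max) <= 50:  # out-of-range values are dropped
        axes["max_turns"] = int(raw_max)
    for name in ("user_simulator_persona", "termination_keyword"):
        if row.get(name):
            axes[name] = row[name]
    turns = _json_list(row.get("scripted_user_turns") or "")
    if turns is not None and all(isinstance(t, str) for t in turns):
        axes["scripted_user_turns"] = turns
    constraints = _json_list(row.get("tracked_constraints") or "")
    if constraints is not None:
        axes["tracked_constraints"] = constraints
    return axes
```

For example, a row with turn_mode = model_as_user and max_turns = 80 would resolve to a multi-turn session capped at the default 10 turns.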

Recommended initial configuration:

  • turn_mode = model_as_user
  • simulator_mode = persona
  • memory_mode = replay
  • max_turns = 6-10
  • A short user_simulator_persona describing goals, tone, and escalation style

After baseline quality stabilizes, add rows with memory_mode = stateful to catch memory regressions inside your own runtime or framework memory layer.

Dispatch envelope in multi-turn

Each turn is dispatched as a new agent run with its own per-run token. The current user prompt is always delivered in input.user_instruction, matching single-shot dispatch behavior, and is mirrored in input.latest_user_prompt as a backward-compatibility alias.

In replay mode, the platform also sends prior conversation history in input.messages and carried world state in input.scenario_state. In stateful mode, those replay fields are omitted. input.session_id and input.turn_id remain available as optional context for agents that key memory explicitly.
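A sketch of the per-turn input object described above, assuming it can be modeled as a plain JSON-style dict. build_turn_input and current_prompt are hypothetical helpers; the per-run token is left out, and the message shape ({"role", "content"}) and integer turn_id are illustrative assumptions. Only the field names come from this page.

```python
from __future__ import annotations

from typing import Any


def build_turn_input(
    user_prompt: str,
    memory_mode: str,
    session_id: str,
    turn_id: int,
    history: list[dict[str, str]],
    scenario_state: dict[str, Any],
) -> dict[str, Any]:
    """Assemble the `input` object for one turn (hypothetical shape)."""
    payload: dict[str, Any] = {
        # The current prompt always goes in user_instruction, as in single-shot runs.
        "user_instruction": user_prompt,
        # Backward-compatibility alias carrying the same prompt.
        "latest_user_prompt": user_prompt,
        # Optional context for agents that key memory explicitly.
        "session_id": session_id,
        "turn_id": turn_id,
    }
    if memory_mode == "replay":
        # Replay resends prior history and carried world state on every turn.
        payload["messages"] = list(history)
        payload["scenario_state"] = dict(scenario_state)
    # In stateful mode the replay fields are omitted; the agent must remember.
    return payload


def current_prompt(inp: dict[str, Any]) -> str:
    """Agent side: prefer user_instruction, fall back to the legacy alias."""
    return inp.get("user_instruction") or inp.get("latest_user_prompt", "")
```

Because the current prompt is present in both modes, an agent that reads only user_instruction handles replay and stateful rows alike; only history handling differs.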

CSV examples

Persona-driven conversations

user,turn_mode,max_turns,simulator_mode,memory_mode,user_simulator_persona,tracked_constraints
"Help me fix my failed refund for order #4521.","model_as_user","8","persona","replay","Customer is frustrated but cooperative. They expect the agent to remember previous order details and avoid repeating verification steps.","[""Never reveal internal refund policy notes."",""Always confirm order id before refunding.""]"

Scripted deterministic replay

user,turn_mode,simulator_mode,scripted_user_turns,max_turns,memory_mode,termination_keyword
"Resolve this support issue end-to-end.","model_as_user","scripted","[""My name is Alex and order is #9001."",""Can you repeat my name and order number?"",""Please process a refund.""]","6","stateful","Your refund has been processed"

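Hand-quoting JSON inside CSV is error-prone. Below is a minimal sketch that generates the scripted row above with Python's standard csv and json modules instead. Compact JSON separators reproduce the doubled-quote form shown in the example; QUOTE_ALL also quotes the header names, which standard CSV readers treat the same as an unquoted header.

```python
import csv
import io
import json

FIELDS = ["user", "turn_mode", "simulator_mode", "scripted_user_turns",
          "max_turns", "memory_mode", "termination_keyword"]

row = {
    "user": "Resolve this support issue end-to-end.",
    "turn_mode": "model_as_user",
    "simulator_mode": "scripted",
    # json.dumps builds the JSON array; csv doubles its inner quotes when writing.
    "scripted_user_turns": json.dumps(
        [
            "My name is Alex and order is #9001.",
            "Can you repeat my name and order number?",
            "Please process a refund.",
        ],
        separators=(",", ":"),
    ),
    "max_turns": 6,
    "memory_mode": "stateful",
    "termination_keyword": "Your refund has been processed",
}

buffer = io.StringIO()
# QUOTE_ALL wraps every field in quotes, like the examples above.
writer = csv.DictWriter(buffer, fieldnames=FIELDS, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

The same pattern works for tracked_constraints in persona rows: build the list in code, serialize it with json.dumps, and let the csv module handle quoting.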
How sessions terminate

A multi-turn session ends when any of the following conditions is met:

  • The simulator emits a terminate signal.
  • max_turns is reached.
  • termination_keyword appears in an agent reply.
  • A turn fails or times out under the end-session policy.

After termination, the platform runs a session-level judge on the full transcript and writes the verdict and metrics to the session.
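The termination rules can be sketched as a simple loop. This is an illustrative model, not the platform's orchestrator: run_session, the callable signatures, the assumption that the row's user value is the first turn, and the convention that the simulator returns None to signal termination are all hypothetical. A failed or timed-out turn is treated as ending the session, matching the end-session policy case. The session-level judge would then run over the returned transcript.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SessionResult:
    transcript: list[tuple[str, str]] = field(default_factory=list)  # (user, agent) pairs
    reason: str = ""


def run_session(
    first_prompt: str,
    agent: Callable[[str], str],
    next_user_turn: Callable[[list[tuple[str, str]]], Optional[str]],
    max_turns: int = 10,
    termination_keyword: Optional[str] = None,
) -> SessionResult:
    """Drive one simulated conversation until a termination condition fires."""
    result = SessionResult()
    prompt = first_prompt
    for turn in range(max_turns):
        try:
            reply = agent(prompt)
        except Exception:
            # Assumed end-session policy: a failed or timed-out turn ends the session.
            result.reason = "turn_failed"
            return result
        result.transcript.append((prompt, reply))
        if termination_keyword and termination_keyword in reply:
            result.reason = "termination_keyword"
            return result
        if turn == max_turns - 1:
            break  # max_turns reached; no need to ask the simulator again
        next_prompt = next_user_turn(result.transcript)
        if next_prompt is None:
            # Assumed convention: None is the simulator's terminate signal.
            result.reason = "simulator_terminated"
            return result
        prompt = next_prompt
    result.reason = "max_turns"
    return result
```

A scripted simulator fits this shape as a function that pops the next string from scripted_user_turns and returns None once the script is exhausted.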

Multi-turn with a coding workspace

When the task seed targets a coding scenario with a seeded repository workspace, the session adds coding-workspace behavior on top of multi-turn orchestration. Core session mechanics remain unchanged.

  • One workspace reused across turns. The sandbox and seeded repository are created once at turn 0, reused on subsequent turns, and torn down at session end.
  • Cumulative diff per turn. Each turn's diff is measured against the original seeded baseline, not against the previous turn's state.
  • Session judge receives workspace evidence. For coding sessions, the session-level judge receives the cumulative workspace diff plus scorer and objective outputs alongside the transcript.

The following do not change: conversation configuration fields, session termination behavior, and the session-level judge rubric. Multi-turn and coding are independent axes, and any valid combination is supported. Non-coding sessions skip workspace-specific behavior.
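The cumulative-diff rule means every turn's diff is taken against the turn-0 baseline. A minimal sketch with the standard difflib module, modeling the workspace as a hypothetical mapping of file paths to contents; a real sandbox would more likely diff the repository itself, for example with git diff against the seeded commit.

```python
from __future__ import annotations

import difflib


def cumulative_diff(baseline: dict[str, str], workspace: dict[str, str]) -> str:
    """Unified diff of the current workspace against the seeded baseline (hypothetical model)."""
    chunks: list[str] = []
    for path in sorted(set(baseline) | set(workspace)):
        before = baseline.get(path, "").splitlines(keepends=True)
        after = workspace.get(path, "").splitlines(keepends=True)
        chunks.extend(difflib.unified_diff(before, after, f"a/{path}", f"b/{path}"))
    return "".join(chunks)


baseline = {"app.py": "x = 1\n"}
# Turn 1 edited app.py; turn 2 added util.py.
after_turn_2 = {"app.py": "x = 2\n", "util.py": "y = 3\n"}
print(cumulative_diff(baseline, after_turn_2))
```

Because the comparison is always against the baseline, the diff after turn 2 still contains turn 1's edit to app.py, which is what lets the session judge see the full change set at once.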

Inspecting results

From Data Explorer, open a task row and inspect Agent Trace for that task. Multi-turn rows expose:

  • Canonical transcript across turns.
  • Turn-by-turn trajectory, including tool calls and trace events.
  • Session-level judge verdict and metrics.
  • Constraint outcomes from tracked_constraints.

For the baseline trace UI model, see Inspecting runs.

Common setup mistakes

  • A turn_mode typo or unsupported value, such as model_as_users or multi_turn, silently falls back to one-shot defaults.
  • A max_turns value outside 1..50 is dropped and the default of 10 is applied.
  • A scripted_user_turns value that is not a JSON array of strings is ignored.
  • Setting simulator_mode to scripted with an empty script can end the session immediately.
  • Using stateful mode with an agent that lacks robust session memory can look like a context regression.
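Several of these mistakes fail silently, so a pre-flight check on the dataset can help. A minimal sketch of a hypothetical lint_row helper that reports, rather than silently corrects, the first four mistakes above. The stateful-memory issue depends on agent behavior and cannot be detected from the CSV alone.

```python
from __future__ import annotations

import json

# Blank turn_mode is allowed and means single-shot.
VALID_TURN_MODES = {"", "single_shot", "model_as_user"}


def lint_row(row: dict[str, str]) -> list[str]:
    """Return warnings for values that would be silently dropped or defaulted (hypothetical)."""
    warnings: list[str] = []
    mode = (row.get("turn_mode") or "").strip()
    if mode not in VALID_TURN_MODES:
        warnings.append(f"turn_mode {mode!r} is not recognized; row runs single-shot")
    raw_max = (row.get("max_turns") or "").strip()
    if raw_max and not (raw_max.isdecimal() and 1 <= int(raw_max) <= 50):
        warnings.append(f"max_turns {raw_max!r} is outside 1..50; default applies")
    raw_turns = (row.get("scripted_user_turns") or "").strip()
    turns = None
    if raw_turns:
        try:
            turns = json.loads(raw_turns)
        except json.JSONDecodeError:
            pass
        if not (isinstance(turns, list) and all(isinstance(t, str) for t in turns)):
            warnings.append("scripted_user_turns is not a JSON array of strings; ignored")
            turns = None
    if (row.get("simulator_mode") or "").strip() == "scripted" and not turns:
        warnings.append("scripted simulator has no script; session may end immediately")
    return warnings
```

Running this over every row read with csv.DictReader before launching a pipeline surfaces fallbacks that would otherwise only show up as unexpected one-shot results.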