Pipelines Docs is in beta — content is actively being added.

Pipelines Documentation

Pre-deployment simulation for AI agents

What is Pipelines?

Pipelines provides the infrastructure to test how AI agents behave in production-adjacent environments: the decisions they make, the quality of their outputs, their operational performance, their safety, and their adherence to your standards. Port any agent, build a simulation, and measure each of these dimensions before you deploy.

Generate realistic scenarios on demand with our synthetic data generation service. Scenarios can be seeded with your own data or built from scratch, and tailored to your standard operating procedures, your team's tribal knowledge, or whatever you are curious to test. Then connect your agent, track its cost and performance, and ship a system you have tested and verified.

Quickstart

Test your system

How It Works

  1. Register your agent: Point us at an HTTP endpoint you host, or bring your own code to run in a sandbox by pasting it, linking a Git repo, or uploading an archive. Then declare the tools your agent can use.
  2. Generate scenarios: Use synthetic data generation to create test scenarios in bulk, each with its own setup, expected behavior, and pass criteria.
  3. Run: Submit your scenarios to Odyssey, which runs each one as an isolated, live simulation — capturing every tool call and trace event along the way.
  4. Grade: Score each run with an LLM judge, your own custom graders, and operational metrics. Each run gets a clear verdict with the reasoning behind it.
  5. Compare: Every experiment is stored and versioned, so you can review trajectories and track quality across agent versions.
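
To make the five steps above concrete, here is a minimal local sketch of the same loop in plain Python. It is not the Pipelines API: every name in it (Scenario, Run, run_scenario, grade, compare_versions, the lookup_order tool, and both toy agents) is a hypothetical stand-in chosen for illustration. The agent is modeled as an ordinary function, the "simulation" simply records tool calls, and the grader is a single custom pass criterion with no LLM judge.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Run:
    """Record of one simulated run: final output plus every tool call."""
    scenario: str
    agent_version: str
    output: str
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class Scenario:
    """One test case: setup, expected behavior, and a pass criterion."""
    name: str
    setup: Dict[str, Any]
    expected_behavior: str
    pass_criteria: Callable[[Run], bool]


Tools = Dict[str, Callable[..., Any]]
Agent = Callable[[Dict[str, Any], Tools], str]


def run_scenario(agent: Agent, version: str, scenario: Scenario, tools: Tools) -> Run:
    """Step 3: run one scenario in isolation, recording each tool call."""
    calls: List[Dict[str, Any]] = []

    def make_recorder(name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
        def recorded(*args: Any) -> Any:
            result = fn(*args)
            calls.append({"tool": name, "args": args, "result": result})
            return result
        return recorded

    wrapped = {name: make_recorder(name, fn) for name, fn in tools.items()}
    output = agent(scenario.setup, wrapped)
    return Run(scenario.name, version, output, calls)


def grade(run: Run, scenario: Scenario) -> Dict[str, Any]:
    """Step 4: a custom grader that returns a verdict and its reasoning."""
    passed = scenario.pass_criteria(run)
    reason = f"expected: {scenario.expected_behavior}; tool calls: {len(run.tool_calls)}"
    return {"scenario": run.scenario, "passed": passed, "reason": reason}


def compare_versions(agents: Dict[str, Agent], scenarios: List[Scenario],
                     tools: Tools) -> Dict[str, float]:
    """Step 5: pass rate per agent version over the same scenarios."""
    rates = {}
    for version, agent in agents.items():
        verdicts = [grade(run_scenario(agent, version, s, tools), s) for s in scenarios]
        rates[version] = sum(v["passed"] for v in verdicts) / len(verdicts)
    return rates


# Step 1: two toy agent versions and the single tool they may use (all hypothetical).
TOOLS: Tools = {"lookup_order": lambda order_id: {"id": order_id, "status": "delivered"}}


def agent_v1(setup: Dict[str, Any], tools: Tools) -> str:
    return "Your order is on its way."  # answers without checking anything


def agent_v2(setup: Dict[str, Any], tools: Tools) -> str:
    order = tools["lookup_order"](setup["order_id"])
    return f"Order {order['id']} is {order['status']}."


AGENTS: Dict[str, Agent] = {"v1": agent_v1, "v2": agent_v2}

# Step 2: one hand-written scenario standing in for generated ones.
SCENARIOS = [
    Scenario(
        name="order-status",
        setup={"order_id": "A-1"},
        expected_behavior="look up the order before answering",
        pass_criteria=lambda run: any(c["tool"] == "lookup_order" for c in run.tool_calls),
    )
]

if __name__ == "__main__":
    print(compare_versions(AGENTS, SCENARIOS, TOOLS))  # {'v1': 0.0, 'v2': 1.0}
```

Keeping the pass criterion as a function of the recorded run, rather than of the final text alone, is what lets the sketch distinguish an agent that happens to sound right from one that actually used its tools.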