Introduction

Polos is a durable execution platform for AI agents. It provides the stateful infrastructure required to run long-running, autonomous agents reliably at scale. Write agents and workflows in plain Python (or TypeScript - coming soon) with standard programming constructs. No DAGs to define, no graph syntax to learn - just write Python. Use loops, conditionals, and function calls naturally while Polos handles durability, failure recovery, and scaling automatically.
from polos import Agent, workflow, WorkflowContext
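# check_inventory, calculate_shipping, charge_stripe, send_shipping_email,
# ProcessOrderInput, and ProcessOrderOutput are assumed to be defined
# elsewhere in your application.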

order_validation_agent = Agent(
    provider="openai", 
    model="gpt-4o",
    tools=[check_inventory, calculate_shipping]
)

@workflow
async def process_order(ctx: WorkflowContext, order: ProcessOrderInput):
    # Agent validates order and checks inventory
    validation = await ctx.step.agent_invoke_and_wait(
        "validate_order",
        order_validation_agent.with_input(f"Validate this order: {order}")
    )
    
    if not validation.result.valid:
        return ProcessOrderOutput(
            status="invalid",
            reason=validation.result.reason
        )
    
    # High-value orders need approval
    if order.amount > 1000:
        # Suspend execution until the order is approved or rejected
        decision = await ctx.step.suspend(
            "approval",
            data={
                "id": order.id,
                "amount": order.amount,
                "items": order.items,
                "user", order.user
            }
        )
        if not decision.data["approved"]:
            return ProcessOrderOutput(
                status="rejected",
                reason=decision.data.get("reason")
            )
    
    # Charge customer (exactly-once guarantee)
    payment = await ctx.step.run("charge", charge_stripe, order)
    
    # Wait for warehouse pickup (could be hours or days)
    await ctx.step.wait_for_event(
        "wait_pickup",
        topic=f"warehouse.pickup/{order.id}"
    )
    
    # Send shipping notification
    await ctx.step.run("notify", send_shipping_email, order)
    
    return ProcessOrderOutput(status="completed", payment_id=payment.id)
This workflow survives crashes, resumes mid-execution, and pauses for approval - all with no manual checkpointing, retry logic, or queue management.
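Both suspend and wait_for_event have an external counterpart: something must deliver the approval decision and the pickup event. The client API for that isn't covered in this introduction, so the snippet below is purely a hypothetical sketch - PolosClient, resume(), publish(), and the identifiers are assumed names for illustration, not confirmed Polos API:
# Hypothetical sketch: PolosClient, resume(), and publish() are assumed
# names for the external side, not confirmed Polos API.
from polos import PolosClient

client = PolosClient()

# An approval UI resolves the suspended "approval" step; the workflow
# resumes exactly where it paused, local variables intact.
client.resume(
    workflow="process_order",
    run_id="ord_123",  # hypothetical identifier
    step="approval",
    data={"approved": True},
)

# The warehouse system publishes to the topic the workflow is waiting on.
client.publish(topic="warehouse.pickup/ord_123", data={"status": "picked"})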

The Problem

Most AI agents work in demos but break in production. They’re long-running distributed systems, yet we run them on infrastructure built for stateless APIs. What breaks:
  • Server restarts lose all progress
  • Failed API calls restart from scratch, wasting tokens
  • Difficult to pause for human approval
  • Multi-agent systems can’t share context reliably
  • One workflow can exhaust your entire OpenAI quota

Write Code, Not Configs

With Polos:
@workflow
async def process_order(ctx: WorkflowContext, order: ProcessOrderInput):
    # Just write Python
    if order.amount > 1000:
        approved = await ctx.step.suspend("approval", data=order.model_dump())
        if not approved.data["ok"]:
            return {"status": "rejected"}
    
    await ctx.step.run("charge", charge_stripe, order)
    await ctx.step.run("notify", send_email, order)
Other platforms:
# Define rigid DAGs upfront
dag = DAG(
    nodes=[
        Node("check_amount", CheckAmount),
        Node("approval", HumanApproval),
        Node("charge", ChargeStripe),
        Node("notify", SendEmail),
    ],
    edges=[
        ("check_amount", "approval", condition="amount > 1000"),
        ("check_amount", "charge", condition="amount <= 1000"),
        ("approval", "charge", condition="approved"),
        ("charge", "notify"),
    ]
)
With Polos, there are no DAGs to define, no graph syntax to learn. Use loops, conditionals, and function calls naturally while Polos handles durability automatically.
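For example, a plain for loop over durable steps replaces what would be a fan-out node in a graph. A minimal sketch, using only the primitives shown above - fetch_page is a placeholder function, and giving each iteration a unique step name is an assumption of this example:
from polos import workflow, WorkflowContext

@workflow
async def crawl(ctx: WorkflowContext, urls: list[str]):
    pages = []
    # An ordinary Python loop; each iteration is its own durable step.
    # If the worker crashes at url #7, the completed steps replay from
    # their recorded results and execution resumes at url #7.
    for i, url in enumerate(urls):
        page = await ctx.step.run(f"fetch_{i}", fetch_page, url)
        pages.append(page)
    return pages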

Why Polos?

🧠 Durable State: Your agent survives crashes with its call stack and local variables intact. Step 18 of 20 fails? Resume from step 18. No wasted LLM calls, no manual checkpointing or state-machine hacks required.

🚦 Global Concurrency: System-wide rate limiting with queues and concurrency keys. Prevent one rogue agent from exhausting your entire OpenAI quota. Only active executions count toward limits - queued runs wait their turn without consuming resources (see the sketch after this list).

🤝 Human-in-the-Loop: Native support for pausing execution. Wait hours or days for user approval and resume with full context. In serverless environments, paused agents consume zero compute - you only pay when they're actively running.

📡 Agent Handoffs: Transactional memory for multi-agent systems. Pass reasoning history between specialized agents without context drift. Shared working memory enables true agent collaboration.

🔍 Decision-Level Observability: Trace the reasoning behind every tool call, not just raw logs. See why your agent chose Tool B over Tool A. Debug deterministic failures in stochastic systems.

⚡ Production Ready: Automatic retries, exactly-once execution guarantees, and OpenTelemetry tracing built in. Scales to millions of concurrent workflows.
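How concurrency limits are declared isn't covered in this introduction, so the snippet below is a hypothetical sketch: the concurrency argument, its key/limit fields, and summarizer_agent are assumed names for illustration, not confirmed Polos API:
from polos import Agent, workflow, WorkflowContext

# Placeholder agent for illustration.
summarizer_agent = Agent(provider="openai", model="gpt-4o", tools=[])

# Hypothetical sketch: the "concurrency" argument and its fields are
# assumed names, not confirmed Polos API.
@workflow(concurrency={"key": "openai", "limit": 10})
async def summarize(ctx: WorkflowContext, doc_id: str):
    # At most 10 executions holding the "openai" key run at once,
    # system-wide; additional runs queue without consuming compute.
    summary = await ctx.step.agent_invoke_and_wait(
        "summarize",
        summarizer_agent.with_input(f"Summarize document {doc_id}")
    )
    return summary.result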

What You Can Build

🔬 Research assistants: Multi-hour workflows that search, analyze, and synthesize information across dozens of sources

💰 Financial operations: Approval workflows that pause for human review, then execute Stripe charges with exactly-once guarantees

🤖 Multi-agent systems: Specialized agents (researcher, writer, editor) that coordinate via shared memory to complete complex tasks (sketched after this list)

⚙️ Background automation: Long-running jobs that survive deploys and resume seamlessly (data migrations, batch processing, ETL pipelines)
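To give a flavor of the multi-agent case, the sketch below threads one agent's output into the next using only the primitives from the example above. The shared-memory API itself isn't shown in this introduction; researcher_agent, writer_agent, and search_web are placeholder names:
from polos import Agent, workflow, WorkflowContext

# Placeholder agents; tools and prompts are illustrative.
researcher_agent = Agent(provider="openai", model="gpt-4o", tools=[search_web])
writer_agent = Agent(provider="openai", model="gpt-4o", tools=[])

@workflow
async def write_report(ctx: WorkflowContext, topic: str):
    # Each agent invocation is a durable step: if the writer fails,
    # the researcher's (expensive) result is replayed, not recomputed.
    research = await ctx.step.agent_invoke_and_wait(
        "research",
        researcher_agent.with_input(f"Gather sources on: {topic}")
    )
    draft = await ctx.step.agent_invoke_and_wait(
        "write",
        writer_agent.with_input(f"Write a report from: {research.result}")
    )
    return draft.result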

Next Steps