
Handling Failures

Building resilience into non-deterministic systems.

The Reliability Gap

LLMs are probabilistic. They fail. Tools fail. APIs rate-limit. In a multi-step workflow these failures compound, so if you don't handle them, the end-to-end success rate drops quickly as the workflow grows.
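A rough way to see why unhandled failures add up: assume, purely for illustration, that each of the n steps in a workflow succeeds independently with probability p and that nothing retries a failed step. The whole workflow succeeds only if every step does:

$$
P_{\text{workflow}} = p^{n}, \qquad 0.95^{10} \approx 0.60, \qquad 0.95^{20} \approx 0.36
$$

Under these assumptions, even a step that works 95% of the time leaves a 20-step workflow failing roughly two times in three.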

Retry Policies

You can define retry policies at the Agent level or the Workflow level.

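```yaml
# In your Project manifest
policies:
  - name: exponential-backoff
    spec:
      maxRetries: 3     # maximum number of retries after a failed call
      backoff:
        initial: 1s     # wait before the first retry
        multiplier: 2   # each subsequent wait is multiplied by this factor
```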

The Control Plane automatically wraps agent invocations. If an agent returns an HTTP 500 or a specific error code, Consonant waits and retries according to the policy, so agent developers don't need to write their own retry loops.
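To make the retry behavior concrete, here is a minimal sketch of what such a wrapper does. It is not Consonant's implementation: `RetryPolicy`, `RetryableError`, and `call_with_retry` are hypothetical names. The sketch assumes that `maxRetries` counts retries after the first attempt (so at most 4 calls in total) and that the first retry waits `initial`. The `sleep` function is injectable so the backoff can be observed without actually waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error type standing in for an HTTP 500 or another retryable code."""


@dataclass
class RetryPolicy:
    # Mirrors the fields of the `exponential-backoff` policy in the manifest.
    max_retries: int = 3
    initial_delay: float = 1.0  # seconds
    multiplier: float = 2.0

    def delays(self) -> List[float]:
        """Wait before each retry: initial, initial * multiplier, ..."""
        return [self.initial_delay * self.multiplier ** i for i in range(self.max_retries)]


def call_with_retry(
    invoke: Callable[[], T],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `invoke`, retrying retryable failures according to `policy`."""
    for delay in policy.delays():
        try:
            return invoke()
        except RetryableError:
            sleep(delay)  # back off before the next attempt
    # Final attempt: if this one fails too, the error reaches the caller.
    # Non-retryable exceptions are never caught and propagate immediately.
    return invoke()


# Usage: an agent that fails twice with a retryable error, then succeeds.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RetryableError("HTTP 500")
    return "ok"


slept: List[float] = []
result = call_with_retry(flaky, RetryPolicy(), sleep=slept.append)
print(result, slept)  # prints: ok [1.0, 2.0]
```

Because the wrapper owns the loop, the agent itself (`flaky` here) contains no retry logic, which is the point of handling retries in the Control Plane.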

Circuit Breakers

If `agent-search` starts failing every request (e.g. because the Google API is down), Consonant can "trip the breaker": it stops sending requests to the failing service and can trigger a fallback path in the Plan (e.g. "Use the Bing Search agent instead").
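The following is a minimal sketch of the circuit-breaker pattern with a fallback, assuming a simple consecutive-failure threshold and a fixed cool-down. `CircuitBreaker`, `failure_threshold`, `reset_timeout`, `search_google`, and `search_bing` are hypothetical names; Consonant's actual trip conditions and configuration may differ. The clock is injectable so the cool-down can be simulated.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical circuit breaker (not Consonant's API).

    Closed: calls go to the primary service.
    Open: after `failure_threshold` consecutive failures, calls skip the
          primary and go straight to the fallback for `reset_timeout` seconds.
    Half-open: once the timeout expires, the next call is let through as a
          trial; success closes the breaker, failure re-opens it.
    """

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Open only while still inside the cool-down window.
        return self.clock() - self.opened_at < self.reset_timeout

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open:
            return fallback()  # don't flood the failing service
        try:
            result = primary()
        except Exception:  # any failure counts toward tripping in this sketch
            self.failures += 1
            # Trip on reaching the threshold, or re-trip after a failed trial.
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = self.clock()
            return fallback()
        # Success: back to the closed state.
        self.failures = 0
        self.opened_at = None
        return result


# Usage: the primary search agent is down, so traffic shifts to the fallback.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
google_calls: List[int] = []


def search_google() -> str:
    google_calls.append(1)
    raise ConnectionError("Google API is down")


def search_bing() -> str:
    return "bing results"


for _ in range(5):
    breaker.call(search_google, search_bing)
print(len(google_calls))  # prints: 2 (the breaker opened after two failures)
```

The fixed cool-down followed by a single trial call keeps the breaker from hammering a service that is still down, while still letting traffic return automatically once it recovers.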