Coordination
Handling Failures
Building resilience into non-deterministic systems.
The Reliability Gap
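LLMs are probabilistic. They fail. Tools fail. APIs rate-limit. If you don't handle these failures, the per-step failure rates compound: every additional step multiplies in another chance of failure, so the success rate of a long workflow approaches zero.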
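To make the compounding effect concrete, here is a rough back-of-the-envelope sketch. It assumes every step must succeed and that failures are independent, which real systems only approximate; the function names and the numbers (95% per-step success, 20 steps) are illustrative, not measurements from Consonant.

```python
def step_success_with_retries(p: float, max_retries: int) -> float:
    """Probability that one step eventually succeeds, given max_retries extra attempts.

    Assumes each attempt succeeds independently with probability p.
    """
    return 1.0 - (1.0 - p) ** (max_retries + 1)


def workflow_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps succeed when every step must succeed."""
    return p_step ** n_steps


if __name__ == "__main__":
    p, n = 0.95, 20  # illustrative numbers only
    print(f"no retries: {workflow_success(p, n):.3f}")
    print(f"3 retries:  {workflow_success(step_success_with_retries(p, 3), n):.4f}")
```

Under these assumptions, a 20-step workflow with 95% per-step reliability finishes only about 36% of the time, while allowing three retries per step pushes the figure above 99.9%.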
Retry Policies
You can define retry policies at the Agent level or the Workflow level.
```yaml
# In your Project manifest
policies:
  - name: exponential-backoff
    spec:
      maxRetries: 3
      backoff:
        initial: 1s
        multiplier: 2
```

The Control Plane automatically wraps agent invocations. If an agent returns a 500 or a specific error code, Consonant sleeps and retries based on the policy. The Agent developer doesn't need to write retry loops.
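For intuition, the following sketch shows roughly the kind of loop such a policy saves the Agent developer from writing. It is not Consonant's implementation: `RetryableError`, `backoff_delays`, and `call_with_retries` are hypothetical names, and it assumes that `initial: 1s` with `multiplier: 2` and `maxRetries: 3` means waits of 1s, 2s, and 4s before the first, second, and third retry.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for a 500 or another retryable error code."""


def backoff_delays(initial: float, multiplier: float, max_retries: int) -> List[float]:
    """Delay before each retry: initial, initial * multiplier, and so on."""
    return [initial * multiplier ** i for i in range(max_retries)]


def call_with_retries(
    fn: Callable[[], T],
    initial: float = 1.0,
    multiplier: float = 2.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on RetryableError, sleep and retry up to max_retries times."""
    for delay in backoff_delays(initial, multiplier, max_retries):
        try:
            return fn()
        except RetryableError:
            sleep(delay)
    # Final attempt after the last wait: let any error propagate to the caller.
    return fn()
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting. Production retry loops often add random jitter to each delay as well; that is omitted here because the policy shown does not mention it.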
Circuit Breakers
If `agent-search` starts failing every request (e.g., the Google API is down), Consonant can "trip the breaker". While the breaker is open, Consonant stops sending requests to the failing service instead of flooding it, and it can trigger a fallback path in the Plan (e.g., "Use Bing Search agent instead").
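The general circuit-breaker pattern can be sketched as follows. This is a minimal illustration, not Consonant's implementation: the `CircuitBreaker` class, its `failure_threshold` and `reset_timeout` parameters, and the consecutive-failure trip rule are all assumptions chosen for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Minimal breaker: opens after N consecutive failures, probes again after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the breaker is closed

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cooldown has elapsed, let a trial call through (half-open).
        return self.clock() - self.opened_at < self.reset_timeout

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # skip the failing service entirely
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        # Success: close the breaker and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap the primary agent and supply the fallback, for example `breaker.call(search_google, search_bing)`, where both function names are placeholders. While the breaker is open, the primary is never invoked, which is what keeps the failing service from being flooded.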