You are building a customer-support chatbot that fetches user account data from an external billing API. During testing, the API sometimes returns timeouts or 500 errors. You want the agent to be resilient-retrying when appropriate but failing gracefully if the service is down.
Which strategy best handles intermittent failures in API calls while still ensuring a good user experience?
The selected design maps to Implement exponential-backoff retries with a circuit breaker and return a clear message to the user if all retries..., which is the highest-control path for this scenario rather than a prompt-only or single-service shortcut. For tool-using agents, the durable pattern is schema-bound function invocation with timeouts, typed outputs, retry policy, and traceable execution rather than free-form endpoint guessing. The evaluation target is the full agent workflow: planning quality, tool selection, intermediate state, latency, retries, user feedback, and final task completion. Instrumentation must expose where degradation starts so remediation can focus on prompts, tool schemas, retrieval, model parameters, or infrastructure rather than random retuning. The distractors are weaker because they lean on A: Retry requests with a consistent short delay after each failure and notify...; C: Return a standard fallback message on failures to maintain conversation flow and...; D: Schedule retries using a fixed delay for all failure types maintaining predictable..., which compromises traceability, resilience, scalability, or policy enforcement in production. The answer therefore fits NVIDIA's production-agent pattern: modular workflow design, measurable runtime behavior, GPU-aware serving where applicable, and controlled integration with enterprise systems.
Yolande
12 days ago