An ecommerce company is developing a generative AI application that uses Amazon Bedrock with Anthropic Claude to recommend products to customers. Customers report that some recommended products are not available for sale on the website or are not relevant to the customer. Customers also report that the solution takes a long time to generate some recommendations.
The company investigates the issues and finds that most interactions between customers and the product recommendation solution are unique. The company confirms that the solution recommends products that are not in the company's product catalog. The company must resolve these issues.
Which solution will meet this requirement?
Option C best addresses both core problems: hallucinated recommendations that do not exist in the catalog and slow response times, while keeping operational overhead low. The most direct way to prevent the model from recommending unavailable products is to ground generation on authoritative product catalog data at inference time. An Amazon Bedrock knowledge base is designed for this pattern by ingesting domain data, chunking content, creating embeddings, and retrieving the most relevant catalog entries when a user asks for recommendations. Implementing Retrieval Augmented Generation ensures the foundation model receives only approved, catalog-backed context and can cite or base its output on those retrieved items. This sharply reduces the likelihood of inventing products, because the response is conditioned on retrieved catalog records rather than relying on the model's parametric memory.
The requirement also notes that most interactions are unique. That makes response caching far less effective, because there are fewer repeated prompts to benefit from cached outputs. Instead, improving the retrieval and model invocation path is the better optimization. Using the PerformanceConfigLatency parameter set to optimized prioritizes lower latency behavior for model inference, helping meet faster recommendation generation without requiring the company to build and operate additional infrastructure.
The other options do not solve the root cause as reliably. Prompt engineering and streaming can improve perceived latency, but they do not guarantee catalog-only recommendations because the model can still hallucinate items. Guardrails can help detect or block certain undesired outputs, but without consistent catalog grounding they do not ensure every recommendation is derived from the company's product data. Building a custom OpenSearch validation and caching layer increases operational complexity, and caching is misaligned with predominantly unique interactions.
Currently there are no comments in this discussion, be the first to comment!