An ecommerce company is developing a generative AI application that uses Amazon Bedrock with Anthropic Claude to recommend products to customers. Customers report that some recommended products are not available for sale on the website or are not relevant to the customer. Customers also report that the solution takes a long time to generate some recommendations.
The company investigates the issues and finds that most interactions between customers and the product recommendation solution are unique. The company confirms that the solution recommends products that are not in the company's product catalog. The company must resolve these issues.
Which solution will meet this requirement?
Option C best addresses both core problems: hallucinated recommendations that do not exist in the catalog and slow response times, while keeping operational overhead low. The most direct way to prevent the model from recommending unavailable products is to ground generation on authoritative product catalog data at inference time. An Amazon Bedrock knowledge base is designed for this pattern by ingesting domain data, chunking content, creating embeddings, and retrieving the most relevant catalog entries when a user asks for recommendations. Implementing Retrieval Augmented Generation ensures the foundation model receives only approved, catalog-backed context and can cite or base its output on those retrieved items. This sharply reduces the likelihood of inventing products, because the response is conditioned on retrieved catalog records rather than relying on the model's parametric memory.
The requirement also notes that most interactions are unique. That makes response caching far less effective, because there are fewer repeated prompts to benefit from cached outputs. Instead, improving the retrieval and model invocation path is the better optimization. Using the PerformanceConfigLatency parameter set to optimized prioritizes lower latency behavior for model inference, helping meet faster recommendation generation without requiring the company to build and operate additional infrastructure.
The other options do not solve the root cause as reliably. Prompt engineering and streaming can improve perceived latency, but they do not guarantee catalog-only recommendations because the model can still hallucinate items. Guardrails can help detect or block certain undesired outputs, but without consistent catalog grounding they do not ensure every recommendation is derived from the company's product data. Building a custom OpenSearch validation and caching layer increases operational complexity, and caching is misaligned with predominantly unique interactions.
Example Corp provides a personalized video generation service that millions of enterprise customers use. Customers generate marketing videos by submitting prompts to the company's proprietary generative AI (GenAI) model. To improve output relevance and personalization, Example Corp wants to enhance the prompts by using customer-specific context such as product preferences, customer attributes, and business history.
The customers have strict data governance requirements. The customers must retain full ownership and control over their own data. The customers do not require real-time access. However, semantic accuracy must be high and retrieval latency must remain low to support customer experience use cases.
Example Corp wants to minimize architectural complexity in its integration pattern. Example Corp does not want to deploy and manage services in each customer's environment unless necessary.
Which solution will meet these requirements?
Option A is the correct solution because Amazon Q Business is explicitly designed to provide secure, governed access to enterprise data while preserving customer ownership and control. Each customer maintains their own Amazon Q Business index, which ensures that data never leaves the customer's control boundary unless explicitly shared through approved access mechanisms.
By designating Example Corp as a data accessor, customers can allow controlled, auditable access to their indexed content through secure APIs. This model satisfies strict data governance requirements, including data ownership, access transparency, and revocation capability. Customers do not need to expose raw data or deploy infrastructure in Example Corp's environment.
Amazon Q Business provides high semantic accuracy through managed indexing, ranking, and retrieval optimizations. Because real-time access is not required, this approach avoids the complexity and latency challenges of live federated retrieval while still delivering fast query performance suitable for customer experience use cases.
Option B introduces unnecessary operational complexity by requiring real-time MCP servers per customer. Option C requires customers to manage Amazon Bedrock knowledge bases and enable cross-account access, which increases integration complexity and governance risk. Option D requires shared Amazon Kendra indexes across accounts, which complicates access control and data ownership boundaries.
Therefore, Option A provides the cleanest, lowest-overhead architecture that meets data governance, accuracy, performance, and scalability requirements while minimizing operational burden for both Example Corp and its customers.
A company is building a generative AI (GenAI) application that uses Amazon Bedrock APIs to process complex customer inquiries. During peak usage periods, the application experiences intermittent API timeouts that cause issues such as broken response chunks and delayed data delivery. The application struggles to ensure that prompts remain within token limits when handling complex customer inquiries of varying lengths. Users have reported truncated inputs and incomplete responses. The company has also observed foundation model (FM) invocation failures.
The company needs a retry strategy that automatically handles transient service errors and prevents overwhelming Amazon Bedrock during peak usage periods. The strategy must also adapt to changing service availability and support response streaming and token-aware request handling.
Which solution will meet these requirements?
Option B best meets all requirements because it combines AWS-recommended resiliency patterns for transient failures with streaming-aware handling and adaptive protection against cascading retries during peak load. When timeouts and throttling occur, nave retries can amplify traffic and worsen outages. Exponential backoff with jitter is the standard AWS best practice because it spreads retry attempts over time, reduces synchronized retry storms, and lowers the probability of repeatedly colliding with service limits.
The requirement also states the strategy must ''adapt to changing service availability'' and ''prevent overwhelming Amazon Bedrock.'' A circuit breaker pattern directly addresses this by temporarily stopping or reducing retries when failure rates exceed a threshold, allowing the system to degrade gracefully instead of continually hammering the service. This is a key mechanism to prevent cascading failures during throttling events.
Because the application uses response streaming and experiences broken chunks, the retry strategy must be streaming-aware. A streaming response handler that detects chunk delivery timeouts and buffers already received chunks prevents the user from losing progress when a connection drops. Resuming from the last successfully received chunk minimizes redundant generation and reduces additional load on the model compared with restarting the entire stream. This supports better user experience and better service efficiency during intermittent failures.
Token-aware request handling is supported in this architecture because the application can apply token budgeting before invoking the model (for example, trimming or summarizing excessive context) while still preserving streaming output behavior. Option B provides the correct backbone for this by focusing on adaptive control and robust streaming recovery.
Option A is too simplistic and risks retry storms. Option C combines conflicting elements (global token limit, cached completions for streaming) and includes impractical ''request only missing chunks'' behavior that is not a reliable property of streamed generative output. Option D includes useful ideas (load shedding) but relies on static caps and does not provide as strong adaptive retry control as circuit breaking.
Therefore, Option B is the most correct and operationally safe strategy for peak-load Bedrock streaming workloads.
A financial services company wants to develop an Amazon Bedrock application that gives analysts the ability to query quarterly earnings reports and financial statements. The financial documents are typically 5--100 pages long and contain both tabular data and text. The application must provide contextually accurate responses that preserve the relationship between financial metrics and their explanatory text. To support accurate and scalable retrieval, the application must incorporate document segmentation and context management strategies.
Which solution will meet these requirements?
Option B best satisfies the requirements because it directly applies Retrieval Augmented Generation principles using managed Amazon Bedrock Knowledge Bases, which are designed to handle large, complex documents while preserving contextual relationships. Financial reports often interleave tables with explanatory narrative, and accurate analysis depends on keeping those elements logically connected. By segmenting documents based on their structural layout---for example, sections, subsections, tables, and surrounding commentary---the knowledge base can retrieve semantically relevant chunks that maintain this relationship during inference.
Amazon Bedrock Knowledge Bases support contextual chunking strategies that go beyond simple fixed-size segmentation. This is critical for financial documents, where a metric in a table may be explained in adjacent paragraphs or footnotes. Context-aware chunking ensures that retrieved content includes both the numeric data and its interpretation, enabling the foundation model to generate accurate, grounded responses. Including citations further improves analyst trust and auditability by allowing users to trace answers back to specific source sections, which is a common requirement in financial environments.
Scalability is another key requirement. Knowledge Bases manage embedding generation, indexing, and retrieval orchestration as a managed service, which allows the solution to scale across large document collections without requiring custom infrastructure or model hosting. This approach also supports efficient updates as new quarterly reports are added, ensuring the retrieval layer remains current.
Option A does not scale well because processing entire 5--100 page documents in a single prompt increases token usage, latency, and cost while risking context truncation. Option C relies on fixed-size chunking triggered at query time, which often breaks semantic relationships in structured financial content. Option D introduces unnecessary architectural complexity by splitting structured and unstructured data into separate applications, increasing operational overhead without providing better contextual retrieval than a unified RAG approach.
An insurance company uses existing Amazon SageMaker AI infrastructure to support a web-based application that allows customers to predict what their insurance premiums will be. The company stores customer data that is used to train the SageMaker AI model in an Amazon S3 bucket. The dataset is growing rapidly. The company wants a solution to continuously re-train the model. The solution must automatically re-train and re-deploy the model to the application when an employee uploads a new customer data file to the S3 bucket.
Which solution will meet these requirements?
Option D is the best fit because it implements a reliable event-driven MLOps workflow that automates retraining and redeployment with clear orchestration, auditability, and production-grade error handling. The requirement is explicit: whenever a new file is uploaded to Amazon S3, the system must retrain and then redeploy the model used by a web application. A common AWS pattern is to use an S3 event notification to trigger an AWS Lambda function, which then starts a controlled workflow. In option D, Lambda serves as the event handler that reacts immediately to the S3 upload event and passes the necessary context (bucket, object key, dataset version) into an AWS Step Functions Standard state machine.
Step Functions Standard is appropriate for model retraining pipelines because training and deployment steps can be long-running and benefit from durable state, retries, and failure handling. It provides execution history, making it easier to troubleshoot why a particular retraining run failed and to prove which dataset version produced which model version. This operational visibility is critical when the dataset is ''growing rapidly'' and retraining is frequent.
Within the workflow, Amazon SageMaker Pipelines is the right service to run the ML lifecycle stages in a repeatable way: data processing (if needed), training, evaluation/quality checks, model registration, and deployment to an endpoint used by the application. SageMaker Pipelines is purpose-built for CI/CD-style ML, supporting automated redeployments when a new approved model artifact is produced. By calling a pipeline execution from Step Functions, the company can add governance gates (for example, only deploy if evaluation metrics meet thresholds), and can apply consistent rollback and notification steps when deployment fails.
The other options are weaker: A confuses inference with retraining and does not provide deployment orchestration. B adds unnecessary webhook complexity and describes an awkward event bus configuration. C introduces Autopilot/Data Wrangler, which may be useful but adds extra moving parts and is not required to meet the trigger-and-redeploy requirement.
David King
16 days agoHarold Perez
16 days agoPaul Taylor
16 days agoChristopher Ramirez
16 days agoDonna Moore
16 days agoAngela Young
17 days agoKaren Nelson
27 days agoMonica Stewart
27 days agoPatricia Taylor
1 month agoPaul Smith
1 month agoAmy Gonzalez
1 month agoOlivia Rodriguez
28 days agoMonica Campbell
24 days agoSandra Scott
1 month agoHarold Clark
22 days agoTy
2 months agoLaurel
2 months agoJeanice
3 months agoSabra
3 months agoTy
3 months agoFredric
3 months agoLucy
4 months agoAdria
4 months agoFrancisca
4 months agoTashia
4 months ago