
Databricks Exam Databricks Certified Generative AI Engineer Associate Topic 1 Question 11 Discussion

Actual exam question from the Databricks Certified Generative AI Engineer Associate exam
Question #: 11
Topic #: 1
[All Databricks Certified Generative AI Engineer Associate Questions]

A company has a typical RAG-enabled, customer-facing chatbot on its website.

Select the correct sequence of components a user's question will go through before the final output is returned. Use the diagram above for reference.

Suggested Answer: A

When a user submits a question to a RAG-enabled chatbot, it passes through the retrieval pipeline in a fixed order before the final output is returned. Here's why A is the correct sequence:

Embedding model: The user's question is first converted into a vector representation by the embedding model, so it can be compared against the pre-embedded documents in the knowledge base.

Vector search: The query vector is then used to search the vector index and retrieve the document chunks most semantically similar to the question.

Context-augmented prompt: The retrieved chunks are combined with the original question into a context-augmented prompt, grounding the model in the company's own data.

Response-generating LLM: Finally, the augmented prompt is passed to the response-generating LLM, which produces the answer returned to the user.

Why other orderings are not correct:

The context-augmented prompt cannot come first (as option B suggests): there is no context to inject until vector search has retrieved the relevant documents, and vector search itself cannot run until the embedding model has converted the question into a query vector.

Following the sequence embedding model → vector search → context-augmented prompt → response-generating LLM ensures the chatbot's answers are grounded in retrieved company data rather than in the LLM's parametric knowledge alone.
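The component sequence in option A (embedding model → vector search → context-augmented prompt → response-generating LLM) can be illustrated with a minimal, self-contained sketch. Every component below is a toy stand-in (a hashed bag-of-words "embedding", a brute-force in-memory vector store, and a canned "LLM"), not a real Databricks or production API; only the order of the steps is the point.

```python
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Step 1: embedding model - map text to a fixed-size vector (toy)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def vector_search(query_vec: list[float], store: list[dict], k: int = 1) -> list[str]:
    """Step 2: vector search - return the k most similar stored documents."""
    scored = sorted(
        store,
        key=lambda doc: -sum(a * b for a, b in zip(query_vec, doc["vec"])),
    )
    return [doc["text"] for doc in scored[:k]]

def build_prompt(question: str, context_docs: list[str]) -> str:
    """Step 3: context-augmented prompt - inject retrieved context."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

def generate(prompt: str) -> str:
    """Step 4: response-generating LLM - a canned stand-in here."""
    return f"(LLM response based on prompt of {len(prompt)} chars)"

def answer(question: str, store: list[dict]) -> str:
    q_vec = embed(question)                   # 1. embedding model
    context = vector_search(q_vec, store)     # 2. vector search
    prompt = build_prompt(question, context)  # 3. context-augmented prompt
    return generate(prompt)                   # 4. response-generating LLM

# Knowledge base is embedded ahead of time; only the question flows at query time.
store = [{"text": t, "vec": embed(t)} for t in [
    "Returns are accepted within 30 days of purchase.",
    "Support is available 24/7 via chat.",
]]
print(answer("What is your return policy?", store))
```

Note that swapping any two steps breaks the pipeline: the prompt cannot be augmented before retrieval, and retrieval cannot run before the question is embedded.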


Contribute your Thoughts:

Major
1 month ago
This question is a real head-scratcher, but I think Option A is the way to go. It's like a relay race, with each component passing the baton to the next one. Just don't trip on the way to the finish line, eh?
upvoted 0 times
Roy
13 days ago
Definitely, it's like a smooth handoff from one stage to the next.
upvoted 0 times
Truman
17 days ago
I think so too, it's like each component plays a specific role in the process.
upvoted 0 times
Carin
19 days ago
Yeah, it's like a well-coordinated team working together.
upvoted 0 times
Sue
21 days ago
I agree, Option A seems to be the correct sequence.
upvoted 0 times
Yolando
1 month ago
Ah, the old chatbot shuffle! Option A is the way to go, folks. It's like a well-oiled machine, with each component working in harmony to give the user the best possible experience. Now, if only my personal life could be this organized...
upvoted 0 times
Ling
1 month ago
This is a tricky one, but I think I've got it. Option A is the way to go. It's like a well-choreographed dance, with each component playing its part to deliver the final response. Gotta love that efficient workflow!
upvoted 0 times
Elvera
15 days ago
I'm leaning towards Option A as well. The embedding model should kick things off.
upvoted 0 times
Lashaun
19 days ago
I think Option B might be the right choice. The context-augmented prompt should come first.
upvoted 0 times
Elden
21 days ago
I agree, Option A seems to be the correct sequence. The components work together seamlessly.
upvoted 0 times
Johnetta
2 months ago
Hmm, I'm not sure about this one. The sequence seems a bit jumbled. Let me think this through carefully. Ah, got it! Option A is the right answer. This makes the most logical sense.
upvoted 0 times
Dell
26 days ago
Great job figuring it out! Option A is indeed the correct sequence.
upvoted 0 times
Verona
1 month ago
I agree, option A makes the most sense based on the diagram.
upvoted 0 times
Mattie
1 month ago
I think option A is correct. The sequence seems to flow logically.
upvoted 0 times
Izetta
2 months ago
Hmm, that makes sense. Maybe I should reconsider my answer.
upvoted 0 times
Lauran
2 months ago
I disagree, I believe it's B because the context-augmented prompt should come first.
upvoted 0 times
Izetta
2 months ago
I think the correct sequence is A.
upvoted 0 times
Mona
2 months ago
The embedding model is definitely the first step, to understand the user's question. Then vector search finds relevant information, followed by the context-augmented prompt to provide more context, and finally the response-generating LLM generates the output. Option A is the correct sequence.
upvoted 0 times
Ronald
1 month ago
Finally, response-generating LLM for the output.
upvoted 0 times
Maryann
1 month ago
After that, context-augmented prompt for more context.
upvoted 0 times
Eleonore
2 months ago
Then it's vector search to find relevant info.
upvoted 0 times
Jaime
2 months ago
Option A is correct. The embedding model comes first.
upvoted 0 times
