Databricks Exam Databricks-Generative-AI-Engineer-Associate Topic 4 Question 9 Discussion

Actual exam question for Databricks's Databricks-Generative-AI-Engineer-Associate exam

Question #: 9
Topic #: 4

[All Databricks-Generative-AI-Engineer-Associate Questions]

A Generative Al Engineer is creating an LLM-based application. The documents for its retriever have been chunked to a maximum of 512 tokens each. The Generative Al Engineer knows that cost and latency are more important than quality for this application. They have several context length levels to choose from.

Which will fulfill their need?

Acontext length 514; smallest model is 0.44GB and embedding dimension 768

Bcontext length 2048: smallest model is 11GB and embedding dimension 2560

Ccontext length 32768: smallest model is 14GB and embedding dimension 4096

Dcontext length 512: smallest model is 0.13GB and embedding dimension 384

Show Suggested Answer

Suggested Answer: A

When deploying an LLM application for customer service inquiries, the primary focus is on measuring the operational efficiency and quality of the responses. Here's why A is the correct metric:

Number of customer inquiries processed per unit of time: This metric tracks the throughput of the customer service system, reflecting how many customer inquiries the LLM application can handle in a given time period (e.g., per minute or hour). High throughput is crucial in customer service applications where quick response times are essential to user satisfaction and business efficiency.

Real-time performance monitoring: Monitoring the number of queries processed is an important part of ensuring that the model is performing well under load, especially during peak traffic times. It also helps ensure the system scales properly to meet demand.

Why other options are not ideal:

B . Energy usage per query: While energy efficiency is a consideration, it is not the primary concern for a customer-facing application where user experience (i.e., fast and accurate responses) is critical.

C . Final perplexity scores for the training of the model: Perplexity is a metric for model training, but it doesn't reflect the real-time operational performance of an LLM in production.

D . HuggingFace Leaderboard values for the base LLM: The HuggingFace Leaderboard is more relevant during model selection and benchmarking. However, it is not a direct measure of the model's performance in a specific customer service application in production.

Focusing on throughput (inquiries processed per unit time) ensures that the LLM application is meeting business needs for fast and efficient customer service responses.

by Ranee at Jan 12, 2025, 01:50 AM

Limited Time Offer

25%

Off

Get Premium Databricks-Generative-AI-Engineer-Associate Questions as Interactive Web-Based Practice Test or PDF

Contribute your Thoughts:

Submit Cancel

Hortencia

27 days ago

Haha, I love how the options just keep getting more and more absurd. 14GB for a model? They must be running this on a supercomputer!

upvoted 0 times

...

1 months ago

Hmm, 514 tokens might work, but that extra cost and size is probably not worth it. I'd go with the 384 embedding dimension option.

upvoted 0 times

Shawna

1 days ago

I think the 384 embedding dimension with 512 tokens is the way to go for this application.

upvoted 0 times

...

Cassie

4 days ago

Yeah, the extra cost and size for the 514 tokens might not be worth it.

upvoted 0 times

...

Jeff

27 days ago

I agree, the 384 embedding dimension option seems like the best choice.

upvoted 0 times

...

2 months ago

I think option D with context length 512 would be the best choice.

upvoted 0 times

...