A Generative AI Engineer is developing a RAG system for their company to perform internal document Q&A over structured HR policies, but the answers returned are frequently incomplete and unstructured. It seems that the retriever is not returning all relevant context. The Generative AI Engineer has experimented with different embedding and response-generating LLMs, but that did not improve results.
Which TWO options could be used to improve the response quality?
Choose 2 answers
The problem describes a Retrieval-Augmented Generation (RAG) system for HR policy Q&A where responses are incomplete and unstructured due to the retriever failing to return sufficient context. The engineer has already tried different embedding and response-generating LLMs without success, suggesting the issue lies in the retrieval process---specifically, how documents are chunked and indexed. Let's evaluate the options.
Option A: Add the section header as a prefix to chunks
Adding section headers provides additional context to each chunk, helping the retriever understand the chunk's relevance within the document structure (e.g., "Leave Policy: Annual Leave" vs. just "Annual Leave"). This can improve retrieval precision for structured HR policies.
Databricks Reference: 'Metadata, such as section headers, can be appended to chunks to enhance retrieval accuracy in RAG systems' ('Databricks Generative AI Cookbook,' 2023).
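A minimal sketch of this idea, assuming the chunks are plain dictionaries that already carry their source section title (the field names here are hypothetical, not a Databricks API):

```python
def prefix_section_header(chunks):
    """Prepend each chunk's section header to its text before embedding/indexing."""
    enriched = []
    for chunk in chunks:
        header = chunk.get("section_header", "")
        text = chunk["text"]
        # "Leave Policy: Employees accrue..." instead of just "Employees accrue..."
        enriched.append({**chunk, "text": f"{header}: {text}" if header else text})
    return enriched

chunks = [
    {"section_header": "Leave Policy",
     "text": "Employees accrue 20 days of annual leave per calendar year."},
    {"section_header": "Remote Work Policy",
     "text": "Remote work requires prior manager approval."},
]
print(prefix_section_header(chunks)[0]["text"])
# Leave Policy: Employees accrue 20 days of annual leave per calendar year.
```

Because the header text is embedded along with the chunk body, a query like "annual leave policy" is more likely to match the right passage even when the body itself never repeats the word "policy."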
Option B: Increase the document chunk size
Larger chunks include more context per retrieval, reducing the chance of missing relevant information split across smaller chunks. For structured HR policies, this can ensure entire sections or rules are retrieved together.
Databricks Reference: 'Increasing chunk size can improve context completeness, though it may trade off with retrieval specificity' ('Building LLM Applications with Databricks').
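A small sketch of the chunk-size change, illustrated here with LangChain's RecursiveCharacterTextSplitter purely as an example; the same parameter exists in most splitters, and the sizes shown are illustrative, not recommended values:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

policy_text = (
    "Annual Leave. Employees accrue 20 days of annual leave per calendar year. "
    "Unused days carry over up to a maximum of 5 days. "
    "Sick Leave. Employees receive 10 paid sick days per year. "
    "A doctor's note is required for absences longer than 3 consecutive days."
)

# Smaller chunks risk splitting a single rule across multiple retrievals.
small_splitter = RecursiveCharacterTextSplitter(chunk_size=80, chunk_overlap=10)

# Larger chunks (with some overlap) keep a whole policy section in one passage.
large_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)

print(len(small_splitter.split_text(policy_text)))  # more, fragmented chunks
print(len(large_splitter.split_text(policy_text)))  # fewer, more complete chunks
```

The trade-off noted in the reference still applies: very large chunks can dilute retrieval specificity, so the size should roughly match the length of a self-contained policy section.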
Option C: Split the document by sentence
Splitting by sentence creates very small chunks, which could exacerbate the problem by fragmenting context further. This is likely why the current system fails---it retrieves incomplete snippets rather than cohesive policy sections.
Databricks Reference: No specific extract argues against this directly, but the emphasis on context completeness in RAG guidance implies that smaller chunks would make incomplete responses worse.
Option D: Use a larger embedding model
A larger embedding model might improve vector quality, but the question states that experimenting with different embedding models didn't help. This suggests the issue isn't embedding quality but rather chunking/retrieval strategy.
Databricks Reference: Embedding models are critical, but not the focus when retrieval context is the bottleneck.
Option E: Fine-tune the response generation model
Fine-tuning the LLM could improve response coherence, but if the retriever doesn't provide complete context, the LLM can't generate full answers. The root issue is retrieval, not generation.
Databricks Reference: Fine-tuning is recommended for domain-specific generation, not retrieval fixes ('Generative AI Engineer Guide').
Conclusion: Options A and B address the retrieval issue directly by enhancing chunk context---either through metadata (A) or size (B)---aligning with Databricks' RAG optimization strategies. C would worsen the problem, while D and E don't target the root cause given prior experimentation.
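Putting the two together, a hedged sketch of a combined preprocessing step (assuming the HR documents are already parsed into (section header, section text) pairs; the structure and names are hypothetical):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

def build_chunks(sections, chunk_size=1024, chunk_overlap=128):
    """Split each policy section into larger chunks (option B) and prefix every
    chunk with its section header (option A) before embedding and indexing."""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    docs = []
    for header, body in sections:
        for piece in splitter.split_text(body):
            docs.append(f"{header}: {piece}")
    return docs

sections = [
    ("Leave Policy", "Employees accrue 20 days of annual leave per calendar year..."),
    ("Remote Work Policy", "Remote work requires prior manager approval..."),
]
print(build_chunks(sections)[0])  # "Leave Policy: Employees accrue 20 days..."
```

This keeps the embedding and generation models unchanged (consistent with the fact that swapping them did not help) and targets only the chunking/indexing stage where the problem lies.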