A Generative AI Engineer is developing a RAG system for their company to perform internal document Q&A over structured HR policies, but the answers returned are frequently incomplete and unstructured. It seems that the retriever is not returning all relevant context. The Generative AI Engineer has experimented with different embedding and response-generating LLMs, but that did not improve results.
Which TWO options could be used to improve the response quality?
Choose 2 answers
The problem describes a Retrieval-Augmented Generation (RAG) system for HR policy Q&A where responses are incomplete and unstructured due to the retriever failing to return sufficient context. The engineer has already tried different embedding and response-generating LLMs without success, suggesting the issue lies in the retrieval process---specifically, how documents are chunked and indexed. Let's evaluate the options.
Option A: Add the section header as a prefix to chunks
Adding section headers provides additional context to each chunk, helping the retriever understand the chunk's relevance within the document structure (e.g., "Leave Policy: Annual Leave" vs. just "Annual Leave"). This can improve retrieval precision for structured HR policies.
Databricks Reference: 'Metadata, such as section headers, can be appended to chunks to enhance retrieval accuracy in RAG systems' ('Databricks Generative AI Cookbook,' 2023).
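A minimal sketch of this idea, assuming the chunks are plain dictionaries that already carry their source section title (the field names here are hypothetical, not a Databricks API):

```python
def prefix_section_header(chunks):
    """Prepend each chunk's section header to its text before embedding/indexing."""
    enriched = []
    for chunk in chunks:
        header = chunk.get("section_header", "")
        text = chunk["text"]
        # "Leave Policy: Employees accrue..." instead of just "Employees accrue..."
        enriched.append({**chunk, "text": f"{header}: {text}" if header else text})
    return enriched

chunks = [
    {"section_header": "Leave Policy",
     "text": "Employees accrue 20 days of annual leave per calendar year."},
    {"section_header": "Remote Work Policy",
     "text": "Remote work requires prior manager approval."},
]
print(prefix_section_header(chunks)[0]["text"])
# Leave Policy: Employees accrue 20 days of annual leave per calendar year.
```

Because the header text is embedded along with the chunk body, a query like "annual leave policy" is more likely to match the right passage even when the body itself never repeats the word "policy."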
Option B: Increase the document chunk size
Larger chunks include more context per retrieval, reducing the chance of missing relevant information split across smaller chunks. For structured HR policies, this can ensure entire sections or rules are retrieved together.
Databricks Reference: 'Increasing chunk size can improve context completeness, though it may trade off with retrieval specificity' ('Building LLM Applications with Databricks').
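A small sketch of the chunk-size change, illustrated here with LangChain's RecursiveCharacterTextSplitter purely as an example; the same parameter exists in most splitters, and the sizes shown are illustrative, not recommended values:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

policy_text = (
    "Annual Leave. Employees accrue 20 days of annual leave per calendar year. "
    "Unused days carry over up to a maximum of 5 days. "
    "Sick Leave. Employees receive 10 paid sick days per year. "
    "A doctor's note is required for absences longer than 3 consecutive days."
)

# Smaller chunks risk splitting a single rule across multiple retrievals.
small_splitter = RecursiveCharacterTextSplitter(chunk_size=80, chunk_overlap=10)

# Larger chunks (with some overlap) keep a whole policy section in one passage.
large_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)

print(len(small_splitter.split_text(policy_text)))  # more, fragmented chunks
print(len(large_splitter.split_text(policy_text)))  # fewer, more complete chunks
```

The trade-off noted in the reference still applies: very large chunks can dilute retrieval specificity, so the size should roughly match the length of a self-contained policy section.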
Option C: Split the document by sentence
Splitting by sentence creates very small chunks, which could exacerbate the problem by fragmenting context further. This is likely why the current system fails---it retrieves incomplete snippets rather than cohesive policy sections.
Databricks Reference: No specific extract argues against this directly, but the emphasis on context completeness in RAG guidance implies that smaller chunks would make incomplete responses worse.
Option D: Use a larger embedding model
A larger embedding model might improve vector quality, but the question states that experimenting with different embedding models didn't help. This suggests the issue isn't embedding quality but rather chunking/retrieval strategy.
Databricks Reference: Embedding models are critical, but not the focus when retrieval context is the bottleneck.
Option E: Fine-tune the response generation model
Fine-tuning the LLM could improve response coherence, but if the retriever doesn't provide complete context, the LLM can't generate full answers. The root issue is retrieval, not generation.
Databricks Reference: Fine-tuning is recommended for domain-specific generation, not retrieval fixes ('Generative AI Engineer Guide').
Conclusion: Options A and B address the retrieval issue directly by enhancing chunk context---either through metadata (A) or size (B)---aligning with Databricks' RAG optimization strategies. C would worsen the problem, while D and E don't target the root cause given prior experimentation.
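Putting the two together, a hedged sketch of a combined preprocessing step (assuming the HR documents are already parsed into (section header, section text) pairs; the structure and names are hypothetical):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

def build_chunks(sections, chunk_size=1024, chunk_overlap=128):
    """Split each policy section into larger chunks (option B) and prefix every
    chunk with its section header (option A) before embedding and indexing."""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    docs = []
    for header, body in sections:
        for piece in splitter.split_text(body):
            docs.append(f"{header}: {piece}")
    return docs

sections = [
    ("Leave Policy", "Employees accrue 20 days of annual leave per calendar year..."),
    ("Remote Work Policy", "Remote work requires prior manager approval..."),
]
print(build_chunks(sections)[0])  # "Leave Policy: Employees accrue 20 days..."
```

This keeps the embedding and generation models unchanged (consistent with the fact that swapping them did not help) and targets only the chunking/indexing stage where the problem lies.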