Databricks Certified Data Engineer Professional Exam - Topic 6 Question 18 Discussion

Actual exam question from the Databricks Certified Data Engineer Professional exam
Question #: 18
Topic #: 6

The business intelligence team has a dashboard configured to track various summary metrics for retail stores. This includes total sales for the previous day alongside totals and averages for a variety of time periods. The fields required to populate this dashboard have the following schema:

For demand forecasting, the Lakehouse contains a validated table of all itemized sales, updated incrementally in near real-time. This table, named products_per_order, includes the following fields:

Because reporting on long-term sales trends is less volatile, analysts using the new dashboard only require data to be refreshed once daily. Because the dashboard will be queried interactively by many users throughout a normal business day, it should return results quickly and reduce total compute associated with each materialization.

Which solution meets the expectations of the end users while controlling and limiting possible costs?

Suggested Answer: D

Given the requirement for daily refresh of data and the need to ensure quick response times for interactive queries while controlling costs, a nightly batch job to pre-compute and save the required summary metrics is the most suitable approach.

By pre-aggregating data during off-peak hours, the dashboard can serve queries quickly without requiring on-the-fly computation, which can be resource-intensive and slow, especially with many users.

This approach also limits the cost by avoiding continuous computation throughout the day and instead leverages a batch process that efficiently computes and stores the necessary data.
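As a rough sketch of what such a nightly job might look like in Databricks SQL (the summary table name and all column names other than products_per_order are assumptions, since the question's schema images are not reproduced on this page):

```sql
-- Hypothetical nightly batch job: pre-aggregate the itemized sales table
-- into a small summary table that the dashboard queries directly.
-- store_id, order_date, order_id, price, and quantity are assumed columns.
CREATE OR REPLACE TABLE sales_dashboard_summary AS
SELECT
  store_id,
  order_date,
  SUM(price * quantity)    AS total_sales,
  AVG(price * quantity)    AS avg_line_item_value,
  COUNT(DISTINCT order_id) AS order_count
FROM products_per_order
WHERE order_date >= date_sub(current_date(), 365)  -- keep one year of history
GROUP BY store_id, order_date;
```

Scheduled once per day (for example, as a Databricks Workflows job during off-peak hours), this overwrites a table of a few thousand rows at most, so interactive dashboard queries scan the small summary table rather than the full near-real-time fact table.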

The other options (A, B, C) either do not address the cost and performance requirements effectively or are not suitable for a use case combining infrequent data refresh with high interactivity.


Databricks Documentation on Batch Processing: Databricks Batch Processing

Data Lakehouse Patterns: Data Lakehouse Best Practices

Contribute your Thoughts:

Brendan
3 months ago
Not sure if nightly updates are enough for real-time needs.
Francine
3 months ago
Definitely prefer the view option to keep costs down.
Louann
3 months ago
Wait, can a live dashboard really handle that many queries?
Vicki
4 months ago
I think using Delta Cache would speed things up even more.
Martin
4 months ago
A nightly batch job sounds efficient for this!
Daisy
4 months ago
Defining a view against the products_per_order table seems like a safe choice, but I’m not clear on how it would impact performance compared to the other options.
Hortencia
4 months ago
Using Structured Streaming sounds appealing for real-time data, but I’m concerned about the costs associated with continuous processing.
Larue
4 months ago
I think a nightly batch job could work well since the data only needs daily updates, but I wonder if it might slow down the dashboard during peak hours.
Mignon
5 months ago
I remember we discussed the importance of quick query responses for dashboards, but I'm not sure if Delta Cache is the best option here.
Ilda
5 months ago
Hmm, I think the key here is finding a balance between performance and cost. Defining a view against the products_per_order table could be a good approach - it would allow the dashboard to query the data quickly without the overhead of a full materialization. I'd go with option D.
Nichelle
5 months ago
I'm a bit confused by the options. Using a nightly batch job to populate the dashboard seems like it could work, but I'm not sure if that would meet the requirement for quick response times. And I'm not familiar with using Structured Streaming for a dashboard, so I'm not sure about that one either.
Fabiola
5 months ago
This looks like a classic data engineering problem. I'd start by understanding the requirements - the dashboard needs to be refreshed daily, but with quick response times for interactive queries. Hmm, let me think through the options...
Lavonda
5 months ago
Okay, let's see. The question mentions the products_per_order table is updated in near real-time, so using that directly could be slow for the dashboard. Caching the data in memory with Delta Cache might work, but I'm not sure if that's the most cost-effective solution.
Aretha
5 months ago
I think preparing an executive report, as in option D, is important for long-term solutions, but they need to address the immediate crisis first.
Merrilee
5 months ago
This looks like a capital budgeting problem. I'll need to calculate the investment in fixed assets, working capital, and current liabilities to find the total investment.
Giovanna
1 year ago
That's a valid point, but it also ensures real-time data availability for the users. It's a trade-off between speed and cost.
Vilma
1 year ago
But won't live streaming consume more compute resources and increase costs?
Giovanna
1 year ago
I disagree, I believe option C is better as it allows for live updates and interactive querying.
Vilma
1 year ago
I think option A is the best choice because caching the table in memory will make the dashboard faster.
Zona
1 year ago
Option A sounds tempting, but caching the entire table in memory might not be the most cost-effective solution. I'd probably go with option D as well.
Kristel
1 year ago
Option C, huh? Looks like someone's been watching too many Databricks demos. Let's keep it simple, folks.
Mollie
1 year ago
B) Populate the dashboard by configuring a nightly batch job to save the required values, which are then used to quickly update the dashboard with each query.
Rex
1 year ago
A) Use the Delta Cache to persist the products_per_order table in memory to quickly update the dashboard with each query.
Milly
1 year ago
Hold on, a nightly batch job? That's so 2010s. What is this, the dark ages of data engineering?
Kenia
2 years ago
I think option D is the best solution. Defining a view against the products_per_order table and using that for the dashboard will provide the required data refresh frequency and reduce compute costs.
Viki
1 year ago
I think using a view for the dashboard is a smart choice in this scenario.
Lai
1 year ago
Yeah, defining a view against the table will definitely help with data refresh and cost control.
Ona
1 year ago
I agree, option D seems like the most efficient solution.
