Databricks Exam Databricks Certified Data Engineer Professional Topic 1 Question 2 Discussion

Actual exam question for Databricks's Databricks Certified Data Engineer Professional exam
Question #: 2
Topic #: 1

A data engineer wants to join a stream of advertisement impressions (when an ad was shown) with another stream of user clicks on advertisements to correlate when impressions led to monetizable clicks.

Which solution would improve the performance?

A)

B)

C)

D)

Suggested Answer: A

When joining a stream of advertisement impressions with a stream of user clicks, the goal is to minimize the state that must be maintained for the join. Option A uses a left outer join on the condition clickTime == impressionTime, which only correlates events that occur at exactly the same time. In a real-world scenario, the join condition would usually need some leeway, expressed as a time range, to account for the delay between an impression and a possible click. The join condition and the time window should be designed to optimize performance while still capturing the relevant user interactions. In this case, the watermark helps with state management: it lets the engine discard old state that is unlikely to match new data, so the state does not grow unbounded.
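
To make the state-management point concrete, here is a minimal pure-Python sketch of how a watermark bounds the buffered state of a stream-stream join. It is a toy model, not Spark's implementation: the names `Event`, `WatermarkedJoin`, `max_click_delay`, and `allowed_lateness` are hypothetical, it uses a single watermark for both streams, and it performs an inner join rather than the left outer join mentioned above. In Structured Streaming the corresponding pieces would roughly be `withWatermark` on each input and a time-range condition in the join.

```python
from collections import namedtuple

# Hypothetical event type: an ad identifier and an event time in seconds.
Event = namedtuple("Event", ["ad_id", "time"])


class WatermarkedJoin:
    """Toy inner join of impressions and clicks with watermark-based eviction."""

    def __init__(self, max_click_delay, allowed_lateness):
        self.max_click_delay = max_click_delay    # click must follow within this many seconds
        self.allowed_lateness = allowed_lateness  # how far behind the newest event data may arrive
        self.impressions = []                     # buffered join state
        self.max_event_time = 0

    def _advance(self, event_time):
        # The watermark trails the largest event time seen so far.
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - self.allowed_lateness
        # Keep only impressions whose matching window can still receive clicks.
        self.impressions = [
            imp for imp in self.impressions
            if imp.time + self.max_click_delay >= watermark
        ]

    def on_impression(self, impression):
        self.impressions.append(impression)
        self._advance(impression.time)

    def on_click(self, click):
        # Time-range condition: the click falls within the allowed delay after the impression.
        matches = [
            (imp, click) for imp in self.impressions
            if imp.ad_id == click.ad_id
            and imp.time <= click.time <= imp.time + self.max_click_delay
        ]
        self._advance(click.time)
        return matches


join = WatermarkedJoin(max_click_delay=60, allowed_lateness=10)
join.on_impression(Event("ad1", 0))
join.on_impression(Event("ad2", 5))
matches = join.on_click(Event("ad1", 30))   # joins with the ad1 impression
join.on_impression(Event("ad3", 200))       # watermark moves to 190; ad1 and ad2 are evicted
late = join.on_click(Event("ad2", 40))      # arrives too late; its impression is gone
print(len(matches), len(join.impressions), late)
```

Without the eviction step in `_advance`, every impression would have to be kept indefinitely in case a click arrived later; with it, the buffer only holds impressions whose matching window is still open. The trade-off is visible in the last call: a click that arrives after the watermark has passed its window is dropped rather than joined.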


Contribute your Thoughts:

Brittni
1 year ago
I agree with Nana, Option B looks like the best choice for improving performance.
upvoted 0 times
...
Nana
1 year ago
Option B seems to have a more efficient way of joining the streams based on the image provided.
upvoted 0 times
...
Werner
1 year ago
Why do you think Option B is better?
upvoted 0 times
...
Nana
1 year ago
I disagree, I believe Option B would be more effective.
upvoted 0 times
...
Werner
1 year ago
I think the solution to improve performance is Option A.
upvoted 0 times
...
Juliana
1 year ago
I think option D is the way to go, it seems to offer a more scalable solution for correlating ad impressions with clicks.
upvoted 0 times
...
Werner
1 year ago
I'm leaning towards option C because it looks like it could potentially enhance the performance of joining the streams.
upvoted 0 times
...
Rebeca
1 year ago
I disagree, I believe option B is the better choice as it might offer a more optimized solution for correlating impressions with clicks.
upvoted 0 times
...
Quiana
1 year ago
I think the answer is option A because it seems to provide a more efficient way to join the two streams.
upvoted 0 times
...
