
Databricks Exam Databricks-Certified-Professional-Data-Engineer Topic 6 Question 15 Discussion

Actual exam question for Databricks' Databricks-Certified-Professional-Data-Engineer exam
Question #: 15
Topic #: 6
[All Databricks-Certified-Professional-Data-Engineer Questions]

The data engineer is using Spark's MEMORY_ONLY storage level.

Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?

A) Size on Disk is > 0
B) The number of Cached Partitions > the number of Spark Partitions
C) The RDD Block Name includes the '_disk' annotation signaling failure to cache
D) On Heap Memory Usage is within 75% of Off Heap Memory Usage

Suggested Answer: C

In the Spark UI's Storage tab, an indicator that a cached table is not performing optimally would be the presence of the _disk annotation in the RDD Block Name. This annotation indicates that some partitions of the cached data have been spilled to disk because there wasn't enough memory to hold them. This is suboptimal because accessing data from disk is much slower than from memory. The goal of caching is to keep data in memory for fast access, and a spill to disk means that this goal is not fully achieved.


Contribute your Thoughts:

Anissa
10 months ago
I'm no data engineer, but I hear caching is like trying to remember where you left your car keys. If the Spark UI is confused, you know you've got a problem!
Phillip
10 months ago
B) The number of Cached Partitions > the number of Spark Partitions
Jettie
10 months ago
A) Size on Disk is > 0
Raul
10 months ago
I feel like the answer is B. Anything that doesn't match the actual Spark partitions is probably not a good sign.
Lizbeth
10 months ago
Yeah, it's important to keep an eye on those indicators in the Spark UI's Storage tab.
Jerry
10 months ago
I agree, that would definitely be a sign that the cached table is not performing optimally.
Frederica
10 months ago
I think the answer is B. If the number of Cached Partitions is greater than the number of Spark Partitions, it's not good.
Lea
11 months ago
I think C) The RDD Block Name includes the '_disk' annotation signaling failure to cache is also a valid indicator of poor performance.
Enola
11 months ago
But if the number of Cached Partitions is greater than the number of Spark Partitions, wouldn't that indicate a performance issue?
Lemuel
11 months ago
I disagree, I believe the correct answer is A) Size on Disk is > 0.
Georgeanna
11 months ago
This is a tough one, but I'm going to go with C. The '_disk' annotation in the RDD Block Name is a clear indication of caching failure.
Valentin
9 months ago
I agree with User1, C) The RDD Block Name includes the '_disk' annotation signaling failure to cache seems like the right choice.
Lachelle
9 months ago
I'm leaning towards B) The number of Cached Partitions > the number of Spark Partitions.
Jenelle
10 months ago
I think it's A) Size on Disk is > 0.
Latanya
10 months ago
I'm not sure, but I think it might be B) The number of Cached Partitions > the number of Spark Partitions.
Helaine
10 months ago
I disagree, I believe it's C) The RDD Block Name includes the '_disk' annotation signaling failure to cache.
Nell
11 months ago
I think it's A) Size on Disk is > 0.
Thaddeus
11 months ago
Hmm, I'm going with B. If the number of cached partitions is greater than the actual Spark partitions, that's a red flag.
Yolando
10 months ago
Yeah, if the number of cached partitions is more than the Spark partitions, it's not performing optimally.
Darci
10 months ago
I think B is the right indicator to look for.
Enola
11 months ago
I think the answer is B) The number of Cached Partitions > the number of Spark Partitions.
Christoper
11 months ago
D sounds like the right choice to me. If the on-heap and off-heap memory usage are out of balance, that's a sign of suboptimal caching.
Marylin
12 months ago
I think the correct answer is C. The RDD Block Name with the '_disk' annotation signals that the caching was unsuccessful.
Scarlet
10 months ago
D) On Heap Memory Usage is within 75% of Off Heap Memory Usage
Gwenn
10 months ago
C) The RDD Block Name includes the '_disk' annotation signaling failure to cache
Mariko
10 months ago
B) The number of Cached Partitions > the number of Spark Partitions
Clare
11 months ago
A) Size on Disk is > 0
