
Databricks Exam Databricks-Certified-Professional-Data-Engineer Topic 6 Question 15 Discussion

Actual exam question for Databricks' Databricks-Certified-Professional-Data-Engineer exam
Question #: 15
Topic #: 6
[All Databricks-Certified-Professional-Data-Engineer Questions]

The data engineer is using Spark's MEMORY_ONLY storage level.

Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?

A) Size on Disk is > 0
B) The number of Cached Partitions > the number of Spark Partitions
C) The RDD Block Name includes the '_disk' annotation signaling failure to cache
D) On Heap Memory Usage is within 75% of Off Heap Memory Usage

Suggested Answer: C

In the Spark UI's Storage tab, an indicator that a cached table is not performing optimally would be the presence of the _disk annotation in the RDD Block Name. This annotation indicates that some partitions of the cached data have been spilled to disk because there wasn't enough memory to hold them. This is suboptimal because accessing data from disk is much slower than from memory. The goal of caching is to keep data in memory for fast access, and a spill to disk means that this goal is not fully achieved.


Contribute your Thoughts:

Anissa
10 months ago
I'm no data engineer, but I hear caching is like trying to remember where you left your car keys. If the Spark UI is confused, you know you've got a problem!
Phillip
10 months ago
B) The number of Cached Partitions > the number of Spark Partitions
Jettie
10 months ago
A) Size on Disk is > 0
Raul
10 months ago
I feel like the answer is B. Anything that doesn't match the actual Spark partitions is probably not a good sign.
Lizbeth
10 months ago
Yeah, it's important to keep an eye on those indicators in the Spark UI's Storage tab.
Jerry
10 months ago
I agree, that would definitely be a sign that the cached table is not performing optimally.
Frederica
10 months ago
I think the answer is B. If the number of Cached Partitions is greater than the number of Spark Partitions, it's not good.
Lea
11 months ago
I think C) The RDD Block Name includes the '_disk' annotation signaling failure to cache is also a valid indicator of poor performance.
Enola
11 months ago
But if the number of Cached Partitions is greater than the number of Spark Partitions, wouldn't that indicate a performance issue?
Lemuel
11 months ago
I disagree, I believe the correct answer is A) Size on Disk is > 0.
Georgeanna
11 months ago
This is a tough one, but I'm going to go with C. The '_disk' annotation in the RDD Block Name is a clear indication of caching failure.
Valentin
9 months ago
I agree with User1, C) The RDD Block Name includes the '_disk' annotation signaling failure to cache seems like the right choice.
Lachelle
9 months ago
I'm leaning towards B) The number of Cached Partitions > the number of Spark Partitions.
Jenelle
10 months ago
I think it's A) Size on Disk is > 0.
Latanya
10 months ago
I'm not sure, but I think it might be B) The number of Cached Partitions > the number of Spark Partitions.
Helaine
10 months ago
I disagree, I believe it's C) The RDD Block Name includes the '_disk' annotation signaling failure to cache.
Nell
11 months ago
I think it's A) Size on Disk is > 0.
Thaddeus
11 months ago
Hmm, I'm going with B. If the number of cached partitions is greater than the actual Spark partitions, that's a red flag.
Yolando
10 months ago
Yeah, if the number of cached partitions is more than the Spark partitions, it's not performing optimally.
Darci
10 months ago
I think B is the right indicator to look for.
Enola
11 months ago
I think the answer is B) The number of Cached Partitions > the number of Spark Partitions.
Christoper
11 months ago
D sounds like the right choice to me. If the on-heap and off-heap memory usage are out of balance, that's a sign of suboptimal caching.
Marylin
12 months ago
I think the correct answer is C. The RDD Block Name with the '_disk' annotation signals that the caching was unsuccessful.
Scarlet
10 months ago
D) On Heap Memory Usage is within 75% of Off Heap Memory Usage
Gwenn
10 months ago
C) The RDD Block Name includes the '_disk' annotation signaling failure to cache
Mariko
10 months ago
B) The number of Cached Partitions > the number of Spark Partitions
Clare
11 months ago
A) Size on Disk is > 0
