The data engineer is using Spark's MEMORY_ONLY storage level.
Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?
In the Spark UI's Storage tab, the key indicator that a MEMORY_ONLY cached table is not performing optimally is a Fraction Cached value below 100%. Under MEMORY_ONLY, partitions that do not fit in memory are not spilled to disk; they are dropped and must be recomputed from their lineage each time they are accessed, which defeats the purpose of caching. A related warning sign is the _disk annotation in the RDD Block Name (together with a non-zero Size on Disk), which indicates that partitions have been written to disk under a disk-backed storage level such as MEMORY_AND_DISK. This is also suboptimal, because reading from disk is much slower than reading from memory. The goal of caching is to keep the full dataset in memory for fast access, and either dropped or disk-resident partitions mean that goal is not fully achieved.
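The reasoning above can be sketched in code. This is a hedged illustration, not a definitive implementation: the Spark calls (and the table name `sales`) are assumptions and need a live SparkSession, so they appear as comments, while the runnable helper below simply encodes the rule of thumb for reading the Storage tab metrics.

```python
# Illustrative PySpark steps (require a live SparkSession; shown as comments):
#
#   from pyspark import StorageLevel
#   df = spark.table("sales")            # hypothetical table name
#   df.persist(StorageLevel.MEMORY_ONLY)
#   df.count()                           # materialize the cache,
#                                        # then inspect the Storage tab
#
def cache_looks_healthy(fraction_cached: float, size_on_disk_bytes: int) -> bool:
    """Interpret two Storage-tab columns for a MEMORY_ONLY cache.

    fraction_cached    -- 'Fraction Cached' column, as a ratio 0.0-1.0
    size_on_disk_bytes -- 'Size on Disk' column, in bytes

    A healthy MEMORY_ONLY cache is 100% cached with nothing on disk.
    Fraction Cached below 1.0 means partitions were dropped and will be
    recomputed; a non-zero Size on Disk means partitions live on disk
    under a disk-backed storage level.
    """
    return fraction_cached >= 1.0 and size_on_disk_bytes == 0


print(cache_looks_healthy(1.0, 0))   # fully in memory: healthy
print(cache_looks_healthy(0.62, 0))  # 38% of partitions dropped: suboptimal
```

The helper is deliberately strict: any dropped or disk-resident partition is flagged, since both translate into slower access than a pure in-memory cache.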