
Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 1 Question 18 Discussion

Actual exam question from the Databricks Certified Associate Developer for Apache Spark 3.0 exam
Question #: 18
Topic #: 1

The code block displayed below contains an error. The code block should trigger Spark to cache DataFrame transactionsDf in executor memory where available, writing to disk where insufficient executor memory is available, in a fault-tolerant way. Find the error.

Code block:

transactionsDf.persist(StorageLevel.MEMORY_AND_DISK)

Suggested Answer: C

The storage level is inappropriate for fault-tolerant storage.

Correct. Typically, when thinking about fault tolerance and storage levels, you would want to store redundant copies of the dataset. This can be achieved by using a storage level such as StorageLevel.MEMORY_AND_DISK_2.
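
For illustration, a minimal PySpark sketch (assuming transactionsDf already exists as in the question) of the corrected call with a replicated storage level:

from pyspark.storagelevel import StorageLevel

# Keep each partition in executor memory, spill to disk when memory is
# insufficient, and replicate every cached partition on two nodes so the
# cache survives the loss of a single executor.
transactionsDf.persist(StorageLevel.MEMORY_AND_DISK_2)

# persist() is lazy; an action such as count() materializes the cache.
transactionsDf.count()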

The code block uses the wrong command for caching.

Wrong. In this case, DataFrame.persist() needs to be used, since this operator supports passing a storage level. DataFrame.cache() does not support passing a storage level.
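
As a quick contrast, a sketch (again assuming transactionsDf from the question; in Spark 3.0 the default storage level for DataFrames is MEMORY_AND_DISK):

from pyspark.storagelevel import StorageLevel

# cache() takes no arguments and always uses the default storage level.
transactionsDf.cache()

# persist() optionally accepts an explicit StorageLevel.
transactionsDf.persist(StorageLevel.MEMORY_AND_DISK_2)

# cache() does not accept a storage level; the next line would raise a TypeError.
# transactionsDf.cache(StorageLevel.MEMORY_AND_DISK_2)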

Caching is not supported in Spark, data are always recomputed.

Incorrect. Caching is an important component of Spark, since it can help to accelerate Spark programs to a great extent. Caching is often a good idea for datasets that need to be accessed repeatedly.
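
A typical usage pattern, sketched here with hypothetical columns amount and storeId on transactionsDf:

# The first action materializes the cache; later actions reuse it instead
# of recomputing the DataFrame's full lineage.
transactionsDf.cache()

transactionsDf.count()
transactionsDf.filter("amount > 100").count()
transactionsDf.groupBy("storeId").count().show()

# Release the cached partitions once they are no longer needed.
transactionsDf.unpersist()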

Data caching capabilities can be accessed through the spark object, but not through the DataFrame API.

No. Caching is accessed through either DataFrame.cache() or DataFrame.persist().

The DataFrameWriter needs to be invoked.

Wrong. The DataFrameWriter can be accessed via DataFrame.write and is used to write data to external data stores, mostly on disk. Here, we find keywords such as 'cache' and 'executor memory' that point us away from using external data stores. We aim to save data to memory to accelerate the reading process, since reading from disk is comparatively slower. The DataFrameWriter does not write to memory, so we cannot use it here.
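
To make the contrast concrete, a sketch (the output path is hypothetical):

from pyspark.storagelevel import StorageLevel

# DataFrameWriter: writes the data out to an external store on disk,
# for example as Parquet files. This is not what the question asks for.
transactionsDf.write.mode("overwrite").parquet("/tmp/transactions_parquet")

# Caching: keeps the data in executor memory (spilling to disk) for fast
# reuse within the same Spark application.
transactionsDf.persist(StorageLevel.MEMORY_AND_DISK_2)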

More info: Best practices for caching in Spark SQL | by David Vrba | Towards Data Science

