Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 2 Question 78 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam

Question #: 78
Topic #: 2


Which of the following describes a shuffle?

A. A shuffle is a process that is executed during a broadcast hash join.

B. A shuffle is a process that compares data across executors.

C. A shuffle is a process that compares data across partitions.

D. A shuffle is a Spark operation that results from DataFrame.coalesce().

E. A shuffle is a process that allocates partitions to executors.


Suggested Answer: C

A shuffle is a Spark operation that results from DataFrame.coalesce().

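No. DataFrame.coalesce() does not result in a shuffle. It merges existing partitions into fewer ones through a narrow dependency, so no partition is split up and redistributed. DataFrame.repartition() is the operation that triggers a full shuffle.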
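To make the point about coalesce concrete, here is a toy model in plain Python. It is not Spark code: partitions are just lists, and the function name `coalesce` and the grouping rule are assumptions chosen for illustration. The only thing it is meant to show is that each existing partition is moved as a whole into one of the new partitions, which is why no shuffle is needed.

```python
def coalesce(partitions, n):
    """Toy model: merge existing partitions into n groups without splitting any."""
    if n >= len(partitions):
        # Like Spark's coalesce, this sketch never increases the partition count.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Each input partition is appended whole to exactly one output partition.
        merged[i * n // len(partitions)].extend(part)
    return merged


partitions = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(coalesce(partitions, 2))
```

Records that started out together stay together, so nothing has to be re-hashed or exchanged record by record.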

A shuffle is a process that allocates partitions to executors.

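This is incorrect. A shuffle redistributes data between partitions; assigning work to executors is handled by Spark's task scheduling, not by the shuffle.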

A shuffle is a process that is executed during a broadcast hash join.

No, broadcast hash joins avoid shuffles and yield performance benefits if at least one of the two tables is small (at most 10 MB by default, as set by spark.sql.autoBroadcastJoinThreshold). They can avoid shuffles because, instead of exchanging partitions between executors, they broadcast the small table to all executors, which then perform the rest of the join locally.
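A rough sketch of why the broadcast join needs no shuffle, again as a plain-Python toy model rather than Spark code. The function name `broadcast_hash_join` and the tuple layout are made up for this example. The small table is turned into a hash map that every partition of the large table can probe locally, so rows of the large table never leave their partition.

```python
def broadcast_hash_join(large_partitions, small_table):
    """Toy model: join each large partition locally against a broadcast lookup."""
    # The "broadcast" copy of the small table; in Spark each executor would hold one.
    lookup = dict(small_table)
    return [
        # Inner join: keep only rows whose key exists in the small table.
        [(key, value, lookup[key]) for key, value in part if key in lookup]
        for part in large_partitions
    ]


large = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")]]
small = [(1, "x"), (3, "y")]
print(broadcast_hash_join(large, small))
```

The output has the same number of partitions as the large input, and each output partition is computed only from the matching input partition plus the broadcast lookup.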

A shuffle is a process that compares data across executors.

No. In a shuffle, data is compared across partitions, not executors. Several partitions can live on the same executor, so the partition is the unit that matters.
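For contrast, here is a toy model of what a shuffle does when an aggregation needs all rows with the same key in one place. This is plain Python under simplifying assumptions, not Spark's implementation: the names `shuffle_by_key` and `sum_by_key` are hypothetical, keys are integers, and the target partition is chosen by `hash(key) % num_out`.

```python
from collections import defaultdict


def shuffle_by_key(partitions, num_out):
    """Toy shuffle: send every record to the output partition chosen by its key."""
    out = [[] for _ in range(num_out)]
    for part in partitions:
        for key, value in part:
            # Records with equal keys always land in the same output partition.
            out[hash(key) % num_out].append((key, value))
    return out


def sum_by_key(partitions, num_out):
    """Aggregate per key after the shuffle has co-located equal keys."""
    totals = []
    for part in shuffle_by_key(partitions, num_out):
        acc = defaultdict(int)
        for key, value in part:
            acc[key] += value
        totals.append(dict(acc))
    return totals


parts = [[(1, 10), (2, 20)], [(1, 5), (3, 7)]]
print(sum_by_key(parts, 2))
```

Key 1 appears in both input partitions, so its two rows have to cross a partition boundary before they can be summed. That movement between partitions is the shuffle.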

More info: Spark Repartition & Coalesce - Explained (https://bit.ly/32KF7zS)

by Josefa at Apr 03, 2025, 04:33 AM
