Welcome to Pass4Success


Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 2 Question 78 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 78
Topic #: 2

Which of the following describes a shuffle?

Suggested Answer: C (A shuffle is a process that compares data across partitions.)

A shuffle is a Spark operation that results from DataFrame.coalesce().

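No. DataFrame.coalesce() merges existing partitions into fewer ones and does not result in a shuffle. DataFrame.repartition(), by contrast, does trigger one.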
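
To illustrate why coalescing needs no shuffle, here is a toy model in plain Python. It is not the Spark API: the function name, the list-of-lists representation of partitions, and the round-robin grouping are all assumptions made for the sketch. The point it shows is that whole partitions are merged, so no individual record has to be re-routed.

```python
def coalesce(partitions, n):
    """Toy model: merge existing partitions into n groups without splitting any.

    The round-robin grouping (i % n) is a hypothetical choice for this sketch;
    it is not how Spark picks which partitions to merge.
    """
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # The whole partition is appended as a unit; records are never re-routed one by one.
        merged[i % n].extend(part)
    return merged


partitions = [[1, 2], [3, 4], [5, 6], [7, 8]]
result = coalesce(partitions, 2)
print(result)  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```

Every original partition ends up intact inside exactly one output partition, which is what separates this from a shuffle.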

A shuffle is a process that allocates partitions to executors.

This is incorrect. Assigning the work on partitions to executors is the job of Spark's scheduler, not of a shuffle.

A shuffle is a process that is executed during a broadcast hash join.

No. Broadcast hash joins avoid shuffles and yield performance benefits when at least one of the two tables is small (<= 10 MB by default, set by spark.sql.autoBroadcastJoinThreshold). They can avoid shuffles because, instead of exchanging partitions between executors, they broadcast the small table to all executors, which then perform the rest of the join locally.
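
The idea can be sketched in plain Python. This is a simplified model, not Spark code: the function name, the sample tables, and the use of a dict as the broadcast copy are assumptions for illustration. Each partition of the large table joins against its own copy of the small table, so no records move between partitions.

```python
def broadcast_hash_join(large_partitions, small_table):
    """Toy inner join: every partition probes a full copy of the small table."""
    # Build the hash map once; in Spark a copy of it would be sent to every executor.
    lookup = dict(small_table)
    joined = []
    for part in large_partitions:
        # The join happens locally, partition by partition, with no exchange of rows.
        joined.append([(k, v, lookup[k]) for k, v in part if k in lookup])
    return joined


# Hypothetical data: (customer_id, order) rows and a small (customer_id, country) table.
orders = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
countries = [(1, "DE"), (2, "FR")]
joined = broadcast_hash_join(orders, countries)
print(joined)  # [[(1, 'a', 'DE'), (2, 'b', 'FR')], [(1, 'c', 'DE')]]
```

The output keeps the same number of partitions as the large input, and each output row comes from the partition it started in.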

A shuffle is a process that compares data across executors.

No. In a shuffle, data is compared across partitions, not executors. The partitions involved may even reside on the same executor.
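
A small plain-Python model may help show what "across partitions" means. It is a sketch under assumptions, not Spark's implementation: the function names are hypothetical, keys are integers, and `key % n` stands in for a hash partitioner. Records with the same key start in different partitions and must be brought together before they can be aggregated.

```python
from collections import defaultdict


def shuffle_by_key(partitions, n):
    """Toy shuffle: route each (key, value) record to the partition that owns its key."""
    out = [[] for _ in range(n)]
    for part in partitions:
        for key, value in part:
            # key % n is a stand-in for a hash partitioner (assumption for this sketch).
            out[key % n].append((key, value))
    return out


def sum_per_key(partition):
    """Aggregate locally once all records for a key share a partition."""
    totals = defaultdict(int)
    for key, value in partition:
        totals[key] += value
    return dict(totals)


# Keys 1 and 2 each appear in both input partitions.
partitions = [[(1, 10), (2, 20)], [(1, 5), (2, 1), (3, 7)]]
shuffled = shuffle_by_key(partitions, 2)
totals = [sum_per_key(p) for p in shuffled]
print(shuffled)  # [[(2, 20), (2, 1)], [(1, 10), (1, 5), (3, 7)]]
print(totals)    # [{2: 21}, {1: 15, 3: 7}]
```

Nothing in this model mentions executors: the routing rule depends only on the record's key and the target partition count.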

More info: Spark Repartition & Coalesce - Explained (https://bit.ly/32KF7zS)


Contribute your Thoughts:

Lamonica
2 days ago
I agree with Dana, a shuffle is definitely about allocating partitions to executors.
upvoted 0 times
...
Gilma
9 days ago
C) A shuffle is a process that compares data across partitions. This sounds like the correct answer to me.
upvoted 0 times
...
Dana
15 days ago
I believe a shuffle is a process that allocates partitions to executors.
upvoted 0 times
...
Tashia
22 days ago
I think a shuffle is when data is compared across partitions.
upvoted 0 times
...
