Which of the following describes a shuffle?
A shuffle is a Spark operation that results from DataFrame.coalesce().
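No. DataFrame.coalesce() does not result in a shuffle. It merges existing partitions into fewer partitions without redistributing individual rows, unlike DataFrame.repartition(), which does trigger a shuffle.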
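To make the difference concrete, here is a toy model in plain Python, not Spark code. The helper names `coalesce` and `repartition` are hypothetical stand-ins, and the grouping rules (index modulo `n`, simple round-robin) are assumptions of this sketch rather than Spark's actual placement logic. It only illustrates that coalescing moves whole partitions while repartitioning routes every row individually.

```python
def coalesce(partitions, n):
    """Toy coalesce: merge whole partitions into n groups; rows are never split up."""
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)  # assumption: group by partition index
    return merged


def repartition(partitions, n):
    """Toy repartition: a full shuffle that routes each row on its own (round-robin)."""
    out = [[] for _ in range(n)]
    counter = 0
    for part in partitions:
        for row in part:
            out[counter % n].append(row)
            counter += 1
    return out


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
coalesced = coalesce(parts, 2)        # [[1, 2, 5, 6], [3, 4, 7, 8]]
repartitioned = repartition(parts, 2)  # [[1, 3, 5, 7], [2, 4, 6, 8]]
print(coalesced)
print(repartitioned)
```

In the coalesced result every original partition ends up intact inside one output partition, whereas the repartitioned result splits each original partition across both outputs.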
A shuffle is a process that allocates partitions to executors.
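This is incorrect. Assigning work on partitions to executors is the job of Spark's scheduler; a shuffle redistributes data between partitions.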
A shuffle is a process that is executed during a broadcast hash join.
No. Broadcast hash joins avoid shuffles and improve performance when at least one of the two tables is small (<= 10 MB by default, controlled by spark.sql.autoBroadcastJoinThreshold). They can avoid shuffles because, instead of exchanging partitions between executors, they broadcast the small table to all executors, which then perform the rest of the join locally.
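The idea can be sketched in plain Python as a toy model, not Spark code. The function name `broadcast_hash_join` and the sample `orders` and `customers` data are hypothetical. The sketch assumes an inner join on the first tuple element and a small table with unique keys.

```python
def broadcast_hash_join(large_partitions, small_table):
    """Toy inner join: each large partition is joined locally against a
    full copy of the small table, so no rows move between partitions."""
    lookup = dict(small_table)  # stands in for the broadcast hash table
    joined = []
    for part in large_partitions:  # each partition stays where it is
        joined.append([(k, v, lookup[k]) for k, v in part if k in lookup])
    return joined


# Hypothetical data: two partitions of (customer_id, item) rows
orders = [[(1, "book"), (2, "pen")], [(1, "lamp"), (3, "desk")]]
# Hypothetical small table of (customer_id, name) rows
customers = [(1, "Ana"), (2, "Bo")]

result = broadcast_hash_join(orders, customers)
print(result)
```

The output keeps the same number of partitions as the large input, and each output partition is built only from rows that were already in the corresponding input partition.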
A shuffle is a process that compares data across executors.
No. In a shuffle, data is compared across partitions, not executors.
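A key-based aggregation shows why rows from different partitions have to be brought together. The following is a toy model in plain Python, not Spark code. The name `shuffle_by_key` and the sample data are hypothetical, and routing by `hash(key) % num_out` is an assumption of this sketch.

```python
from collections import defaultdict


def shuffle_by_key(partitions, num_out):
    """Toy shuffle: route every (key, value) row to the output partition
    chosen by its key, so equal keys from all inputs end up together."""
    out = [defaultdict(list) for _ in range(num_out)]
    for part in partitions:
        for key, value in part:
            out[hash(key) % num_out][key].append(value)
    return [dict(p) for p in out]


# Hypothetical input: key 1 and key 2 each appear in more than one partition
data = [[(1, 10), (2, 20)], [(1, 30), (3, 40)], [(2, 50)]]

shuffled = shuffle_by_key(data, 2)
# After the shuffle, each key can be summed inside a single partition
totals = [{k: sum(vs) for k, vs in part.items()} for part in shuffled]
print(shuffled)
print(totals)
```

Before the shuffle, no single partition holds all values for key 1 or key 2; afterwards each key lives in exactly one partition, so the per-key sum needs no further data movement.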
More info: Spark Repartition & Coalesce - Explained (https://bit.ly/32KF7zS)