
Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 3 Question 63 Discussion

Actual exam question from the Databricks Certified Associate Developer for Apache Spark 3.0 exam
Question #: 63
Topic #: 3
[All Databricks Certified Associate Developer for Apache Spark 3.0 Questions]

The code block shown below should set the number of partitions that Spark uses when shuffling data for joins or aggregations to 100. Choose the answer that correctly fills the blanks in the code block to accomplish this.

spark.sql.shuffle.partitions

__1__.__2__.__3__(__4__, 100)

Suggested Answer: C

Correct code block:

spark.conf.set('spark.sql.shuffle.partitions', 100)

The code block expresses the option incorrectly.

Correct! The option should be expressed as a string.

The code block sets the wrong option.

No, spark.sql.shuffle.partitions is the correct option for the use case in the question.

The code block sets the incorrect number of partitions.

Wrong, the code block correctly states 100 partitions.

The code block uses the wrong command for setting an option.

No, in PySpark spark.conf.set() is the correct command for setting an option.

The code block is missing a parameter.

Incorrect, spark.conf.set() takes two parameters.

More info: Configuration - Spark 3.1.2 Documentation


Contribute your Thoughts:

Sonia
1 month ago
I bet the exam writer was just shuffling the answers around like Spark shuffles the data. Talk about a party trick!
upvoted 0 times
...
Reita
1 month ago
Nah, it's gotta be A. The question specifically mentions the Spark SQL API, so we can't be using the PySpark API.
upvoted 0 times
...
Bobbye
1 month ago
Wait, is it C? The documentation says we should use `spark.conf.get()` to retrieve the current value of the configuration.
upvoted 0 times
...
Willard
2 months ago
I'm pretty sure it's D. The `pyspark.config.set()` method is used to set Spark configurations in Python.
upvoted 0 times
Yolande
8 days ago
Yes, D is the correct option for setting the number of partitions in Spark.
upvoted 0 times
...
Twana
9 days ago
Great, thanks for confirming!
upvoted 0 times
...
Bettye
25 days ago
Yes, you're correct. D) 1. pyspark 2. config 3. set 4. 'spark.sql.shuffle.partitions'
upvoted 0 times
...
Heike
26 days ago
I think you're right. It should be D.
upvoted 0 times
...
Louvenia
28 days ago
I think it's D. The `pyspark.config.set()` method is used to set Spark configurations in Python.
upvoted 0 times
...
Jade
29 days ago
D. spark.config.set() is the correct method to set Spark configurations in Python.
upvoted 0 times
...
...
Boris
3 months ago
Hmm, the correct answer is A. The code block is using the Spark SQL API, so we need to use `spark.conf.set()` to set the `spark.sql.shuffle.partitions` configuration.
upvoted 0 times
Reena
2 months ago
A) 1. spark 2. conf 3. set 4. 'spark.sql.shuffle.partitions'
upvoted 0 times
...
...
Vallie
3 months ago
I'm not sure, but I think D could also be a possibility.
upvoted 0 times
...
Selma
3 months ago
I agree with Jolanda, A seems to be the correct choice.
upvoted 0 times
...
Jolanda
3 months ago
I think the answer is A.
upvoted 0 times
...
