
Databricks Certified Associate Developer for Apache Spark 3.5 Exam - Topic 1 Question 11 Discussion

Actual exam question from Databricks's Certified Associate Developer for Apache Spark 3.5 exam
Question #: 11
Topic #: 1
[All Databricks Certified Associate Developer for Apache Spark 3.5 Questions]


A developer is working on a Spark application that processes a large dataset using SQL queries. Despite having a large cluster, the developer notices that the job is underutilizing the available resources. Executors remain idle most of the time, and logs reveal that the number of tasks per stage is very low. The developer suspects that this is causing suboptimal cluster performance.

Which action should the developer take to improve cluster utilization?

Suggested Answer: A

In Spark SQL and DataFrame operations, the configuration parameter spark.sql.shuffle.partitions defines the number of partitions created during shuffle operations such as join, groupBy, and distinct.

The default value (in Spark 3.5) is 200.

If this number is too low, Spark creates fewer tasks, leading to idle executors and poor cluster utilization.

Increasing this value allows Spark to create more tasks that can run in parallel across executors, effectively using more cluster resources.
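As a rough illustration of that point (the cluster sizes below are hypothetical, not taken from the question), the shuffle partition count caps how many tasks a stage can run at once, and therefore how many cores can be busy:

```python
import math

def stage_waves(shuffle_partitions: int, total_executor_cores: int):
    """Return (concurrent_tasks, waves, utilization) for one shuffle stage.

    Spark runs at most one task per partition, so a stage can never keep
    more cores busy than it has partitions.
    """
    concurrent = min(shuffle_partitions, total_executor_cores)
    waves = math.ceil(shuffle_partitions / total_executor_cores)
    utilization = concurrent / total_executor_cores
    return concurrent, waves, utilization

# Hypothetical cluster with 400 executor cores:
# only 50 shuffle partitions -> 350 cores sit idle
print(stage_waves(50, 400))   # (50, 1, 0.125)
# 400 partitions -> every core gets a task
print(stage_waves(400, 400))  # (400, 1, 1.0)
```

With 800 partitions on the same 400 cores, the stage simply runs in two full waves, which is still far better for utilization than leaving cores idle.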

Correct approach:

spark.conf.set('spark.sql.shuffle.partitions', 400)

This increases the parallelism level of shuffle stages and improves overall resource utilization.
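The same setting can also be supplied when the application is launched, for example via spark-submit (the application script name here is a placeholder):

```shell
spark-submit \
  --conf spark.sql.shuffle.partitions=400 \
  my_app.py
```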

Why the other options are incorrect:

B: Reducing partitions further would decrease parallelism and worsen the underutilization issue.

C: Dynamic resource allocation scales executors up or down based on workload, but it doesn't fix low task parallelism caused by insufficient shuffle partitions.

D: Increasing dataset size is not a tuning solution and doesn't address task-level under-parallelization.
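To make the point about option C concrete: adding executors (which is all dynamic allocation can do) cannot raise a stage's concurrency past the partition count. A minimal sketch, with hypothetical executor counts and core sizes:

```python
def concurrent_tasks(shuffle_partitions: int, executors: int, cores_per_executor: int) -> int:
    """Stage concurrency: at most one task per partition and one task per core."""
    return min(shuffle_partitions, executors * cores_per_executor)

# With only 50 shuffle partitions, scaling from 10 to 50 executors
# (4 cores each) barely helps, then concurrency is stuck at 50 tasks.
print(concurrent_tasks(50, 10, 4))   # 40
print(concurrent_tasks(50, 50, 4))   # 50 -- capped by partitions
print(concurrent_tasks(50, 100, 4))  # 50 -- extra executors sit idle
```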

Reference (Databricks Apache Spark 3.5 -- Python / Study Guide):

Spark SQL Configuration: spark.sql.shuffle.partitions controls the number of shuffle partitions.

Databricks Exam Guide (June 2025): Section "Troubleshooting and Tuning Apache Spark DataFrame API Applications": tuning strategies, partitioning, and optimizing cluster utilization.
