Actual Dumps for the Databricks Certified Associate Developer for Apache Spark 3.5 Exam 2026

Question No: 1

MultipleChoice

How can a Spark developer ensure optimal resource utilization when running Spark jobs in Local Mode for testing?

Options

A. Configure the application to run in cluster mode instead of local mode.

B. Increase the number of local threads based on the number of CPU cores.

C. Use the spark.dynamicAllocation.enabled property to scale resources dynamically.

D. Set the spark.executor.memory property to a large value.
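
For reference on the Local Mode setting this question touches on: the number of worker threads is chosen through the master URL, written as local[N], or local[*] for one thread per logical core. The sketch below only builds such a URL. The helper name local_master and the app name are hypothetical, and the session-creation lines are left as comments because they assume a working PySpark installation.

```python
import os


def local_master(threads=None):
    """Return a Spark master URL for Local Mode.

    threads=None falls back to the machine's logical core count,
    which is roughly what Spark's own "local[*]" shorthand does.
    """
    if threads is None:
        threads = os.cpu_count() or 1  # os.cpu_count() can return None
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    return f"local[{threads}]"


print(local_master(4))  # local[4]

# Assumed usage with PySpark installed (not executed here):
# from pyspark.sql import SparkSession
# spark = (
#     SparkSession.builder
#     .master(local_master())   # or simply "local[*]"
#     .appName("local-test")    # hypothetical app name
#     .getOrCreate()
# )
```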

Question No: 2

MultipleChoice

A Spark application suffers from too many small tasks due to excessive partitioning. How can this be fixed without a full shuffle?

Options

A. Use the distinct() transformation to combine similar partitions

B. Use the coalesce() transformation with a lower number of partitions

C. Use the sortBy() transformation to reorganize the data

D. Use the repartition() transformation with a lower number of partitions
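
As background for the partition-count question above, here is a minimal sketch of reducing partitions on a DataFrame. DataFrame.coalesce() merges existing partitions through a narrow dependency, while DataFrame.repartition() always redistributes rows with a full shuffle. The helper name shrink_partitions is hypothetical, and df is assumed to be an existing PySpark DataFrame.

```python
def shrink_partitions(df, target):
    """Reduce df to at most `target` partitions without a full shuffle.

    df is assumed to be a pyspark.sql.DataFrame. coalesce() only merges
    existing partitions; it never increases the partition count, so the
    DataFrame is returned unchanged when target is not lower.
    """
    current = df.rdd.getNumPartitions()
    if target >= current:
        return df
    # df.repartition(target) would reach the same count, but with a shuffle.
    return df.coalesce(target)
```

A call such as shrink_partitions(df, 8) on a DataFrame with 200 partitions would leave at most 8 larger partitions, at the cost of possibly uneven partition sizes.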

Question No: 3

MultipleChoice

A data engineer is working on a Streaming DataFrame (streaming_df) with the following streaming data:

id  name        count  timestamp
1   Delhi       20     2024-09-19T10:11
1   Delhi       50     2024-09-19T10:12
2   London      50     2024-09-19T10:15
3   Paris       30     2024-09-19T10:18
3   Paris       20     2024-09-19T10:20
4   Washington  10     2024-09-19T10:22

Which operation is supported with streaming_df?

Options

A. streaming_df.count()

B. streaming_df.filter('count < 30')

C. streaming_df.select(countDistinct('name'))

D. streaming_df.show()

Question No: 4

MultipleChoice

You have:

DataFrame A: 128 GB of transactions

DataFrame B: 1 GB user lookup table

Which strategy is correct for broadcasting?

Options

A. DataFrame B should be broadcasted because it is smaller and will eliminate the need for shuffling itself

B. DataFrame B should be broadcasted because it is smaller and will eliminate the need for shuffling DataFrame A

C. DataFrame A should be broadcasted because it is larger and will eliminate the need for shuffling DataFrame B

D. DataFrame A should be broadcasted because it is smaller and will eliminate the need for shuffling itself
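
To make the broadcast mechanics in this question concrete, the sketch below marks the smaller side of a join with a broadcast hint. In a broadcast hash join, the hinted table is copied to every executor and the other table is joined where its partitions already sit. The helper name join_with_broadcast and the join key user_id are hypothetical. A 1 GB table is well above the default spark.sql.autoBroadcastJoinThreshold of 10 MB, so this sketch assumes an explicit hint and enough driver and executor memory to hold the table.

```python
def join_with_broadcast(large_df, small_df, key):
    """Join two PySpark DataFrames, hinting Spark to broadcast small_df.

    Both arguments are assumed to be pyspark.sql.DataFrame objects that
    share the column named by `key`.
    """
    # hint("broadcast") asks the planner for a broadcast join on small_df;
    # it is a hint, so the planner may still choose another strategy.
    return large_df.join(small_df.hint("broadcast"), on=key, how="inner")


# Assumed usage (DataFrame names mirror the question, key is hypothetical):
# joined = join_with_broadcast(df_a, df_b, "user_id")
# joined.explain()  # inspect the physical plan for the chosen join type
```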

Question No: 5

MultipleChoice

A data scientist wants each record in the DataFrame to contain:

The entire contents of a file

The full file path

The first attempt at the code reads the text files, but each record contains a single line rather than the full text of a file. This code is shown below:

corpus = spark.read.text("/datasets/raw_txt/*") \
    .select('*', '_metadata.file_path')

Which change will ensure one record per file?

Options

A. Add the option wholetext=True to the text() function

B. Add the option lineSep='\n' to the text() function

C. Add the option wholetext=False to the text() function

D. Add the option lineSep=', ' to the text() function
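
For context on the reader options named in this question, here is a minimal sketch of a text read with the wholetext option of DataFrameReader.text(), which controls whether each file becomes one row or one row per line. The helper name read_files_whole is hypothetical, and spark is assumed to be an existing SparkSession. The select() call is taken from the snippet in the question.

```python
def read_files_whole(spark, path):
    """Read text files so that each file yields a single row.

    With wholetext=True the `value` column holds the full file contents;
    the default (wholetext=False) produces one row per line instead.
    """
    return (
        spark.read.text(path, wholetext=True)
        # _metadata.file_path is the hidden file-source column used in the
        # question's snippet; it adds the full path of each source file.
        .select("*", "_metadata.file_path")
    )


# Assumed usage with an active SparkSession named `spark`:
# corpus = read_files_whole(spark, "/datasets/raw_txt/*")
```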

Question No: 6

MultipleChoice

A data scientist at an e-commerce company is working with user data obtained from its subscriber database and has stored the data in a DataFrame df_user.

Before further processing, the data scientist wants to create another DataFrame df_user_non_pii and store only the non-PII columns. The PII columns in df_user are name, email, and birthdate.

Which code snippet can be used to meet this requirement?

A.

df_user_non_pii = df_user.drop("name", "email", "birthdate")

B.

df_user_non_pii = df_user.dropFields("name", "email", "birthdate")

C.

df_user_non_pii = df_user.select("name", "email", "birthdate")

D.

df_user_non_pii = df_user.remove("name", "email", "birthdate")

Options

A. Option A

B. Option B

C. Option C

D. Option D
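
As a small illustration of removing columns by name, the sketch below wraps DataFrame.drop(), which accepts several column names and returns a new DataFrame without them. The helper name strip_pii and the constant PII_COLUMNS are hypothetical; the column names come from the question, and df_user is assumed to be a PySpark DataFrame.

```python
# PII column names as listed in the question.
PII_COLUMNS = ("name", "email", "birthdate")


def strip_pii(df_user, pii_columns=PII_COLUMNS):
    """Return a new DataFrame without the given PII columns.

    DataFrame.drop() takes column names as separate arguments and
    silently ignores names that are not present in the schema.
    """
    return df_user.drop(*pii_columns)


# Assumed usage:
# df_user_non_pii = strip_pii(df_user)
```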

Question No: 7

MultipleChoice

What is the benefit of using Pandas on Spark for data transformations?

Options

A. It is available only with Python, thereby reducing the learning curve.

B. It computes results immediately using eager execution, making it simple to use.

C. It runs on a single node only, utilizing the memory with memory-bound DataFrames and hence cost-efficient.

D. It executes queries faster using all the available cores in the cluster as well as provides Pandas's rich set of features.

Free Databricks Certified Associate Developer for Apache Spark 3.5 Exam Dumps August 2026