Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 1 Question 32 Discussion

Question #: 32
Topic #: 1

Which of the following describes characteristics of the Spark driver?

A. The Spark driver requests the transformation of operations into DAG computations from the worker nodes.

B. If set in the Spark configuration, Spark scales the Spark driver horizontally to improve parallel processing performance.

C. The Spark driver processes partitions in an optimized, distributed fashion.

D. In a non-interactive Spark application, the Spark driver automatically creates the SparkSession object.

E. The Spark driver's responsibility includes scheduling queries for execution on worker nodes.

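Suggested Answer: E

The driver itself transforms operations into DAG computations and schedules the resulting work on the worker nodes. It does not request that transformation from the workers, is not scaled horizontally, and does not process partitions; in a non-interactive application the SparkSession object must be created explicitly.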

transactionsDf.select('storeId').dropDuplicates().count()

Correct! After dropping all duplicates from column storeId, the remaining rows get counted, representing the number of unique values in the column.

transactionsDf.select(count('storeId')).dropDuplicates()

No. transactionsDf.select(count('storeId')) just returns a single-row DataFrame showing the number of non-null rows. dropDuplicates() does not have any effect in this context.
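The point about count('storeId') can be illustrated with a minimal plain-Python sketch. It does not use Spark: the list `store_ids` is a made-up stand-in for the storeId column, with `None` playing the role of a null, and the single-tuple list only models the one-row result of the aggregation.

```python
# Plain-Python model of a storeId column; None stands in for a null value.
# The values are hypothetical and chosen only for illustration.
store_ids = [1, 2, 2, 3, None, 3, 3]

# What count('storeId') computes: the number of non-null entries, as one value.
non_null_count = sum(1 for value in store_ids if value is not None)

# The aggregate yields a single row, so de-duplicating it changes nothing.
aggregated_rows = [(non_null_count,)]
deduplicated_rows = list(dict.fromkeys(aggregated_rows))

# Number of distinct non-null values, shown for contrast.
unique_values = len({value for value in store_ids if value is not None})

print(deduplicated_rows)  # [(6,)]
print(unique_values)      # 3
```

The aggregated result stays at one row holding 6, while the column contains only 3 distinct non-null values, so the two quantities are not interchangeable.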

transactionsDf.dropDuplicates().agg(count('storeId'))

Incorrect. While transactionsDf.dropDuplicates() removes duplicate rows from transactionsDf, it does not do so taking only column storeId into consideration, but eliminates full row duplicates instead.

transactionsDf.distinct().select('storeId').count()

Wrong. transactionsDf.distinct() identifies unique rows across all columns, not just rows that are unique with respect to column storeId. This may leave duplicate values in the column, so the count does not represent the number of unique values in that column.
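The difference between removing full-row duplicates and removing duplicates in a single column can be sketched in plain Python, again without Spark. The rows below are hypothetical (transactionId, storeId) pairs; `dict.fromkeys` is used here only as an order-preserving way to drop exact duplicate tuples, as an analogy for de-duplicating whole rows.

```python
# Hypothetical rows of (transactionId, storeId); the values are made up.
rows = [
    (1, 25),
    (2, 25),
    (3, 3),
    (3, 3),   # exact duplicate of the previous row
    (4, 3),
]

# Full-row de-duplication, analogous to distinct() or dropDuplicates()
# without a column subset: only the exact duplicate (3, 3) is removed.
unique_rows = list(dict.fromkeys(rows))

# Counting storeId entries afterwards still counts repeated store IDs.
count_after_row_dedup = len([store_id for _, store_id in unique_rows])

# Projecting to storeId first and then de-duplicating gives the unique count.
count_unique_store_ids = len({store_id for _, store_id in rows})

print(count_after_row_dedup)   # 4
print(count_unique_store_ids)  # 2
```

Four rows survive full-row de-duplication even though only two store IDs exist, which is why the column has to be selected before duplicates are dropped.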

transactionsDf.select(distinct('storeId')).count()

False. There is no distinct method in pyspark.sql.functions.

by Leonora at Jan 08, 2023, 04:50 PM