Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 3 Question 70 Discussion


Question #: 70
Topic #: 3


The code block displayed below contains an error. The code block should return a copy of DataFrame transactionsDf where the name of column transactionId has been changed to transactionNumber. Find the error.

Code block:

transactionsDf.withColumn("transactionNumber", "transactionId")

A. The arguments to the withColumn method need to be reordered.

B. The arguments to the withColumn method need to be reordered and the copy() operator should be appended to the code block to ensure a copy is returned.

C. The copy() operator should be appended to the code block to ensure a copy is returned.

D. Each column name needs to be wrapped in the col() method and method withColumn should be replaced by method withColumnRenamed.

E. The method withColumn should be replaced by method withColumnRenamed and the arguments to the method need to be reordered.


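Suggested Answer: E

withColumn adds or replaces a column and expects a Column expression as its second argument, so reordering its arguments does not rename anything. withColumnRenamed takes the existing column name first and the new name second: transactionsDf.withColumnRenamed("transactionId", "transactionNumber"). DataFrames are immutable, so the call already returns a new DataFrame and no copy() is needed.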

transactionsDf.select('storeId').dropDuplicates().count()

Correct! After dropping all duplicates from column storeId, the remaining rows get counted, representing the number of unique values in the column.

transactionsDf.select(count('storeId')).dropDuplicates()

No. transactionsDf.select(count('storeId')) just returns a single-row DataFrame showing the number of non-null rows. dropDuplicates() does not have any effect in this context.

transactionsDf.dropDuplicates().agg(count('storeId'))

Incorrect. While transactionsDf.dropDuplicates() removes duplicate rows from transactionsDf, it does not do so taking only column storeId into consideration, but eliminates full row duplicates instead.

transactionsDf.distinct().select('storeId').count()

Wrong. transactionsDf.distinct() identifies unique rows across all columns, but not only unique rows with respect to column storeId. This may leave duplicate values in the column, making the count not represent the number of unique values in that column.

transactionsDf.select(distinct('storeId')).count()

False. There is no distinct method in pyspark.sql.functions.

by Lucina at Oct 19, 2024, 03:43 AM


2 months ago

The error is obvious! The arguments to the withColumn method need to be reordered. Easy peasy.

upvoted 0 times

Angelo

1 month ago

C) The copy() operator should be appended to the code block to ensure a copy is returned.

upvoted 0 times


Verona

2 months ago

A) The arguments to the withColumn method need to be reordered.

upvoted 0 times


Veda

2 months ago

I believe option A) is correct, the arguments need to be reordered.

upvoted 0 times


Delsie

2 months ago

I agree with Hildegarde, the arguments should be reordered.

upvoted 0 times


Hildegarde

3 months ago

I think the error is that the arguments need to be reordered.

upvoted 0 times
