Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 1 Question 32 Discussion

Actual exam question from the Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 32
Topic #: 1

Which of the following describes characteristics of the Spark driver?

Suggested Answer: A

transactionsDf.select('storeId').dropDuplicates().count()

Correct! After selecting column storeId and dropping all duplicate rows, the remaining rows are counted, yielding the number of unique values in the column.
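
As a minimal runnable sketch of why this works (the actual transactionsDf is not shown in the question, so the sample data below is made up):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: storeId 25 appears twice, so there are
# 3 rows but only 2 unique storeId values.
transactionsDf = spark.createDataFrame(
    [(1, 25), (2, 25), (3, 3)],
    ["transactionId", "storeId"],
)

# select() narrows the DataFrame to storeId, dropDuplicates() keeps one
# row per distinct value, and count() returns the number of those rows.
print(transactionsDf.select('storeId').dropDuplicates().count())  # 2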

transactionsDf.select(count('storeId')).dropDuplicates()

No. transactionsDf.select(count('storeId')) just returns a single-row DataFrame containing the number of non-null values in column storeId. dropDuplicates() has no effect on a single-row DataFrame.
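
Continuing with the hypothetical transactionsDf from the sketch above, the aggregate collapses everything into a single row before dropDuplicates() ever runs:

from pyspark.sql.functions import count

# count() aggregates over non-null storeId values and yields a
# single-row DataFrame, so dropDuplicates() has nothing to remove.
transactionsDf.select(count('storeId')).show()
# +--------------+
# |count(storeId)|
# +--------------+
# |             3|
# +--------------+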

transactionsDf.dropDuplicates().agg(count('storeId'))

Incorrect. While transactionsDf.dropDuplicates() removes duplicate rows from transactionsDf, it does not take only column storeId into consideration; it eliminates full-row duplicates instead.
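
With the same hypothetical data, every row is already unique because transactionId differs, so full-row deduplication removes nothing; passing a column subset to dropDuplicates() is what would produce the intended result:

from pyspark.sql.functions import count

# Full-row deduplication keeps all 3 rows, so the aggregate counts 3.
transactionsDf.dropDuplicates().agg(count('storeId')).show()  # 3, not 2

# dropDuplicates() accepts a list of columns; this keeps one row per
# unique storeId value.
print(transactionsDf.dropDuplicates(['storeId']).count())  # 2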

transactionsDf.distinct().select('storeId').count()

Wrong. transactionsDf.distinct() identifies unique rows across all columns, not unique values in column storeId alone. Duplicate values may therefore remain in that column, so the count does not represent the number of unique values in it.
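
Again with the hypothetical data above, distinct() compares whole rows, so both rows with storeId 25 survive:

# distinct() behaves like dropDuplicates() without a subset: all 3 rows
# are unique as full rows, leaving a duplicate value in storeId.
print(transactionsDf.distinct().select('storeId').count())  # 3, not 2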

transactionsDf.select(distinct('storeId')).count()

False. There is no distinct function in pyspark.sql.functions; distinct() is a method on DataFrames, not a column function.
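
The function that does exist in pyspark.sql.functions for this purpose is countDistinct; sketched with the hypothetical data above:

from pyspark.sql.functions import countDistinct

# countDistinct() returns the number of distinct non-null values, which
# matches option A's result for data without nulls.
print(transactionsDf.select(countDistinct('storeId')).first()[0])  # 2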


Contribute your Thoughts:

Francoise (3 hours ago): Option E is the correct answer. The Spark driver is responsible for scheduling queries for execution on worker nodes.

Maryann (16 days ago): I believe the Spark driver's responsibility also includes scheduling queries for execution on worker nodes.

Rosalyn (18 days ago): I agree with James. That's definitely one of the characteristics of the Spark driver.

James (22 days ago): I think the Spark driver requests the transformation of operations into DAG computations from the worker nodes.
