Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 2 Question 57 Discussion

Actual exam question for Databricks' Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 57
Topic #: 2

Which of the following code blocks returns the number of unique values in column storeId of DataFrame transactionsDf?

Suggested Answer: A

transactionsDf.select('storeId').dropDuplicates().count()

Correct! After dropping all duplicates from column storeId, the remaining rows get counted, representing the number of unique values in the column.
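As a quick check, here is a minimal, self-contained sketch; the rows and the amount column are invented purely for illustration, since the exam question does not specify the data:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("uniqueStoreIds").getOrCreate()

# Invented toy data: storeIds 1 and 2 appear twice each, 3 once -> 3 unique values.
transactionsDf = spark.createDataFrame(
    [(1, 10.0), (1, 12.5), (2, 7.0), (2, 9.9), (3, 4.2)],
    ["storeId", "amount"],
)

# Projecting to storeId first means dropDuplicates() deduplicates on that column alone.
print(transactionsDf.select("storeId").dropDuplicates().count())  # 3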

transactionsDf.select(count('storeId')).dropDuplicates()

No. transactionsDf.select(count('storeId')) returns a single-row DataFrame containing the number of non-null values in storeId, not the number of unique ones. Calling dropDuplicates() on that one-row result has no effect, and the expression yields a DataFrame rather than a number.
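To make this concrete with the toy transactionsDf sketched above:

from pyspark.sql.functions import count

# count('storeId') aggregates to a single row holding the number of
# non-null storeId values; nothing is deduplicated.
transactionsDf.select(count("storeId")).show()
# +--------------+
# |count(storeId)|
# +--------------+
# |             5|
# +--------------+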

transactionsDf.dropDuplicates().agg(count('storeId'))

Incorrect. While transactionsDf.dropDuplicates() removes duplicate rows from transactionsDf, it does not consider only column storeId: it eliminates full-row duplicates instead, so rows that share a storeId but differ in other columns all survive.
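Continuing the toy example: the invented rows contain no full-row duplicates, so nothing is dropped and the aggregate reflects all rows.

from pyspark.sql.functions import count

# All 5 rows survive dropDuplicates(), so the aggregate shows 5, not the 3 unique storeIds.
transactionsDf.dropDuplicates().agg(count("storeId")).show()

# As an aside (not one of the answer choices): dropDuplicates() accepts a
# subset of columns, which would deduplicate on storeId alone.
print(transactionsDf.dropDuplicates(["storeId"]).count())  # 3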

transactionsDf.distinct().select('storeId').count()

Wrong. transactionsDf.distinct() identifies rows that are unique across all columns, not rows that are unique with respect to column storeId alone. This may leave duplicate values in that column, so the count does not represent the number of unique values in it.
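The same toy data illustrates the problem:

# distinct() also deduplicates full rows only; since no complete row repeats,
# all 5 rows survive and the count overstates the 3 unique storeIds.
print(transactionsDf.distinct().select("storeId").count())  # 5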

transactionsDf.select(distinct('storeId')).count()

False. There is no distinct function in pyspark.sql.functions (distinct() exists only as a DataFrame method), so this code does not run.
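For completeness: pyspark.sql.functions does provide countDistinct, which would also answer the question, though it is not among the options here:

from pyspark.sql.functions import countDistinct

# countDistinct aggregates to a one-row DataFrame; pull the value out with first().
print(transactionsDf.select(countDistinct("storeId")).first()[0])  # 3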


Contribute your Thoughts:

Danica
10 months ago
Option B? Really? That's like trying to count the number of unique snowflakes by first counting all the snowflakes and then dropping the duplicates. Definitely not the way to go here.
upvoted 0 times
Melissa
10 months ago
D is the way to go, my friends. It's the most comprehensive solution, and it's got that fancy `agg()` function. Gotta love that data aggregation magic!
upvoted 0 times
Janet
10 months ago
Hmm, I'm torn between A and E. They both seem to be doing the same thing, but E might be a bit more concise. What do you guys think?
upvoted 0 times
Antonio
10 months ago
I agree, A looks like the right choice.
upvoted 0 times
Melita
10 months ago
I think A is the correct one.
upvoted 0 times
Sharmaine
10 months ago
I'm going with C. The `distinct()` function seems like the most direct way to get the unique values in the column.
upvoted 0 times
Yan
10 months ago
C is the way to go. The distinct() function should give us the unique values.
upvoted 0 times
Stephaine
10 months ago
E) transactionsDf.distinct().select("storeId").count()
upvoted 0 times
Stephaine
10 months ago
No, that won't work. It doesn't use the distinct function.
upvoted 0 times
Stephaine
10 months ago
A) transactionsDf.select("storeId").dropDuplicates().count()
upvoted 0 times
Theodora
10 months ago
I'm going with E. Using distinct() directly on the DataFrame seems more efficient.
upvoted 0 times
Richelle
10 months ago
I think A is the correct answer.
upvoted 0 times
Ty
11 months ago
I disagree, I believe the correct answer is C.
upvoted 0 times
Eulah
11 months ago
I think the correct answer is A.
upvoted 0 times
Shawnda
11 months ago
Option A looks good to me. It's simple and straightforward, and I think it should do the trick.
upvoted 0 times
Linwood
10 months ago
Yeah, I agree. It looks like the most straightforward choice.
upvoted 0 times
Tresa
10 months ago
Yeah, I agree. It looks like the most straightforward choice.
upvoted 0 times
Veronika
10 months ago
I think option A is the correct one.
upvoted 0 times
Jodi
11 months ago
Yeah, I agree. It looks simple and should work.
upvoted 0 times
Anastacia
11 months ago
I think option A is the correct one.
upvoted 0 times
Sylvia
11 months ago
I think option A is the correct one.
upvoted 0 times
