
Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 2 Question 46 Discussion

Actual exam question for Databricks's Databricks Certified Associate Developer for Apache Spark 3.0 exam
Question #: 46
Topic #: 2

Which of the following code blocks returns the number of unique values in column storeId of DataFrame transactionsDf?

Suggested Answer: A

transactionsDf.select('storeId').dropDuplicates().count()

Correct! After dropping all duplicates from column storeId, the remaining rows get counted, representing the number of unique values in the column.
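The three steps in this option can be sketched in plain Python. This is only a model of the pipeline, not Spark code: the sample rows and variable names are made up for illustration, and in PySpark the corresponding calls are select, dropDuplicates and count.

```python
# Plain-Python model of select('storeId').dropDuplicates().count().
# The rows below are hypothetical sample data, not taken from the exam.
rows = [
    {"transactionId": 1, "storeId": 25},
    {"transactionId": 2, "storeId": 2},
    {"transactionId": 3, "storeId": 25},
    {"transactionId": 4, "storeId": 3},
]

# Step 1: keep only the one column (models select('storeId')).
store_ids = [row["storeId"] for row in rows]

# Step 2: drop repeated values (models dropDuplicates()).
unique_store_ids = list(dict.fromkeys(store_ids))

# Step 3: count what is left (models count()).
unique_count = len(unique_store_ids)
print(unique_count)  # 3
```

Because the projection happens before the deduplication, only storeId values are compared, so the final count is the number of unique values in that column.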

transactionsDf.select(count('storeId')).dropDuplicates()

No. transactionsDf.select(count('storeId')) just returns a single-row DataFrame showing the number of non-null rows. dropDuplicates() does not have any effect in this context.
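A small plain-Python model may help show why this option gives the wrong number. It is a sketch under assumptions: the sample values are invented, and None stands in for a null storeId.

```python
# Plain-Python model of select(count('storeId')).dropDuplicates().
# Hypothetical sample values; None stands in for a null storeId.
store_ids = [25, 2, 25, None, 3]

# count('storeId') collapses the column into one number:
# the count of non-null entries, duplicates included.
non_null_count = sum(1 for value in store_ids if value is not None)

# The aggregate is a single row, so deduplicating it changes nothing.
result_rows = [(non_null_count,)]
deduplicated = list(dict.fromkeys(result_rows))

print(deduplicated)  # [(4,)]
```

The sample column holds three unique non-null values, yet the result is 4, because the duplicate 25 was counted before any deduplication took place.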

transactionsDf.dropDuplicates().agg(count('storeId'))

Incorrect. While transactionsDf.dropDuplicates() removes duplicate rows from transactionsDf, it does not do so taking only column storeId into consideration, but eliminates full row duplicates instead.

transactionsDf.distinct().select('storeId').count()

Wrong. transactionsDf.distinct() identifies unique rows across all columns, but not only unique rows with respect to column storeId. This may leave duplicate values in the column, making the count not represent the number of unique values in that column.
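The effect of deduplicating full rows before narrowing to one column can be illustrated with a plain-Python model. The rows are hypothetical (transactionId, storeId) pairs chosen for the example, and the code models the order of operations rather than calling Spark.

```python
# Hypothetical rows as (transactionId, storeId) tuples.
rows = [(1, 25), (2, 25), (2, 25), (3, 3)]

# Full-row deduplication: only the exact repeat of (2, 25) is removed.
distinct_rows = list(dict.fromkeys(rows))

# Projecting afterwards still leaves storeId 25 twice.
store_ids_after = [store_id for _, store_id in distinct_rows]
count_full_row = len(store_ids_after)

# Projecting first and deduplicating second gives the number of unique values.
count_column_first = len({store_id for _, store_id in rows})

print(count_full_row, count_column_first)  # 3 2
```

Rows (1, 25) and (2, 25) differ in transactionId, so full-row deduplication keeps both, and the later count overstates the number of unique storeId values.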

transactionsDf.select(distinct('storeId')).count()

False. There is no distinct method in pyspark.sql.functions.


Contribute your Thoughts:

Argelia
2 months ago
Come on, this is a piece of cake! B is the only option that makes sense. The other choices are about as useful as a chocolate teapot. *chuckles*
upvoted 0 times
Marleen
20 days ago
User1: It's definitely B, the rest are like a chocolate teapot.
upvoted 0 times
...
Amber
20 days ago
User 3: Yeah, the other choices are not relevant.
upvoted 0 times
...
Myong
22 days ago
User 2: Agreed, B is the only option that makes sense.
upvoted 0 times
...
Elin
2 months ago
User3: Agreed, the other options are pretty useless.
upvoted 0 times
...
Beckie
2 months ago
User2: Yeah, B is the only one that makes sense.
upvoted 0 times
...
Gracia
2 months ago
User 1: I think B is the correct answer.
upvoted 0 times
...
Estrella
2 months ago
User1: I think B is the correct answer.
upvoted 0 times
...
...
Reuben
2 months ago
I was torn between B and D, but I think B is the better option. Wouldn't want to accidentally include any duplicates, you know? Wait, is that a spider on the ceiling? *screams*
upvoted 0 times
...
Malinda
2 months ago
Ha! I bet the exam creator was trying to trick us with those other options. But B is clearly the right way to get the unique store IDs. Easy peasy!
upvoted 0 times
Micaela
1 month ago
User2: Yeah, the other options were just distractions.
upvoted 0 times
...
Lisbeth
1 month ago
User1: I agree, B is the correct option.
upvoted 0 times
...
Cletus
2 months ago
User2: Yeah, the other options were just distractions.
upvoted 0 times
...
Catalina
2 months ago
User1: I agree, B is the correct option.
upvoted 0 times
...
...
Lorrine
3 months ago
Hmm, I'm not too sure about this one. I was thinking D might work, but I guess B is the better choice since it's more straightforward.
upvoted 0 times
...
Nada
3 months ago
I think B is the correct answer. Selecting the 'storeId' column and then applying the distinct() function is the way to go.
upvoted 0 times
Jenelle
2 months ago
User4: B) transactionsDf.select('storeId').distinct() is the correct option.
upvoted 0 times
...
Lemuel
2 months ago
User3: Definitely, that's the correct approach to get all unique values of column storeId in DataFrame transactionsDf.
upvoted 0 times
...
Corinne
2 months ago
User2: Yes, I agree. Selecting the 'storeId' column and then applying the distinct() function is the way to go.
upvoted 0 times
...
Yoko
2 months ago
User1: I think B is the correct answer.
upvoted 0 times
...
...
Lauran
3 months ago
I'm not sure, but I think A) transactionsDf['storeId'].distinct() might also work.
upvoted 0 times
...
Doyle
3 months ago
I agree with Leontine, because select('storeId').distinct() will return only the unique values of the storeId column.
upvoted 0 times
...
Leontine
3 months ago
I think the correct answer is B) transactionsDf.select('storeId').distinct().
upvoted 0 times
...
