Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 2 Question 71 Discussion

Actual exam question from the Databricks Certified Associate Developer for Apache Spark 3.0 exam
Question #: 71
Topic #: 2

Which of the following code blocks returns all unique values across all values in columns value and productId in DataFrame transactionsDf in a one-column DataFrame?

Suggested Answer: D

transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()

Correct. This code block uses a common pattern for finding the unique values across multiple columns: union and distinct. In fact, it is so common that it is even mentioned in the Spark documentation for the union command (link below).
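The union-then-distinct pattern can be traced by hand. The sketch below is a plain-Python model of the row semantics, not PySpark itself: the sample rows are made up for illustration, and each one-column row is modelled as a 1-tuple.

```python
# Hypothetical sample rows standing in for transactionsDf (not from the exam).
rows = [
    {"value": 4, "productId": 3},
    {"value": 3, "productId": 3},
    {"value": 7, "productId": 2},
    {"value": 4, "productId": 2},
]

# Models select('value') and select('productId'): two one-column results.
value_col = [(r["value"],) for r in rows]
product_col = [(r["productId"],) for r in rows]

# Models union(): rows are appended by column position, duplicates are kept.
unioned = value_col + product_col

# Models distinct(): duplicate rows are dropped. Spark does not guarantee
# row order, so the sort is only here for a stable printout.
unique_values = sorted(set(unioned))

print(len(unioned))   # 8
print(unique_values)  # [(2,), (3,), (4,), (7,)]
```

In PySpark the single output column keeps the name of the first DataFrame's column, here value.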

transactionsDf.select('value', 'productId').distinct()

Wrong. This code block returns unique rows, but not unique values.
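To make "unique rows, but not unique values" concrete, here is a plain-Python model of calling distinct on a two-column selection. The data is hypothetical and each row is modelled as a (value, productId) tuple.

```python
# Hypothetical (value, productId) rows; the pair (4, 3) appears twice.
rows = [(4, 3), (3, 3), (7, 2), (4, 3), (4, 2)]

# Models select('value', 'productId').distinct(): only fully identical
# rows are collapsed, and the result still has two columns.
unique_rows = sorted(set(rows))

print(unique_rows)  # [(3, 3), (4, 2), (4, 3), (7, 2)]
```

The value 4 still appears twice in the first column and 3 shows up in both columns, so this is not a one-column list of unique values.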

transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})

Incorrect. This code block will output a one-row, two-column DataFrame where each cell has an array of unique values in the respective column (even omitting any nulls).
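The shape of that result can be sketched in plain Python. This models collect_set per column on made-up data, with None standing in for null. It is an illustration of the semantics, not Spark code.

```python
# Hypothetical (value, productId) rows with some nulls.
rows = [(4, 3), (None, 3), (7, 2), (4, None)]

# Models collect_set per column: duplicates and nulls are dropped.
value_set = {v for v, _ in rows if v is not None}
product_set = {p for _, p in rows if p is not None}

# One row, two columns, each cell holding an array. Spark does not
# guarantee element order inside the arrays; sorting is for display only.
result = [(sorted(value_set), sorted(product_set))]

print(result)  # [([4, 7], [2, 3])]
```

The unique values stay separated per column and wrapped in arrays, which is why this does not match the one-column DataFrame the question asks for.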

transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'})

No. This command will count the number of rows, but will not return unique values.

transactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer')

Wrong. This command performs an outer join of the value and productId columns, so it returns a two-column DataFrame. If you picked this answer, it might be a good idea to read up on the difference between union and join; a link is posted below.
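The join-versus-union difference is easy to see on a tiny example. The following is a hand-rolled plain-Python model of a full outer join on value == productId, using hypothetical data. It is not how Spark implements joins, only a sketch of the output shape.

```python
# Hypothetical one-column inputs.
values = [4, 3, 7]
product_ids = [3, 2]

joined = []
matched_right = set()

# Left side: pair each value with every equal productId, or with None.
for v in values:
    hits = [i for i, p in enumerate(product_ids) if p == v]
    if hits:
        for i in hits:
            joined.append((v, product_ids[i]))
            matched_right.add(i)
    else:
        joined.append((v, None))

# Right side: keep productIds that matched nothing, padded with None.
for i, p in enumerate(product_ids):
    if i not in matched_right:
        joined.append((None, p))

print(joined)  # [(4, None), (3, 3), (7, None), (None, 2)]

# A union, by contrast, stacks the inputs into a single column.
print(values + product_ids)  # [4, 3, 7, 3, 2]
```

A join widens the result by placing columns side by side, while a union lengthens it by stacking rows, so only the union can yield a single column of values.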

More info: pyspark.sql.DataFrame.union (PySpark 3.1.2 documentation); sql - What is the difference between JOIN and UNION? (Stack Overflow)

Static notebook | Dynamic notebook: See test 3, Question: 21 (Databricks import instructions)


Contribute your Thoughts:

Valentin
6 months ago
Option C all the way! It's like the Goldilocks of solutions - not too hot, not too cold, just right. Plus, it's probably the only one that won't make the grader fall asleep while reading it.

Franchesca
6 months ago
I'm gonna have to go with Option B. Who needs distinct when you can just count the unique values, amirite? Plus, it's always good to show off your mad agg skills.

    Erinn
    6 months ago
    I agree, showing off those agg skills is always a good idea.

    Dan
    6 months ago
    Option B is the way to go. Counting unique values is the way to go.

Sabina
6 months ago
Why do you think D is the correct answer?

Billy
7 months ago
I disagree, I believe the answer is D.

Lavera
7 months ago
Option D? Really? Why would you ever want to do a union and then a distinct? Seems like a lot of unnecessary steps. I'm going with C, it's the clear winner here.

    Mabel
    6 months ago
    Let's go with C then, it's the clear winner.

    Olive
    6 months ago
    C is definitely the most efficient choice.

    Denae
    6 months ago
    I agree, D does seem like a lot of extra work.

    Margurite
    6 months ago
    I think C is the best option here.

Sabina
7 months ago
I think the answer is C.

Gail
7 months ago
Hmm, I'm not sure. Option E looks like it could work, but I don't want to get caught up in all those fancy collect_set functions. Let's keep it simple!

    Luther
    6 months ago
    User1: Great, let's go with option C then. Thanks for the input, guys!

    Gwenn
    6 months ago
    User3: I agree, let's keep it simple. Option C it is.

    Alyce
    7 months ago
    User2: Yeah, that sounds simple and straightforward. Let's go with option C.

    Golda
    7 months ago
    User1: I think option C is the way to go. Just select the columns and call distinct.

Danilo
7 months ago
Option C is the way to go! It's simple and straightforward, no need to get fancy with all that other stuff.

    Glory
    6 months ago
    User4: I also think option C is the best solution.

    Nydia
    6 months ago
    User3: Option C is definitely the way to go.

    Shay
    7 months ago
    User2: Yeah, I think option C is the most straightforward.

    Edna
    7 months ago
    User1: I agree, option C is the simplest choice.
