Welcome to Pass4Success

Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 2 Question 34 Discussion

Actual exam question for Databricks's Databricks Certified Associate Developer for Apache Spark 3.0 exam
Question #: 34
Topic #: 2

The code block displayed below contains an error. When the code block below has executed, it should have divided DataFrame transactionsDf into 14 parts, based on columns storeId and transactionDate (in this order). Find the error.

Code block:

transactionsDf.coalesce(14, ("storeId", "transactionDate"))
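The question turns on what each operator accepts. In PySpark, `DataFrame.coalesce(numPartitions)` takes only a partition count and can only lower the number of partitions, whereas `DataFrame.repartition(numPartitions, *cols)` hash-partitions the rows by the given columns. The sketch below models those semantics in plain Python rather than calling Spark: `coalesce_count`, `hash_partition` and the sample `transactions` rows are hypothetical, and `zlib.crc32` merely stands in for the Murmur3 hash Spark actually uses, so real partition assignments will differ.

```python
import zlib
from collections import defaultdict

# Plain-Python model of Spark partitioning semantics (NOT Spark code).
# All names here are hypothetical helpers for illustration.


def coalesce_count(current_partitions: int, requested: int) -> int:
    """Model of DataFrame.coalesce(n): it can only lower the partition count."""
    return min(current_partitions, requested)


def hash_partition(rows, num_partitions, *cols):
    """Model of DataFrame.repartition(num_partitions, *cols).

    Rows with equal values in `cols` (taken in the given order) always land
    in the same bucket. zlib.crc32 is a stand-in for Spark's Murmur3 hash.
    """
    buckets = defaultdict(list)
    for row in rows:
        key = repr(tuple(row[c] for c in cols)).encode("utf-8")
        buckets[zlib.crc32(key) % num_partitions].append(row)
    return [buckets[i] for i in range(num_partitions)]


# Hypothetical sample data standing in for transactionsDf.
transactions = [
    {"storeId": s, "transactionDate": d, "value": v}
    for v, (s, d) in enumerate(
        [(1, "2020-01-01"), (2, "2020-01-01"), (1, "2020-01-02"),
         (3, "2020-01-03"), (1, "2020-01-01"), (2, "2020-01-02")]
    )
]

parts = hash_partition(transactions, 14, "storeId", "transactionDate")
print(len(parts))  # 14 (some partitions may be empty)

# If the data currently sits in, say, 8 partitions, coalesce cannot reach 14.
print(coalesce_count(8, 14))  # 8
```

In PySpark itself, the column names are passed to `repartition` as separate arguments after the count, matching `*cols` in its signature. Because `repartition` is a transformation, Spark evaluates it lazily: no data is shuffled until an action such as `count()` runs.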

Suggested Answer: A

transactionsDf.select('storeId').dropDuplicates().count()

Correct! After dropping all duplicates from column storeId, the remaining rows get counted, representing the number of unique values in the column.

transactionsDf.select(count('storeId')).dropDuplicates()

No. transactionsDf.select(count('storeId')) just returns a single-row DataFrame showing the number of non-null rows. dropDuplicates() does not have any effect in this context.
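To see why this option gives the wrong number, here is a plain-Python model of what the two steps do. It is not PySpark code; the sample `store_ids` values are hypothetical and `None` stands in for SQL NULL. `count('storeId')` collapses the column into a single row holding the number of non-null values, and deduplicating that one row changes nothing.

```python
# Plain-Python model (not PySpark) of
#   transactionsDf.select(count('storeId')).dropDuplicates()
# Hypothetical storeId column; None stands in for SQL NULL.
store_ids = [1, 2, 1, None, 3, 3]

# count('storeId') aggregates the whole column into ONE row holding the
# number of non-null values.
aggregated = [(sum(1 for s in store_ids if s is not None),)]

# dropDuplicates() on a single-row result cannot remove anything.
deduplicated = list(dict.fromkeys(aggregated))

print(aggregated)    # [(5,)]
print(deduplicated)  # [(5,)]

# For comparison, the number of unique non-null storeId values:
print(len({s for s in store_ids if s is not None}))  # 3
```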

transactionsDf.dropDuplicates().agg(count('storeId'))

Incorrect. transactionsDf.dropDuplicates() does remove duplicate rows from transactionsDf, but it compares entire rows rather than only column storeId, so it eliminates full-row duplicates only.

transactionsDf.distinct().select('storeId').count()

Wrong. transactionsDf.distinct() identifies rows that are unique across all columns, not rows that are unique with respect to column storeId alone. This can leave duplicate values in that column, so the count does not represent the number of unique values in it.
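Both rejected approaches above deduplicate entire rows before storeId is considered, which is where the count goes wrong. The following plain-Python model (not PySpark; the `rows` data and the `dedupe` helper are hypothetical) contrasts full-row deduplication with projecting to storeId first, as in the correct option. In this null-free sample, `transactionsDf.dropDuplicates().agg(count('storeId'))` would produce the same inflated result, since it also counts storeId over deduplicated full rows.

```python
# Plain-Python model (not PySpark): rows are (storeId, transactionDate) tuples.
# Hypothetical sample data; store 1 appears on two different dates.
rows = [
    (1, "2020-01-01"),
    (1, "2020-01-02"),
    (2, "2020-01-01"),
    (1, "2020-01-01"),  # exact duplicate of the first row
]


def dedupe(items):
    """Deduplicate like distinct()/dropDuplicates() (this model keeps order;
    Spark does not guarantee row order, only the set of rows matters here)."""
    return list(dict.fromkeys(items))


# transactionsDf.distinct().select('storeId').count():
# dedupe full rows first, then project -> duplicate storeIds survive.
full_row_then_project = [store for store, _ in dedupe(rows)]

# transactionsDf.select('storeId').dropDuplicates().count():
# project first, then dedupe -> one row per unique storeId.
project_then_dedupe = dedupe(store for store, _ in rows)

print(len(full_row_then_project))  # 3 (storeId 1 is counted twice)
print(len(project_then_dedupe))    # 2
```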

transactionsDf.select(distinct('storeId')).count()

False. There is no distinct function in pyspark.sql.functions.


Contribute your Thoughts:

Alline
1 month ago
Repartitioning is like trying to herd cats, am I right? But I'm glad the options are here to guide us. Option B, the one that makes the most sense, is the way to go.
Shayne
3 days ago
Repartitioning can be tricky, but with the correct option, it becomes much easier. Option B is the way to go.
Luis
4 days ago
Yes, option B seems to be the correct one. It's important to follow the right steps in coding.
Portia
14 days ago
I agree, herding cats is a good comparison. Option B is definitely the most logical choice.
Alberto
2 months ago
I'm pretty sure the code block is as lost as a goose in a snowstorm. But hey, at least we've got the options to choose from. Time to put on our thinking caps and find the right answer!
Karan
1 month ago
Exactly! And we should append .select() to the code block as well.
Marjory
1 month ago
You're right! And we also need to remove the parentheses around the column names.
Teresita
1 month ago
I think the error is that the operator coalesce needs to be replaced by repartition.
Rebbecca
2 months ago
Well, well, look at that! The code is as clear as mud. At least the correct answer is here to save the day. Option B, my friends, is the way to go.
Chan
10 hours ago
Oh, I see. So, the correct option is to replace operator coalesce with repartition, remove the parentheses around the column names, and append .count() to the code block.
Trinidad
3 days ago
No, that's not it. The correct answer is that operator coalesce needs to be replaced by repartition, the parentheses around the column names need to be removed, and .count() needs to be appended to the code block.
Toshia
5 days ago
I think the error is that the parentheses around the column names need to be removed and .select() needs to be appended to the code block.
Kris
8 days ago
User1: Great, let's make the necessary changes then.
Melda
12 days ago
User3: Yes, option B is the way to go.
Dawne
23 days ago
User2: I think the correct answer is option B.
Mayra
1 month ago
User1: The code block needs some fixing.
Ahmad
2 months ago
Ah, I see the problem! The 'coalesce' operator needs to be replaced with 'repartition', and the parentheses around the column names should be removed. Looks like option B is the correct answer.
Ernestine
2 months ago
The code block has a few issues. The operator 'coalesce' is not the correct one to use for repartitioning. Also, the parentheses around the column names need to be removed.
Valentine
1 month ago
C) Operator coalesce needs to be replaced by repartition, the parentheses around the column names need to be removed, and .select() needs to be appended to the code block.
Deja
1 month ago
A) The parentheses around the column names need to be removed and .select() needs to be appended to the code block.
Demetra
2 months ago
Yes, and we should append .select() to the code block as well.
Inocencia
2 months ago
I agree with Demetra. Also, the parentheses around the column names need to be removed.
Demetra
3 months ago
I think the error is that the operator coalesce needs to be replaced by repartition.
