Welcome to Pass4Success

- Free Preparation Discussions

Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 2 Question 72 Discussion

Actual exam question for Databricks's Databricks Certified Associate Developer for Apache Spark 3.0 exam
Question #: 72
Topic #: 2

Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?
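The distinction this question turns on can be sketched with a toy model in plain Python. This is only an illustration, not Spark itself: the names coalesce_model and repartition_model are hypothetical, partitions are modelled as lists of row ids, and the placement rules are simplified. In PySpark the corresponding calls are DataFrame.coalesce(6), which merges existing partitions without a full shuffle, and DataFrame.repartition(6), which always performs a full shuffle.

```python
from typing import List

Partition = List[int]


def coalesce_model(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy coalesce: glue whole neighbouring partitions together.

    Each input partition lands intact in exactly one output partition,
    so rows are never redistributed (no full shuffle).
    """
    total = len(partitions)
    if n >= total:
        # Without a shuffle the partition count can only go down.
        return [list(p) for p in partitions]
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i * n // total].extend(part)
    return merged


def repartition_model(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy repartition: deal every row out round-robin (a full shuffle)."""
    out: List[Partition] = [[] for _ in range(n)]
    rows = [row for part in partitions for row in part]
    for i, row in enumerate(rows):
        out[i % n].append(row)
    return out


# 12 skewed partitions: partition k holds k + 1 rows (78 rows in total).
row_ids = iter(range(78))
twelve = [[next(row_ids) for _ in range(k + 1)] for k in range(12)]

coalesced_sizes = [len(p) for p in coalesce_model(twelve, 6)]
shuffled_sizes = [len(p) for p in repartition_model(twelve, 6)]

print(coalesced_sizes)  # [3, 7, 11, 15, 19, 23]
print(shuffled_sizes)   # [13, 13, 13, 13, 13, 13]
```

Both models end up with 6 partitions, but only the repartition model moves every row, which is why its output is evenly sized. Note also that in PySpark, DataFrame.coalesce takes only the target partition count, and DataFrame has no shuffle() method.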

Suggested Answer: E (DataFrame.repartition(6))

Correct code block:

transactionsDf.withColumn('cos', round(cos(degrees(transactionsDf.value)),2))

This question is especially confusing because col('cos') and 'cos' look so similar. Similar-looking answer options can also appear in the exam and, just like in this question, you need to pay attention to the details to identify the correct answer option.

The first answer option to throw out is the one that starts with withColumnRenamed: the question speaks specifically of adding a column. The withColumnRenamed operator only renames an existing column, however, so you cannot use it here.

Next, you will have to decide what should be in gap 2, the first argument of transactionsDf.withColumn(). Looking at the documentation (linked below), you can find out that the first argument of withColumn needs to be a string with the name of the column to be added. So, any answer that includes col('cos') as the option for gap 2 can be disregarded.

This leaves you with two possible answers. The real difference between these two answers is where the cos and degrees methods are: either in gaps 3 and 4, or vice versa. From the question you can find out that the new column should have 'the values in column value converted to degrees and having the cosine of those converted values taken'. This prescribes a clear order of operations: first, you convert values from column value to degrees, and then you take the cosine of those values. So, the inner parenthesis (gap 4) should contain the degrees method and then, logically, gap 3 holds the cos method. This leaves you with just one possible correct answer.
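The order of operations described above can be checked with a small sketch. It uses Python's standard math module as a stand-in for the per-row computation, not the Spark functions themselves, and the helper names cos_of_degrees and degrees_of_cos are hypothetical, chosen only for this illustration.

```python
import math


def cos_of_degrees(value: float) -> float:
    """Inner call converts to degrees, outer call takes the cosine."""
    return round(math.cos(math.degrees(value)), 2)


def degrees_of_cos(value: float) -> float:
    """Swapped order, shown only for contrast."""
    return round(math.degrees(math.cos(value)), 2)


print(cos_of_degrees(0.0))  # 1.0
print(degrees_of_cos(0.0))  # 57.3
```

The two orderings give different results for the same input, so swapping gaps 3 and 4 changes the values in the new column.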

More info: pyspark.sql.DataFrame.withColumn --- PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 3, question 49 (Databricks import instructions)


Contribute your Thoughts:

Tiffiny
2 months ago
I'm going to have to try this out in a notebook to see which one actually works best. This is the kind of question that really makes you think!
upvoted 0 times
Devora
20 days ago
C) DataFrame.coalesce(6)
upvoted 0 times
...
Lucille
21 days ago
B) DataFrame.coalesce(6).shuffle()
upvoted 0 times
...
Rikki
22 days ago
A) DataFrame.repartition(12)
upvoted 0 times
...
...
Desmond
2 months ago
E) DataFrame.repartition(6) seems like a simpler solution, but I guess it doesn't perform the full shuffle like the question is asking for.
upvoted 0 times
...
Thersa
2 months ago
D) DataFrame.coalesce(6, shuffle=True) also looks like a good option, but I'm not sure if it's the most efficient way to do it.
upvoted 0 times
Raina
18 days ago
E) DataFrame.repartition(6)
upvoted 0 times
...
Rolande
25 days ago
C) DataFrame.coalesce(6)
upvoted 0 times
...
Ivette
2 months ago
B) DataFrame.coalesce(6).shuffle()
upvoted 0 times
...
Van
2 months ago
A) DataFrame.repartition(12)
upvoted 0 times
...
...
Gladys
2 months ago
I'm not sure about this one. Is there a trick question hidden in there somewhere?
upvoted 0 times
...
Reita
3 months ago
B) DataFrame.coalesce(6).shuffle() is the correct answer. It reduces the number of partitions to 6 and performs a full shuffle to redistribute the data.
upvoted 0 times
Kasandra
2 months ago
You're welcome!
upvoted 0 times
...
Jani
2 months ago
Yes, that's right. It reduces the number of partitions to 6 and performs a full shuffle.
upvoted 0 times
...
Jani
2 months ago
I think B) DataFrame.coalesce(6).shuffle() is the correct answer.
upvoted 0 times
...
Long
2 months ago
Oh, I see. Thanks for clarifying!
upvoted 0 times
...
Kanisha
2 months ago
No, it's actually B) DataFrame.coalesce(6).shuffle()
upvoted 0 times
...
Ciara
2 months ago
I think the answer is A) DataFrame.repartition(12)
upvoted 0 times
...
...
Holley
3 months ago
I'm not sure, but I think option D) DataFrame.coalesce(6, shuffle=True) could also be correct as it explicitly mentions shuffling.
upvoted 0 times
...
Galen
3 months ago
I disagree, I believe the correct answer is C) DataFrame.coalesce(6). It reduces partitions without shuffling.
upvoted 0 times
...
Sherita
3 months ago
I think the answer is B) DataFrame.coalesce(6).shuffle(). It reduces partitions and performs a full shuffle.
upvoted 0 times
...
