
Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 2 Question 40 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 40
Topic #: 2

Which of the following describes tasks?

Suggested Answer: E

Output of correct code block:

+----------------------------------+------+
|itemName                          |col   |
+----------------------------------+------+
|Thick Coat for Walking in the Snow|blue  |
|Thick Coat for Walking in the Snow|winter|
|Thick Coat for Walking in the Snow|cozy  |
|Outdoors Backpack                 |green |
|Outdoors Backpack                 |summer|
|Outdoors Backpack                 |travel|
+----------------------------------+------+

The key to solving this question is knowing about Spark's explode operator. Using this operator, you can extract values from arrays into single rows. The following guidance steps through the answers systematically from the first to the last gap. Note that there are many ways to solve gap questions and to filter out wrong answers; you do not always have to start filtering from the first gap, but can also exclude some answers based on obvious problems you see with them.

The answers to the first gap present you with two options: filter and where. These two are actually synonyms in PySpark, so using either of them is fine. The answer options to this gap therefore do not help us in selecting the right answer.
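
For illustration, here is a minimal sketch of that equivalence. The DataFrame and its contents are invented for this example and are not part of the original question:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical example data, invented for this illustration
df = spark.createDataFrame(
    [("Sports Company Inc.",), ("YetiX",)],
    ["supplier"],
)

# where() is an alias for filter(); the two lines below are equivalent
df.filter(col("supplier").contains("Sports")).show()
df.where(col("supplier").contains("Sports")).show()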

The second gap is more interesting. One answer option includes 'Sports'.isin(col('Supplier')). This construct does not work, since Python's str type does not have an isin method. Another option contains col(supplier). Here, Python will try to interpret supplier as a variable; we have not set this variable, so this is not a viable answer. You are then left with the answer options that include col('supplier').contains('Sports') and col('supplier').isin('Sports'). The question states that we are looking for suppliers whose name includes Sports, so we have to go for the contains operator here.

We would use the isin operator if we wanted to filter for supplier names that exactly match any entry in a given list of supplier names.
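
A short sketch of the difference, again with invented data:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical supplier names, invented for this illustration
df = spark.createDataFrame(
    [("Sports Company Inc.",), ("YetiX",)],
    ["supplier"],
)

# contains() performs a substring match: keeps "Sports Company Inc."
df.filter(col("supplier").contains("Sports")).show()

# isin() performs exact matching against the listed values: keeps no
# rows here, because no supplier is named exactly "Sports"
df.filter(col("supplier").isin("Sports")).show()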

Finally, we are left with two answers that both fill the third gap with 'itemName' and fill the fourth gap with either explode('attributes') or 'attributes'. While both are correct Spark syntax, only explode('attributes') will help us achieve our goal. Specifically, the question asks for one attribute from column attributes per row, and this is exactly what the explode() operator does.

One answer option also includes array_explode(), which is not a valid operator in PySpark.
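
Putting the pieces together, a plausible reconstruction of the correct code block is sketched below. The input rows are invented so that the result reproduces the output shown above; the actual itemsDf from the question is not reproduced here:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.getOrCreate()

# Invented rows, shaped to reproduce the output shown above
itemsDf = spark.createDataFrame(
    [
        ("Thick Coat for Walking in the Snow", "Sports Company Inc.",
         ["blue", "winter", "cozy"]),
        ("Outdoors Backpack", "Sports Company Inc.",
         ["green", "summer", "travel"]),
        ("Elegant Outdoors Summer Dress", "YetiX",
         ["red", "summer", "fresh"]),
    ],
    ["itemName", "supplier", "attributes"],
)

# Keep suppliers whose name includes "Sports", then emit one row per
# element of the attributes array; explode() names its output column
# "col" by default
itemsDf.filter(col("supplier").contains("Sports")) \
    .select("itemName", explode("attributes")) \
    .show(truncate=False)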

More info: pyspark.sql.functions.explode (PySpark 3.1.2 documentation)

Static notebook | Dynamic notebook: See test 3, Question 39 (Databricks import instructions)


Contribute your Thoughts:

Jonell
3 days ago
Option A sounds like the most technical description of what a task is, so I'm going with that one.
upvoted 0 times
...
Herminia
7 days ago
Hmm, I'm not sure about this one. B sounds like it could be right, but I'm not confident enough to choose that.
upvoted 0 times
...
Vi
13 days ago
I think option E is the correct answer. The driver assigns tasks to the executors, so that makes the most sense.
upvoted 0 times
...
Domonique
21 days ago
But tasks are commands, not assignments.
upvoted 0 times
...
Chi
23 days ago
I disagree, I believe the answer is E.
upvoted 0 times
...
Domonique
26 days ago
I think the answer is A.
upvoted 0 times
...
