
Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 2 Question 40 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 40
Topic #: 2

Which of the following describes tasks?

Suggested Answer: E

Output of correct code block:

+----------------------------------+------+
|itemName                          |col   |
+----------------------------------+------+
|Thick Coat for Walking in the Snow|blue  |
|Thick Coat for Walking in the Snow|winter|
|Thick Coat for Walking in the Snow|cozy  |
|Outdoors Backpack                 |green |
|Outdoors Backpack                 |summer|
|Outdoors Backpack                 |travel|
+----------------------------------+------+

The key to solving this question is knowing about Spark's explode operator. Using this operator, you can extract values from arrays into single rows. The following guidance steps through the answers systematically from the first to the last gap. Note that there are many ways to solve gap questions and to filter out wrong answers; you do not always have to start filtering from the first gap, but can also exclude some answers based on obvious problems you see with them.

The answers to the first gap present you with two options: filter and where. These two are actually synonyms in PySpark, so using either of them is fine. The answer options to this gap therefore do not help us in selecting the right answer.
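
For illustration, here is a minimal sketch of that equivalence. The DataFrame and its contents are invented for this example and are not part of the original question:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical example data, invented for this illustration
df = spark.createDataFrame(
    [("Sports Company Inc.",), ("YetiX",)],
    ["supplier"],
)

# where() is an alias for filter(); the two lines below are equivalent
df.filter(col("supplier").contains("Sports")).show()
df.where(col("supplier").contains("Sports")).show()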

The second gap is more interesting. One answer option includes 'Sports'.isin(col('Supplier')). This construct does not work, since Python's str type does not have an isin method. Another option contains col(supplier). Here, Python will try to interpret supplier as a variable; we have not set this variable, so this is not a viable answer. You are then left with the answer options that include col('supplier').contains('Sports') and col('supplier').isin('Sports'). The question states that we are looking for suppliers whose name includes Sports, so we have to go for the contains operator here.

We would use the isin operator if we wanted to filter for supplier names that exactly match any entry in a given list of supplier names.
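
A short sketch of the difference, again with invented data:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical supplier names, invented for this illustration
df = spark.createDataFrame(
    [("Sports Company Inc.",), ("YetiX",)],
    ["supplier"],
)

# contains() performs a substring match: keeps "Sports Company Inc."
df.filter(col("supplier").contains("Sports")).show()

# isin() performs exact matching against the listed values: keeps no
# rows here, because no supplier is named exactly "Sports"
df.filter(col("supplier").isin("Sports")).show()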

Finally, we are left with two answers that both fill the third gap with 'itemName' and fill the fourth gap with either explode('attributes') or 'attributes'. While both are correct Spark syntax, only explode('attributes') will help us achieve our goal. Specifically, the question asks for one attribute from column attributes per row, and this is exactly what the explode() operator does.

One answer option also includes array_explode(), which is not a valid operator in PySpark.
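
Putting the pieces together, a plausible reconstruction of the correct code block is sketched below. The input rows are invented so that the result reproduces the output shown above; the actual itemsDf from the question is not reproduced here:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.getOrCreate()

# Invented rows, shaped to reproduce the output shown above
itemsDf = spark.createDataFrame(
    [
        ("Thick Coat for Walking in the Snow", "Sports Company Inc.",
         ["blue", "winter", "cozy"]),
        ("Outdoors Backpack", "Sports Company Inc.",
         ["green", "summer", "travel"]),
        ("Elegant Outdoors Summer Dress", "YetiX",
         ["red", "summer", "fresh"]),
    ],
    ["itemName", "supplier", "attributes"],
)

# Keep suppliers whose name includes "Sports", then emit one row per
# element of the attributes array; explode() names its output column
# "col" by default
itemsDf.filter(col("supplier").contains("Sports")) \
    .select("itemName", explode("attributes")) \
    .show(truncate=False)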

More info: pyspark.sql.functions.explode (PySpark 3.1.2 documentation)

Static notebook | Dynamic notebook: See test 3, Question 39 (Databricks import instructions)


Contribute your Thoughts:

Jonell
3 days ago
Option A sounds like the most technical description of what a task is, so I'm going with that one.
upvoted 0 times
...
Herminia
7 days ago
Hmm, I'm not sure about this one. B sounds like it could be right, but I'm not confident enough to choose that.
upvoted 0 times
...
Vi
13 days ago
I think option E is the correct answer. The driver assigns tasks to the executors, so that makes the most sense.
upvoted 0 times
...
Domonique
21 days ago
But tasks are commands, not assignments.
upvoted 0 times
...
Chi
23 days ago
I disagree, I believe the answer is E.
upvoted 0 times
...
Domonique
26 days ago
I think the answer is A.
upvoted 0 times
...
