Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 2 Question 38 Discussion

Actual exam question for Databricks's Databricks Certified Associate Developer for Apache Spark 3.0 exam

Question #: 38
Topic #: 2

[All Databricks Certified Associate Developer for Apache Spark 3.0 Questions]

The code block shown below should return a new 2-column DataFrame that shows one attribute from column attributes per row next to the associated itemName, for all suppliers in column supplier

whose name includes Sports. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Sample of DataFrame itemsDf:

1. +------+----------------------------------+-----------------------------+-------------------+

3. +------+----------------------------------+-----------------------------+-------------------+

7. +------+----------------------------------+-----------------------------+-------------------+

Code block:

itemsDf.__1__(__2__).select(__3__, __4__)

A1. filter
2. col('supplier').isin('Sports')
3. 'itemName'
4. explode(col('attributes'))

B1. where
2. col('supplier').contains('Sports')
3. 'itemName'
4. 'attributes'

C1. where
2. col(supplier).contains('Sports')
3. explode(attributes)
4. itemName

D1. where
2. 'Sports'.isin(col('Supplier'))
3. 'itemName'
4. array_explode('attributes')

E1. filter
2. col('supplier').contains('Sports')
3. 'itemName'
4. explode('attributes')

Show Suggested Answer

Suggested Answer: E

Output of correct code block:

+----------------------------------+------+

|itemName |col |

+----------------------------------+------+

|Thick Coat for Walking in the Snow|blue |

|Thick Coat for Walking in the Snow|winter|

|Thick Coat for Walking in the Snow|cozy |

|Outdoors Backpack |green |

|Outdoors Backpack |summer|

|Outdoors Backpack |travel|

+----------------------------------+------+

The key to solving this Question: is knowing about Spark's explode operator. Using this operator, you can extract values from arrays into single rows. The following guidance steps through

the

answers systematically from the first to the last gap. Note that there are many ways to solving the gap questions and filtering out wrong answers, you do not always have to start filtering out from the

first gap, but can also exclude some answers based on obvious problems you see with them.

The answers to the first gap present you with two options: filter and where. These two are actually synonyms in PySpark, so using either of those is fine. The answer options to this gap therefore do

not help us in selecting the right answer.

The second gap is more interesting. One answer option includes 'Sports'.isin(col('Supplier')). This construct does not work, since Python's string does not have an isin method. Another option

contains col(supplier). Here, Python will try to interpret supplier as a variable. We have not set this variable, so this is not a viable answer. Then, you are left with answers options that include col

('supplier').contains('Sports') and col('supplier').isin('Sports'). The Question: states that we are looking for suppliers whose name includes Sports, so we have to go for the contains operator

here.

We would use the isin operator if we wanted to filter out for supplier names that match any entries in a list of supplier names.

Finally, we are left with two answers that fill the third gap both with 'itemName' and the fourth gap either with explode('attributes') or 'attributes'. While both are correct Spark syntax, only explode

('attributes') will help us achieve our goal. Specifically, the Question: asks for one attribute from column attributes per row - this is what the explode() operator does.

One answer option also includes array_explode() which is not a valid operator in PySpark.

More info: pyspark.sql.functions.explode --- PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 3, Question: 39 (Databricks import instructions)

by Izetta at May 29, 2023, 09:18 PM

Limited Time Offer

25%

Off

Get Premium Databricks Certified Associate Developer for Apache Spark 3.0 Questions as Interactive Web-Based Practice Test or PDF

Contribute your Thoughts:

Submit Cancel

Currently there are no comments in this discussion, be the first to comment!