
Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 3 Question 54 Discussion

Actual exam question for the Databricks Certified Associate Developer for Apache Spark 3.0 exam
Question #: 54
Topic #: 3

The code block shown below should return a single-column DataFrame with a column named consonant_ct that, for each row, shows the number of consonants in column itemName of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

DataFrame itemsDf:

+------+----------------------------------+-----------------------------+-------------------+
|itemId|itemName                          |attributes                   |supplier           |
+------+----------------------------------+-----------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
+------+----------------------------------+-----------------------------+-------------------+

Code block:

itemsDf.select(__1__(__2__(__3__(__4__), "a|e|i|o|u|\s", "")).__5__("consonant_ct"))
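
For reference, one plausible completion of the blanks, assuming the length, regexp_replace, lower, and alias combination that the discussion below converges on (a sketch, not an official answer key):

from pyspark.sql.functions import col, length, lower, regexp_replace

# Lower-case itemName, strip vowels and whitespace, then count what remains.
itemsDf.select(
    length(regexp_replace(lower(col("itemName")), r"a|e|i|o|u|\s", "")).alias("consonant_ct")
)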

Suggested Answer: D

Correct code block:

from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, StringType

schema = StructType([
    StructField('itemId', IntegerType(), True),
    StructField('attributes', ArrayType(StringType(), True), True),
    StructField('supplier', StringType(), True)
])

spark.read.options(modifiedBefore='2029-03-20T05:44:46').schema(schema).parquet(filePath)

This question is more difficult than what you would encounter in the exam. In the exam, for this question type, only one error needs to be identified, not 'one or multiple' as in this question.

Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.

Correct! Columns in the schema definition should use the StructField type. Building a schema from pyspark.sql.types, as here using classes like StructType and StructField, is one of multiple ways of expressing a schema in Spark. A StructType always contains a list of StructFields (see documentation linked below). So, nesting StructType inside StructType, as shown in the question, is wrong.
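
To illustrate the distinction, a minimal sketch (using a subset of the columns for brevity):

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Each column is a StructField(name, dataType, nullable); StructType is only
# the container holding the list of fields.
schema = StructType([
    StructField('itemId', IntegerType(), True),
    StructField('supplier', StringType(), True)
])
# By contrast, StructType('itemId', IntegerType(), True) raises a TypeError,
# because StructType does not accept (name, dataType, nullable) arguments.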

The modification date threshold should be specified by a keyword argument like options(modifiedBefore='2029-03-20T05:44:46'), not by two consecutive non-keyword arguments as in the original code block (see documentation linked below).
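
A short sketch of the two equivalent keyword forms, with filePath and schema as in the code block above:

# Both calls pass modifiedBefore as a keyword, never as bare positional arguments.
spark.read.option('modifiedBefore', '2029-03-20T05:44:46').schema(schema).parquet(filePath)
spark.read.options(modifiedBefore='2029-03-20T05:44:46').schema(schema).parquet(filePath)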

Spark cannot identify the file format on its own: it has to be specified either via DataFrameReader.format(), as an argument to DataFrameReader.load(), or directly by calling, for example, DataFrameReader.parquet().
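
A sketch of the three variants named above, assuming filePath points at parquet data:

spark.read.format('parquet').load(filePath)   # via DataFrameReader.format()
spark.read.load(filePath, format='parquet')   # as an argument to DataFrameReader.load()
spark.read.parquet(filePath)                  # via the format-specific shortcut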

Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.

No. If StructField were used for the columns instead of StructType (see above), the third argument would specify whether the column is nullable. The original schema shows that the columns should be nullable, and this is specified correctly by the third argument being True in the schema in the code block.

It is correct, however, that the modification date threshold is specified incorrectly (see above).
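
A small sketch of the nullable flag in action, assuming an active SparkSession named spark:

from pyspark.sql.types import StructType, StructField, StringType

nullable_schema = StructType([StructField('supplier', StringType(), True)])
strict_schema = StructType([StructField('supplier', StringType(), False)])

spark.createDataFrame([(None,)], nullable_schema)   # accepted: the column is nullable
# spark.createDataFrame([(None,)], strict_schema)   # raises: field supplier is not nullable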

The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.

Wrong. The attributes array is specified correctly, following the syntax for ArrayType (see linked documentation below). That Spark cannot identify the file format is correct; see the correct answer above. In addition, the DataFrameReader is called correctly through the SparkSession spark.

Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.

Incorrect. It is true that the columns in the schema definition use the wrong object type (see the correct answer above), but the syntax of the call to Spark's DataFrameReader is correct.

The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.

False. The data type of the schema is StructType, which is an accepted data type for the DataFrameReader.schema() method. It is correct, however, that the modification date threshold is specified incorrectly (see the correct answer above).
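
As a side note, DataFrameReader.schema() accepts either a StructType or a DDL-formatted string, so both of the following sketches should be equivalent (filePath as above):

spark.read.schema(schema).parquet(filePath)
spark.read.schema("itemId INT, attributes ARRAY<STRING>, supplier STRING").parquet(filePath)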


Contribute your Thoughts:

Eileen
1 month ago
Hmm, this question seems a bit consonant-ious. Maybe we should just replace all the code with a simple `print('the number of consonants is... a lot!')`.
upvoted 0 times
Youlanda
1 month ago
Whoa, this question is a real brain-teaser! I'm going to have to go with B. Size and regexp_replace seem like the way to go, and I prefer alias over as for renaming the column.
upvoted 0 times
Leota
1 month ago
This is a tricky one, but I think C is the correct answer. Using lower to convert everything to lowercase, regexp_replace to remove the vowels, and then length to count the remaining consonants seems like the way to go.
upvoted 0 times
Pete
7 days ago
Yes, C looks like the right choice for this task.
upvoted 0 times
Lauran
18 days ago
I believe it's C too, lower, regexp_replace, and length make sense.
upvoted 0 times
Lashaun
20 days ago
I think C is the correct answer.
upvoted 0 times
Adelina
2 months ago
Hmm, I'm leaning towards E. The size function might be a better choice than length, and regexp_extract could be used to extract the consonants instead of replacing the vowels.
upvoted 0 times
Kristofer
15 days ago
E does seem like a strong option. Let's go with that for extracting the consonants.
upvoted 0 times
Linwood
25 days ago
I agree, using size and regexp_extract would likely give us the desired result.
upvoted 0 times
Lindy
1 month ago
I think E is the correct choice. Size and regexp_extract seem like a good fit for this task.
upvoted 0 times
Julio
2 months ago
Hmm, that makes sense too. Let's review the question again before making our final choice.
upvoted 0 times
Herminia
2 months ago
I disagree, I believe the correct answer is B because we need to use regexp_replace to remove vowels.
upvoted 0 times
Paz
2 months ago
I'm pretty sure the answer is D. The length function counts the number of characters, and regexp_replace can be used to remove vowels, leaving just the consonants. Then we can use alias to rename the column.
upvoted 0 times
Maynard
20 days ago
Agreed, D should give us the single-column DataFrame with the consonant count.
upvoted 0 times
Alyce
1 month ago
Let's go with D. It seems to be the right combination of functions for this task.
upvoted 0 times
Carey
1 month ago
Yes, D looks good. We need to count consonants in the itemName column.
upvoted 0 times
Lelia
2 months ago
I think D is correct. The length function counts characters and regexp_replace removes vowels.
upvoted 0 times
Julio
3 months ago
I think the answer is A because we need to find the length of the itemName column.
upvoted 0 times
