The code block shown below should return a single-column DataFrame with a column named consonant_ct that, for each row, shows the number of consonants in column itemName of DataFrame
itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.
DataFrame itemsDf:
1. +------+----------------------------------+-----------------------------+-------------------+
2. |itemId|itemName |attributes |supplier |
3. +------+----------------------------------+-----------------------------+-------------------+
4. |1 |Thick Coat for Walking in the Snow|[blue, winter, cozy] |Sports Company Inc.|
5. |2 |Elegant Outdoors Summer Dress |[red, summer, fresh, cooling]|YetiX |
6. |3 |Outdoors Backpack |[green, summer, travel] |Sports Company Inc.|
7. +------+----------------------------------+-----------------------------+-------------------+
Code block:
itemsDf.select(__1__(__2__(__3__(__4__), "a|e|i|o|u|\s", "")).__5__("consonant_ct"))
Correct code block:
schema = StructType([
StructField('itemId', IntegerType(), True),
StructField('attributes', ArrayType(StringType(), True), True),
StructField('supplier', StringType(), True)
])
spark.read.options(modifiedBefore='2029-03-20T05:44:46').schema(schema).parquet(filePath)
This Question: is more difficult than what you would encounter in the exam. In the exam, for this Question: type, only one error needs to be identified and not 'one or multiple' as in the
question.
Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.
Correct! Columns in the schema definition should use the StructField type. Building a schema from pyspark.sql.types, as here using classes like StructType and StructField, is one of multiple ways
of expressing a schema in Spark. A StructType always contains a list of StructFields (see documentation linked below). So, nesting StructType and StructType as shown in the Question: is
wrong.
The modification date threshold should be specified by a keyword argument like options(modifiedBefore='2029-03-20T05:44:46') and not two consecutive non-keyword arguments as in the original
code block (see documentation linked below).
Spark cannot identify the file format correctly, because either it has to be specified by using the DataFrameReader.format(), as an argument to DataFrameReader.load(), or directly by calling, for
example, DataFrameReader.parquet().
Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.
No. If StructField would be used for the columns instead of StructType (see above), the third argument specified whether the column is nullable. The original schema shows that columns should be
nullable and this is specified correctly by the third argument being True in the schema in the code block.
It is correct, however, that the modification date threshold is specified incorrectly (see above).
The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.
Wrong. The attributes array is specified correctly, following the syntax for ArrayType (see linked documentation below). That Spark cannot identify the file format is correct, see correct answer
above. In addition, the DataFrameReader is called correctly through the SparkSession spark.
Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.
Incorrect, the object types in the schema definition are correct and syntax of the call to Spark's DataFrameReader is correct.
The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.
False. The data type of the schema is StructType and an accepted data type for the DataFrameReader.schema() method. It is correct however that the modification date threshold is specified
incorrectly (see correct answer above).
Eileen
1 months agoYoulanda
1 months agoLeota
1 months agoPete
7 days agoLauran
18 days agoLashaun
20 days agoAdelina
2 months agoKristofer
15 days agoLinwood
25 days agoLindy
1 months agoArlene
2 months agoAleisha
2 months agoArlene
2 months agoJulio
2 months agoHerminia
2 months agoPaz
2 months agoMaynard
20 days agoAlyce
1 months agoCarey
1 months agoLelia
2 months agoJulio
3 months ago