
Databricks Certified Associate Developer for Apache Spark 3.0 Exam - Topic 2, Question 68 Discussion

Actual exam question from the Databricks Certified Associate Developer for Apache Spark 3.0 exam
Question #: 68
Topic #: 2
[All Databricks Certified Associate Developer for Apache Spark 3.0 Questions]

The code block displayed below contains an error. The code block should display the schema of DataFrame transactionsDf. Find the error.

Code block:

transactionsDf.rdd.printSchema

Suggested Answer: E

No method partitionOn() exists for the DataFrame class; partitionBy() should be used instead.

Correct! Find out more about partitionBy() in the documentation (linked below).

The operator should use the mode() option to configure the DataFrameWriter so that it replaces any existing files at location filePath.

No. There is no information in the question about whether files should be overwritten.

The partitioning column as well as the file path should be passed to the write() method of DataFrame transactionsDf directly and not as appended commands as in the code block.

Incorrect. To write a DataFrame to disk, you need to work with a DataFrameWriter object, which you access through the DataFrame.write property - no parentheses involved.

Column storeId should be wrapped in a col() operator.

No, this is not necessary - the problem is in the partitionOn command (see above).

The partitionOn method should be called before the write method.

Wrong. First of all, partitionOn is not a valid method of DataFrame. But even if partitionOn were replaced by partitionBy (which is a valid method), that method belongs to DataFrameWriter, not to DataFrame. So you would always have to access DataFrame.write first to obtain the DataFrameWriter object and then call partitionBy on it.
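
As a minimal sketch of this pattern, assuming the DataFrame transactionsDf, the partitioning column storeId, and the target path filePath from the options above (writing to Parquet is an assumption made here for illustration):

# DataFrame.write is a property that returns a DataFrameWriter;
# partitionBy() is then called on that writer, not on the DataFrame itself
transactionsDf.write.partitionBy("storeId").parquet(filePath)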

More info: pyspark.sql.DataFrameWriter.partitionBy (PySpark 3.1.2 documentation)

Static notebook | Dynamic notebook: See test 3, Question: 33 (Databricks import instructions)


Contribute your Thoughts:

Veronika
2 months ago
Hey, at least they didn't ask you to 'printSchemaBackwards()' - that would have been a real head-scratcher!
upvoted 0 times
Myrtie
19 days ago
User1: Exactly, printSchema is a method and should be called directly from the DataFrame.
upvoted 0 times
Felicidad
1 month ago
User2: Yeah, it should be written as printSchema() instead of printSchema.
upvoted 0 times
Chery
1 month ago
User1: The error is that printSchema should be called directly from transactionsDf.
upvoted 0 times
Cordell
2 months ago
C is close, but the spark session shouldn't be required here. The DataFrame object should have direct access to the printSchema() method.
upvoted 0 times
Lisbeth
26 days ago
C
upvoted 0 times
Marla
1 month ago
B
upvoted 0 times
Janella
1 month ago
A
upvoted 0 times
Suzan
2 months ago
B is a decent attempt, but the print() operation shouldn't be necessary. The printSchema() method should work on its own.
upvoted 0 times
Mendy
1 month ago
C
upvoted 0 times
Scarlet
2 months ago
A
upvoted 0 times
Sharmaine
2 months ago
Yes, that makes sense. We should always call printSchema directly from the DataFrame.
upvoted 0 times
Ronald
2 months ago
A is a terrible answer. Of course there's a way to print the schema directly in Spark! They should have just looked at the options more closely.
upvoted 0 times
Lemuel
2 months ago
D is the correct answer. printSchema() is a method of the DataFrame object, not the RDD object, so it should be called directly on transactionsDf instead of transactionsDf.rdd.
upvoted 0 times
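
A minimal sketch of the corrected call described here, assuming transactionsDf is an existing DataFrame:

# printSchema() is defined on DataFrame, not on the underlying RDD,
# and it must be invoked with parentheses
transactionsDf.printSchema()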
Christiane
2 months ago
I agree with you; the correct answer is D) printSchema is a method and should be written as printSchema().
upvoted 0 times
Sharmaine
2 months ago
I think the error is that printSchema should be called directly from transactionsDf.
upvoted 0 times
