
Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 3 Question 69 Discussion

Actual exam question for Databricks's Databricks Certified Associate Developer for Apache Spark 3.0 exam
Question #: 69
Topic #: 3
[All Databricks Certified Associate Developer for Apache Spark 3.0 Questions]

Which of the following code blocks creates a new one-column, two-row DataFrame dfDates with column date of type timestamp?

Suggested Answer: E

This question requires careful thought to get right. To solve it, you may take advantage of the digital notepad that is provided to you during the test. You have probably noticed that the code block includes multiple errors. In the test, you are usually confronted with a code block that contains only a single error. However, since you are practicing here, this challenging multi-error question will make it easier for you to deal with single-error questions in the real exam.

You can clearly see that column transactionDate should be dropped only after transactionTimestamp has been written, because to generate column transactionTimestamp, Spark needs to read the values from column transactionDate.

Values in column transactionDate in the original transactionsDf DataFrame look like 2020-04-26 15:35. So, to convert those correctly, you would have to pass yyyy-MM-dd HH:mm. In other words: the string indicating the date format should be adjusted.

While you might be tempted to change unix_timestamp() to to_unixtime() (in line with the from_unixtime() operator), that function does not exist in Spark. unix_timestamp() is the correct operator to use here.

Also, there is no DataFrame.withColumnReplaced() operator. A similar operator that exists is DataFrame.withColumnRenamed().

Whether you use col() or not is irrelevant with unix_timestamp() - the command is fine with both.

Finally, you cannot assign a column like transactionsDf['columnName'] = ... in Spark. This is pandas syntax (pandas is a popular Python package for data analysis), but it is not supported in Spark. Instead, you need to use Spark's DataFrame.withColumn() syntax.

More info: pyspark.sql.functions.unix_timestamp (PySpark 3.1.2 documentation)

Static notebook | Dynamic notebook: See test 3, question 28 (Databricks import instructions)


Contribute your Thoughts:

Roselle
2 months ago
I'm just happy there's no SQL in this question. I'd be drowning in a sea of `JOIN`s and `WHERE` clauses.
upvoted 0 times
Zena
20 days ago
C) Let's go with option C for creating the DataFrame.
upvoted 0 times
...
Leslie
22 days ago
B) I think option B is the correct one.
upvoted 0 times
...
Theron
29 days ago
A) The first option looks good to me.
upvoted 0 times
...
...
Remona
2 months ago
Option C is the way to go. It's the only one that uses `to_timestamp` correctly with the right date format. The other options seem a bit convoluted.
upvoted 0 times
...
Terry
2 months ago
I'm not sure why people are overthinking this. Option C is clearly the right answer. The syntax is straightforward and easy to understand.
upvoted 0 times
...
Fredric
3 months ago
I'm not sure, but I think B could also be a possible answer.
upvoted 0 times
...
Cassi
3 months ago
I think Option A is the correct answer. The `to_timestamp` function is used with the correct date format 'dd/MM/yyyy HH:mm:ss'.
upvoted 0 times
Karl
1 month ago
I'm not sure, but Option D seems to have a mistake with 'to_datetime' instead of 'to_timestamp'.
upvoted 0 times
...
Vicki
1 month ago
I'm leaning towards Option C. It seems to correctly convert the date format.
upvoted 0 times
...
Anastacia
1 month ago
I think Option A is the right choice too.
upvoted 0 times
...
Chan
1 month ago
I think Option B might be the right choice. It renames the column before applying the timestamp conversion.
upvoted 0 times
...
Anastacia
1 month ago
I agree, Option A looks correct.
upvoted 0 times
...
Launa
1 month ago
I agree, Option A looks correct. The date format matches the input data format.
upvoted 0 times
...
...
Erick
3 months ago
Option C looks good to me. The `to_timestamp` function is used correctly to convert the string format 'dd/MM/yyyy HH:mm:ss' to a timestamp column.
upvoted 0 times
...
Paola
3 months ago
I disagree, I believe the correct answer is C.
upvoted 0 times
...
Lezlie
3 months ago
I think the correct answer is A.
upvoted 0 times
...
