Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 1 Question 36 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 36
Topic #: 1

Which of the following DataFrame methods is classified as a transformation?

Suggested Answer: A

transactionsDf.select('storeId').dropDuplicates().count()

Correct! After dropping all duplicates from column storeId, the remaining rows get counted, representing the number of unique values in the column.

transactionsDf.select(count('storeId')).dropDuplicates()

No. transactionsDf.select(count('storeId')) just returns a single-row DataFrame showing the number of non-null rows. dropDuplicates() does not have any effect in this context.

transactionsDf.dropDuplicates().agg(count('storeId'))

Incorrect. While transactionsDf.dropDuplicates() removes duplicate rows from transactionsDf, it does not do so taking only column storeId into consideration, but eliminates full-row duplicates instead.

transactionsDf.distinct().select('storeId').count()

Wrong. transactionsDf.distinct() identifies unique rows across all columns, not unique rows with respect to column storeId only. This may leave duplicate values in the column, so the count does not represent the number of unique values in that column.

transactionsDf.select(distinct('storeId')).count()

False. There is no distinct method in pyspark.sql.functions.
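
The difference between these expressions can be checked by hand on a small example. The sketch below uses plain Python rather than Spark, with made-up (transactionId, storeId) rows standing in for transactionsDf; the row values and variable names are assumptions for illustration only. Each line mirrors the logic described for one of the expressions above.

```python
# Made-up rows standing in for transactionsDf (plain Python, not Spark).
rows = [
    (1, 25),
    (2, 25),
    (3, 25),
    (4, 3),
    (4, 3),     # exact duplicate of the row above
    (5, None),  # missing storeId
]
store_ids = [store_id for _, store_id in rows]

# Like select('storeId').dropDuplicates().count():
# dedupe the single column, then count what is left (null counts as one value).
unique_store_ids = len(set(store_ids))

# Like select(count('storeId')): counts non-null storeId entries, duplicates included.
non_null_count = sum(store_id is not None for store_id in store_ids)

# Like dropDuplicates().agg(count('storeId')): dedupe whole rows first,
# so repeated storeId values on different transactions survive.
full_row_dedup_count = sum(store_id is not None for _, store_id in set(rows))

# Like distinct().select('storeId').count(): rows left after full-row dedupe.
distinct_rows_count = len(set(rows))

print(unique_store_ids, non_null_count, full_row_dedup_count, distinct_rows_count)
```

With these rows the script prints 3 5 4 5: only the first approach gives the number of unique values in the column.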


Contribute your Thoughts:

Adelina
3 days ago
I'd go with A) DataFrame.count(). It's a common operation to get the number of rows in a DataFrame, and that seems like a transformation to me.
upvoted 0 times
...
Regenia
14 days ago
Hmm, I'm not sure about this one. Maybe D) DataFrame.foreach() is the transformation method, since it applies a function to each row of the DataFrame.
upvoted 0 times
...
Halina
15 days ago
I think C) DataFrame.select() is the correct transformation method. It allows you to select specific columns from a DataFrame, which is a common data manipulation task.
upvoted 0 times
...
Jonelle
18 days ago
I'm not sure about the others, but DataFrame.foreach() is definitely not a transformation because it is an action that applies a function to each element.
upvoted 0 times
...
Davida
20 days ago
I agree with Vashti. DataFrame.select() transforms the DataFrame by selecting specific columns.
upvoted 0 times
...
Vashti
24 days ago
I think DataFrame.select() is a transformation because it selects specific columns.
upvoted 0 times
...
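
The distinction several commenters draw (select() is a transformation, count() and foreach() are actions) can be sketched with a toy class. This is not Spark code: LazyFrame and everything inside it are hypothetical names written for illustration, assuming only that a transformation records a step and returns a new frame, while an action runs the recorded steps.

```python
class LazyFrame:
    """Toy stand-in for a DataFrame (hypothetical, not a Spark class)."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # recorded steps, not yet executed

    def select(self, *columns):
        # Transformation: only records the step and returns a new frame.
        def step(rows):
            return [{c: r[c] for c in columns} for r in rows]
        return LazyFrame(self._rows, self._plan + (step,))

    def _execute(self):
        # Runs every recorded step in order.
        rows = self._rows
        for step in self._plan:
            rows = step(rows)
        return rows

    def count(self):
        # Action: runs the recorded plan and returns a number, not a frame.
        return len(self._execute())

    def foreach(self, fn):
        # Action: runs the plan and applies fn to each row for its side effect.
        for row in self._execute():
            fn(row)


frame = LazyFrame([{"storeId": 25, "value": 4}, {"storeId": 3, "value": 7}])
selected = frame.select("storeId")  # returns another LazyFrame; nothing has run yet
row_count = selected.count()        # triggers execution
seen = []
selected.foreach(seen.append)       # triggers execution again, returns None
```

In this sketch select() hands back another frame, which is what makes it a transformation, while count() and foreach() produce a value or a side effect and so play the role of actions.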
