Databricks Exam Databricks-Machine-Learning-Associate Topic 4 Question 18 Discussion

Actual exam question for Databricks' Databricks-Machine-Learning-Associate exam
Question #: 18
Topic #: 4

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

Suggested Answer: C

The pandas API on Spark (formerly Koalas) does not create a single-node copy of the data: a pandas-on-Spark DataFrame is made up of an underlying Spark DataFrame plus additional metadata, most notably index information, that supplies pandas-like semantics while execution stays distributed on the cluster.

Option A is incorrect because pandas API on Spark DataFrames are not single-node versions of Spark DataFrames; the data remains distributed, and operations are carried out by the Spark engine.
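As a minimal sketch of this relationship (the example data and names are hypothetical; DataFrame.pandas_api() and to_spark() are available in PySpark 3.2+), a Spark DataFrame can be wrapped as a pandas-on-Spark DataFrame and unwrapped again:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A native, distributed Spark DataFrame (hypothetical example data)
spark_df = spark.createDataFrame([(1, 0.10), (2, 0.00)], ["id", "discount"])

# Wrap it as a pandas-on-Spark DataFrame: the same distributed data,
# plus pandas-style metadata such as a default index
psdf = spark_df.pandas_api()

# Unwrap it back to the underlying Spark DataFrame
roundtrip_df = psdf.to_spark()

Because the pandas-on-Spark DataFrame is the Spark DataFrame plus metadata, this round trip stays distributed and does not collect the data to the driver.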


PySpark Documentation

Contribute your Thoughts:

Tayna
2 days ago
Hmm, I was leaning towards option A, but I can see how option C makes more sense. Gotta love those extra metadata layers!
upvoted 0 times
Skye
5 days ago
I think option C is the correct answer. The pandas API on Spark DataFrames is built on top of Spark DataFrames and adds additional metadata to them.
upvoted 0 times
Kyoko
9 days ago
Hmm, that makes sense too. I can see how both answers could be valid.
upvoted 0 times
Charolette
10 days ago
I disagree, I believe the answer is C) pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata.
upvoted 0 times
Kyoko
11 days ago
I think the answer is A) pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata.
upvoted 0 times
