Welcome to Pass4Success


Databricks Machine Learning Associate Exam - Topic 1 Question 40 Discussion

Actual exam question for Databricks's Databricks Machine Learning Associate exam
Question #: 40
Topic #: 1

A data scientist has a Spark DataFrame `spark_df`. They want to create a new Spark DataFrame that contains only the rows from `spark_df` where the value in the `price` column is greater than 0.

Which of the following code blocks will accomplish this task?

Suggested Answer: B

To filter rows of a Spark DataFrame by a condition, use the `filter` method with a column expression. The correct PySpark syntax for this task is `spark_df.filter(col('price') > 0)`, which keeps only the rows where the value in the `price` column is greater than 0. The `col` function builds the column expression used in the comparison. The other options either use invalid Spark DataFrame syntax or belong to a different data manipulation framework, such as pandas. Reference:

PySpark DataFrame API documentation (Filtering DataFrames).


Contribute your Thoughts:

Luther
1 hour ago
I remember practicing filtering DataFrames, but I'm not sure if it's `filter` or `where` in Spark.
