Databricks Machine Learning Associate Exam - Topic 1 Question 40 Discussion

Actual exam question from the Databricks Machine Learning Associate exam

Question #: 40
Topic #: 1

A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column price is greater than 0.

Which of the following code blocks will accomplish this task?

A. spark_df[spark_df['price'] > 0]

B. spark_df.filter(col('price') > 0)

C. SELECT * FROM spark_df WHERE price > 0

D. spark_df.loc[spark_df['price'] > 0,:]

E. spark_df.loc[:,spark_df['price'] > 0]

Suggested Answer: B

To filter rows in a Spark DataFrame, call the filter method with a column condition. In PySpark, spark_df.filter(col('price') > 0) returns a new DataFrame containing only the rows whose price value is greater than 0; col must be imported from pyspark.sql.functions, and where is an alias for filter. Options D and E use pandas-style .loc indexing, which Spark DataFrames do not provide, and option C is a SQL statement rather than a DataFrame operation, so it would only work if run through spark.sql against a registered view. Option A is written in the pandas boolean-indexing style; PySpark does accept a Column condition inside square brackets, but filter is the idiomatic form and the answer the exam expects. Reference:

PySpark DataFrame API documentation (Filtering DataFrames).

by Hildegarde at Apr 03, 2026, 06:33 AM

2 months ago

I remember practicing filtering DataFrames, but I'm not sure if it's `filter` or `where` in Spark.

upvoted 0 times

...