Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?
To filter rows in a Spark DataFrame by a condition, use the filter method. Here the condition is that the value in the 'discount' column is less than or equal to 0. The correct syntax passes filter a column expression built with the col function from pyspark.sql.functions.
Correct code:
from pyspark.sql.functions import col
filtered_df = spark_df.filter(col('discount') <= 0)
Options A and D use pandas syntax, which does not apply to a native Spark DataFrame. Option B is closer but omits the col function.