A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column discount is less than or equal to 0.
Which of the following code blocks will accomplish this task?
To filter rows of a Spark DataFrame based on a condition, use the filter method. Here, the condition is that the value in the 'discount' column is less than or equal to 0. The correct syntax passes a column expression built with the col function from pyspark.sql.functions to filter.
Correct code:
from pyspark.sql.functions import col
filtered_df = spark_df.filter(col('discount') <= 0)
Options A and D use Pandas syntax, which does not apply to a PySpark DataFrame. Option B is closer but omits the col function.