A data scientist wants each record in the DataFrame to contain:

The entire contents of a file
The full file path

The first attempt at the code reads the text files, but each record contains a single line rather than the full text of a file. This code is shown below:
corpus = spark.read.text("/datasets/raw_txt/*") \
    .select('*', '_metadata.file_path')
Which change will ensure one record per file?
Options:
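For reference, the distinction the question turns on can be sketched as follows. PySpark's `DataFrameReader.text` takes a `wholetext` parameter; when it is `True`, each file is read as a single row instead of one row per line. The helper name `read_one_record_per_file` is hypothetical and used only for illustration, and `spark` is assumed to be an existing SparkSession.

```python
def read_one_record_per_file(spark, path_glob):
    """Sketch: one row per file, holding the full text and the source path.

    `spark` is assumed to be an existing SparkSession; the function name
    is hypothetical and not part of any library.
    """
    return (
        spark.read
        # wholetext=True makes the text reader emit each file as a single
        # row in the `value` column rather than one row per line.
        .text(path_glob, wholetext=True)
        # Keep the existing columns and add the file path from the hidden
        # _metadata column, exactly as the first attempt already does.
        .select("*", "_metadata.file_path")
    )
```

Called as `read_one_record_per_file(spark, "/datasets/raw_txt/*")`, this keeps the original `select` unchanged and only alters how the files are read.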
26 of 55. A data scientist at an e-commerce company is working with user data obtained from its subscriber database and has stored the data in a DataFrame df_user.
Before further processing, the data scientist wants to create another DataFrame df_user_non_pii and store only the non-PII columns. The PII columns in df_user are name, email, and birthdate.
Which code snippet can be used to meet this requirement?
A.
df_user_non_pii = df_user.drop("name", "email", "birthdate")
B.
df_user_non_pii = df_user.dropFields("name", "email", "birthdate")
C.
df_user_non_pii = df_user.select("name", "email", "birthdate")
D.
df_user_non_pii = df_user.remove("name", "email", "birthdate")
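As a hedged illustration of how the candidate calls differ, the sketch below wraps `DataFrame.drop`, which takes column names as varargs and returns a new DataFrame without them. The helper `without_pii` and the constant `PII_COLUMNS` are hypothetical names introduced only for this example; `df` is assumed to be a PySpark DataFrame such as `df_user`.

```python
# PII column names as listed in the question.
PII_COLUMNS = ("name", "email", "birthdate")


def without_pii(df, pii_columns=PII_COLUMNS):
    """Sketch: return a new DataFrame that excludes the given columns.

    `df` is assumed to be a PySpark DataFrame; the helper name is
    hypothetical and not part of any library.
    """
    # drop() removes the named columns and leaves every other column intact;
    # the input DataFrame itself is not modified.
    return df.drop(*pii_columns)
```

By contrast, `select("name", "email", "birthdate")` would keep only the PII columns, and in PySpark `dropFields` is a `Column` method for removing fields from a struct column rather than a DataFrame method.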
What is the benefit of using Pandas on Spark for data transformations?
Options: