Databricks Certified Associate Developer for Apache Spark 3.5 Exam - Topic 6 Question 12 Discussion

Actual exam question from the Databricks Certified Associate Developer for Apache Spark 3.5 exam
Question #: 12
Topic #: 6

A data scientist wants each record in the DataFrame to contain:

The entire contents of a file

The full file path

The first attempt at the code does read the text files, but each record contains a single line. The issue: the files are read line by line rather than as full text per file. The code is shown below:

Code:

corpus = spark.read.text("/datasets/raw_txt/*") \
    .select('*', '_metadata.file_path')

Which change will ensure one record per file?

Options:

Suggested Answer: A

To read each file as a single record, use:

spark.read.text(path, wholetext=True)

This ensures that Spark reads the entire file contents into one row.


Contribute your Thoughts:

Lawrence
4 days ago
I think B is better for line separation.
upvoted 0 times
...
Dorethea
9 days ago
Option A is the way to go!
upvoted 0 times
...
Miesha
1 month ago
I’m confused about the options. I feel like `wholetext=True` makes sense, but I’m not 100% sure if it’s the only way to achieve that.
upvoted 0 times
...
Halina
1 month ago
This reminds me of a practice question where we had to adjust file reading options. I think `wholetext=False` would just keep it as is, right?
upvoted 0 times
...
Tequila
1 month ago
I'm not entirely sure, but I feel like the `lineSep` options are more about how to split lines rather than reading whole files.
upvoted 0 times
...
Roselle
2 months ago
I think I remember that the `wholetext=True` option is what we need to read the entire file as one record.
upvoted 0 times
...
