Welcome to Pass4Success


Databricks Certified Associate Developer for Apache Spark 3.5 Exam - Topic 6 Question 12 Discussion

Actual exam question for the Databricks Certified Associate Developer for Apache Spark 3.5 exam
Question #: 12
Topic #: 6

A data scientist wants each record in the DataFrame to contain:

The entire contents of a file

The full file path

The first attempt at the code does read the text files, but each record contains a single line. The issue: the files are read line by line rather than as one full text per file. This code is shown below:

Code:

corpus = spark.read.text("/datasets/raw_txt/*") \
    .select('*', '_metadata.file_path')

Which change will ensure one record per file?

Options:

Suggested Answer: A

To read each file as a single record, use:

spark.read.text(path, wholetext=True)

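This makes Spark read the entire contents of each file into a single row (in the `value` column) instead of splitting it on line breaks, while the existing `.select('*', '_metadata.file_path')` still adds the file path to every row.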


Roselle
1 hour ago
I think I remember that the `wholetext=True` option is what we need to read the entire file as one record.