41 of 55. A data engineer is working on the DataFrame df1 and wants the Name with the highest count to appear first (descending order by count), followed by the next highest, and so on.
The DataFrame has columns:
id | Name | count
-----------------
1 | USA | 10
2 | India | 20
3 | England | 50
4 | India | 50
5 | France | 20
6 | India | 10
7 | USA | 30
8 | USA | 40
Which code fragment should the engineer use to sort the data in the Name and count columns?
To sort a Spark DataFrame by multiple columns, use .orderBy() (or .sort()) with column expressions.
The correct syntax for mixing descending and ascending order:
from pyspark.sql.functions import col
df1.orderBy(col('count').desc(), col('Name').asc())
This sorts primarily by count in descending order and secondarily by Name in ascending order (alphabetically).
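Since running the PySpark fragment requires a live Spark session, the same two-key ordering can be illustrated in plain Python over the sample rows above (tuples of id, Name, count; this is a sketch of the ordering semantics, not the Spark API itself):

```python
# Sample rows from the question: (id, Name, count)
rows = [
    (1, "USA", 10), (2, "India", 20), (3, "England", 50), (4, "India", 50),
    (5, "France", 20), (6, "India", 10), (7, "USA", 30), (8, "USA", 40),
]

# Primary key: count descending (negate the numeric value);
# secondary key: Name ascending (natural string order).
ordered = sorted(rows, key=lambda r: (-r[2], r[1]))

for _id, name, count in ordered:
    print(name, count)
# England and India tie at count 50; England comes first alphabetically.
```

The two ties in the data (England/India at 50, France/India at 20) show why the secondary Name key matters: without it, the order of tied rows would be arbitrary.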
Why the other options are incorrect:
B/C: The default sort order is ascending, so the highest counts would not appear first.
D: Reverses the sorting logic; it sorts Name in descending order, which is not what is required.
PySpark DataFrame API: orderBy() and col() for sorting with direction.
Databricks Exam Guide (June 2025): Section "Using Spark DataFrame APIs" (sorting, ordering, and column expressions).