Databricks Certified Associate Developer for Apache Spark 3.5 Exam - Topic 1 Question 10 Discussion

Actual exam question for Databricks's Databricks Certified Associate Developer for Apache Spark 3.5 exam
Question #: 10
Topic #: 1

41 of 55. A data engineer is working on the DataFrame df1 and wants the Name with the highest count to appear first (descending order by count), followed by the next highest, and so on.

The DataFrame has columns:

id | Name    | count | timestamp
---------------------------------
1  | USA     | 10
2  | India   | 20
3  | England | 50
4  | India   | 50
5  | France  | 20
6  | India   | 10
7  | USA     | 30
8  | USA     | 40

Which code fragment should the engineer use to sort the data in the Name and count columns?

Suggested Answer: A

To sort a Spark DataFrame by multiple columns, use .orderBy() (or .sort()) with column expressions.

Syntax for mixing descending and ascending order:

from pyspark.sql.functions import col

df1.orderBy(col('count').desc(), col('Name').asc())

This sorts primarily by count in descending order and secondarily by Name in ascending order (alphabetically).
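
As a quick sanity check on the ordering described above, here is a minimal plain-Python sketch (not Spark) that applies the same rule, count descending and then Name ascending, to the sample rows from the question. The tuple layout (id, Name, count) and the variable names are assumptions made for illustration; the timestamp column is left out because the source shows no values for it.

```python
# Sample rows from the question as (id, Name, count).
# The timestamp column is omitted: the source lists no values for it.
rows = [
    (1, "USA", 10),
    (2, "India", 20),
    (3, "England", 50),
    (4, "India", 50),
    (5, "France", 20),
    (6, "India", 10),
    (7, "USA", 30),
    (8, "USA", 40),
]

# Negating count yields descending order on count;
# Name then breaks ties in ascending (alphabetical) order.
expected = sorted(rows, key=lambda r: (-r[2], r[1]))

for row in expected:
    print(row)
```

The row ids come out as 3, 4, 8, 7, 5, 2, 6, 1, which is the order the orderBy() call above should produce on the same data.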

Why the other options are incorrect:

B/C: They rely on the default sort order, which is ascending, so the highest counts would not come first.

D: Sorts Name in descending order, which the question does not ask for.
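
The exact code of options B, C, and D is not shown on this page, so the following is only a plain-Python sketch of the two behaviours described: an all-ascending sort and a sort on Name descending. The (Name, count) tuples and variable names are illustrative assumptions.

```python
# (Name, count) pairs from the sample data in the question.
rows = [
    ("USA", 10), ("India", 20), ("England", 50), ("India", 50),
    ("France", 20), ("India", 10), ("USA", 30), ("USA", 40),
]

# Ascending on count (then Name): the smallest count comes first.
ascending = sorted(rows, key=lambda r: (r[1], r[0]))

# Descending on Name only: ordering is driven by Name, not by count.
name_desc = sorted(rows, key=lambda r: r[0], reverse=True)

print(ascending[0])     # a row with count 10
print(name_desc[0][0])  # the alphabetically last Name
```

In both cases the first row has a count of 10 rather than 50, so neither ordering puts the highest count first.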


PySpark DataFrame API: orderBy() and col() for sorting with direction.

Databricks Exam Guide (June 2025), section "Using Spark DataFrame APIs": sorting, ordering, and column expressions.

===========

Contribute your Thoughts:

Flo
5 days ago
I think we need to use the sort_values method, but I can’t recall the exact syntax for sorting by two columns.
upvoted 0 times
...
Dalene
10 days ago
I remember we practiced sorting DataFrames, but I’m not sure if we used multiple columns at once.
upvoted 0 times
...
Vanna
15 days ago
No problem, I'd use df1.sort_values(['count', 'Name'], ascending=[False, True]) to get the desired result.
upvoted 0 times
...
Lottie
20 days ago
I'm a little confused on the expected output format. Do we need to return the entire DataFrame or just the sorted 'Name' and 'count' columns?
upvoted 0 times
...
Xochitl
26 days ago
Ah I see, we need to sort by 'count' in descending order, and then by 'Name'. Should be a simple one-liner with sort().
upvoted 0 times
...
Alex
1 month ago
Hmm, I'm a bit unsure about the descending order part. I'll need to look up the right parameters to pass to sort() to get that.
upvoted 0 times
...
Benedict
1 month ago
This looks straightforward, I'd probably just use the sort() method and pass in the 'Name' and 'count' columns.
upvoted 0 times
...