Databricks Certified Associate Developer for Apache Spark 3.5 Exam - Topic 1 Question 10 Discussion

41 of 55. A data engineer is working on the DataFrame df1 and wants the Name with the highest count to appear first (descending order by count), followed by the next highest, and so on.

The DataFrame has columns:

id | Name    | count | timestamp
---|---------|-------|----------
1  | USA     | 10    |
2  | India   | 20    |
3  | England | 50    |
4  | India   | 50    |
5  | France  | 20    |
6  | India   | 10    |
7  | USA     | 30    |
8  | USA     | 40    |

Which code fragment should the engineer use to sort the data in the Name and count columns?

Suggested Answer: A

To sort a Spark DataFrame by multiple columns, use .orderBy() (or .sort()) with column expressions.

To mix descending and ascending order, set the direction on each column expression:

from pyspark.sql.functions import col

df1.orderBy(col('count').desc(), col('Name').asc())

This sorts primarily by count in descending order and secondarily by Name in ascending order (alphabetically).
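The resulting row order can be checked by hand without a Spark session. The sketch below is plain Python, not PySpark: it applies the same two sort keys (count descending, then Name ascending) to the sample rows from the question. The timestamp column is left out because the question does not list its values, and the variable names are illustrative.

```python
# Sample rows from the question as (id, Name, count) tuples.
# The timestamp column is omitted because its values are not given.
rows = [
    (1, "USA", 10),
    (2, "India", 20),
    (3, "England", 50),
    (4, "India", 50),
    (5, "France", 20),
    (6, "India", 10),
    (7, "USA", 30),
    (8, "USA", 40),
]

# Negating count yields descending order on count;
# Name then breaks ties in ascending (alphabetical) order.
expected = sorted(rows, key=lambda r: (-r[2], r[1]))

for row in expected:
    print(row)
```

Rows with equal counts (for example England and India at 50) are ordered alphabetically by Name, which is the role of the secondary sort key.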

Why the other options are incorrect:

B/C: These rely on the default ascending sort order, so the highest counts would not appear first.

D: This reverses the sorting logic by sorting Name in descending order, which the question does not require.


PySpark DataFrame API: orderBy() and col() for sorting with direction.

Databricks Exam Guide (June 2025): Section "Using Spark DataFrame APIs", covering sorting, ordering, and column expressions.

===========

Contribute your Thoughts:

Shaunna
1 day ago
Nice, this code will sort the DataFrame exactly as the question requires. Efficient and straightforward!
upvoted 0 times
...
Alpha
6 days ago
Looks good to me. This will give us the desired output with the highest count for each Name first.
upvoted 0 times
...
Stacey
11 days ago
That's the way to do it! Sorting by Name and count in descending order is the perfect solution.
upvoted 0 times
...
Erick
17 days ago
df1.sort_values(['Name', 'count'], ascending=[False, False])
upvoted 0 times
...
Martin
22 days ago
I believe we should sort by 'count' first and then by 'Name', but I’m not clear if we need to specify ascending or descending for both.
upvoted 0 times
...
Jackie
27 days ago
This seems similar to a question we did on grouping and sorting, but I’m a bit confused about how to handle ties in counts.
upvoted 0 times
...
Flo
2 months ago
I think we need to use the sort_values method, but I can’t recall the exact syntax for sorting by two columns.
upvoted 0 times
...
Dalene
2 months ago
I remember we practiced sorting DataFrames, but I’m not sure if we used multiple columns at once.
upvoted 0 times
...
Vanna
2 months ago
No problem, I'd use df1.sort_values(['count', 'Name'], ascending=[False, True]) to get the desired result.
upvoted 0 times
...
Lottie
2 months ago
I'm a little confused on the expected output format. Do we need to return the entire DataFrame or just the sorted 'Name' and 'count' columns?
upvoted 0 times
...
Xochitl
2 months ago
Ah I see, we need to sort by 'count' in descending order, and then by 'Name'. Should be a simple one-liner with sort().
upvoted 0 times
...
Alex
3 months ago
Hmm, I'm a bit unsure about the descending order part. I'll need to look up the right parameters to pass to sort() to get that.
upvoted 0 times
...
Benedict
3 months ago
This looks straightforward, I'd probably just use the sort() method and pass in the 'Name' and 'count' columns.
upvoted 0 times
...
