Databricks Machine Learning Associate Exam - Topic 4 Question 38 Discussion

Actual exam question for Databricks's Databricks Machine Learning Associate exam
Question #: 38
Topic #: 4

A data scientist has defined a Pandas UDF function predict to parallelize the inference process for a single-node model:

They have written the following incomplete code block to use predict to score each record of Spark DataFrame spark_df:

Which of the following lines of code can be used to complete the code block to successfully complete the task?

Suggested Answer: B

To score each record of a Spark DataFrame with predict, the suggested answer uses the mapInPandas method. mapInPandas hands each partition to the function as an iterator of pandas DataFrames and collects the pandas DataFrames the function yields, so a single-node model can score the data in parallel across the cluster. The suggested completion is mapInPandas(predict), which passes the function itself rather than the result of calling it. Note that in the PySpark API, mapInPandas also requires the output schema as a second argument, so a full call has the form mapInPandas(predict, schema). Reference:

PySpark DataFrame documentation (Using mapInPandas with UDFs).
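
The explanation above can be illustrated with a minimal sketch. It is not the code from the exam question, which is not reproduced on this page. The MeanModel class, the feature columns x1 and x2, and the score helper are all hypothetical names chosen for illustration; only DataFrame.mapInPandas(func, schema) is the real PySpark call.

```python
from typing import Iterator

import pandas as pd


class MeanModel:
    """Hypothetical stand-in for a single-node model (e.g. one loaded from disk)."""

    def predict(self, features: pd.DataFrame) -> pd.Series:
        # Toy rule: the prediction is the row-wise mean of the feature columns.
        return features.mean(axis=1)


def predict(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    # Create (or load) the model once, then reuse it for every batch.
    model = MeanModel()
    for pdf in batches:
        # Return the input columns plus a new "prediction" column.
        yield pdf.assign(prediction=model.predict(pdf))


def score(spark_df):
    # Assumes spark_df has exactly two double columns named x1 and x2.
    # mapInPandas takes the function itself plus the output schema.
    schema = "x1 double, x2 double, prediction double"
    return spark_df.mapInPandas(predict, schema)
```

Because predict only consumes and yields pandas DataFrames, it can be checked locally without a cluster by passing it iter([some_pandas_df]); score(spark_df) is the step that would run it across partitions on Spark.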


Danica
3 days ago
Hmm, I'm not sure about this one. I'm leaning towards E) predict(spark_df.columns), but I could be wrong.
upvoted 0 times
Peter
8 days ago
The question is pretty straightforward. I think B is the way to go.
upvoted 0 times
Marisha
13 days ago
B) mapInPandas(predict) is the correct answer.
upvoted 0 times
Hildegarde
18 days ago
I vaguely recall something about using `Iterator` with UDFs, but I can't remember if `predict(Iterator(spark_df))` is the right syntax.
upvoted 0 times
Kara
24 days ago
I feel like `mapInPandas(predict)` is the most straightforward option, but I need to double-check if that's how we should apply the function.
upvoted 0 times
Iluminada
29 days ago
I think `predict(*spark_df.columns)` seems like it could work, but I’m not entirely confident about how the arguments are being passed.
upvoted 0 times
Gaston
1 month ago
I remember we practiced using `mapInPandas` with UDFs, but I'm not sure if it's the right choice here.
upvoted 0 times
Zena
1 month ago
This is a good test of my understanding of Pandas UDFs and Spark DataFrame operations. I'll need to think carefully about the syntax and how the predict function is expected to be used.
upvoted 0 times
Kyoko
1 month ago
I'm feeling pretty confident about this one. The question is asking us to complete the code block, so I'm guessing one of these options is the correct way to call the predict function.
upvoted 0 times
Carin
2 months ago
Okay, I think I've got a strategy. The key is to figure out how to properly pass the Spark DataFrame columns to the predict function. Let me try a few of these options and see which one works.
upvoted 0 times
Adelaide
2 months ago
Hmm, I'm a bit confused about the Pandas UDF function and how it's supposed to be used here. I'll need to review my notes on Spark DataFrame transformations.
upvoted 0 times
Leonor
2 months ago
This looks like a tricky one. I'll need to carefully read through the question and the code to make sure I understand what's being asked.
upvoted 0 times
