Databricks Machine Learning Associate Exam - Topic 3 Question 4 Discussion

Actual exam question for the Databricks Machine Learning Associate exam
Question #: 4
Topic #: 3
A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:

prediction DOUBLE

actual DOUBLE

Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?

Options A) through D): [code blocks not reproduced]

Suggested Answer: C

To compute the root mean-squared-error (RMSE) of a linear regression model using Spark ML, the RegressionEvaluator class is used. The RegressionEvaluator is specifically designed for regression tasks and can calculate various metrics, including RMSE, based on the columns containing predictions and actual values.

The correct code block to compute RMSE from the preds_df DataFrame is:

from pyspark.ml.evaluation import RegressionEvaluator

regression_evaluator = RegressionEvaluator(
    predictionCol='prediction',
    labelCol='actual',
    metricName='rmse'
)
rmse = regression_evaluator.evaluate(preds_df)

This code creates an instance of RegressionEvaluator, specifying the prediction and label columns, as well as the metric to be computed ('rmse'). It then evaluates the predictions in preds_df and assigns the resulting RMSE value to the rmse variable.
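
To make concrete what the evaluator reports, here is a minimal pure-Python sketch of the RMSE formula itself: the square root of the mean of the squared differences between prediction and actual. It does not use Spark. The function name `rmse_by_hand` and the sample rows are hypothetical stand-ins for `preds_df`, chosen only for illustration.

```python
import math


def rmse_by_hand(rows):
    """Root mean squared error over (prediction, actual) pairs.

    Hypothetical helper for illustration; not part of Spark ML.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("need at least one (prediction, actual) pair")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(prediction - actual) ** 2 for prediction, actual in rows]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up rows standing in for preds_df: (prediction, actual).
sample_rows = [(3.0, 1.0), (5.0, 5.0), (2.0, 4.0), (7.0, 7.0)]
rmse = rmse_by_hand(sample_rows)
print(rmse)
```

On a small DataFrame, a hand computation like this should agree with `regression_evaluator.evaluate(preds_df)` up to floating-point error. As far as I know, `metricName` already defaults to `'rmse'` in PySpark's `RegressionEvaluator`, so passing it explicitly mainly helps readability.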

Options A, B, and D incorrectly use BinaryClassificationEvaluator, which is designed for binary classification and is not suitable for regression tasks.


PySpark ML Documentation

Blythe
3 months ago
Wait, why is RMSE important again?
upvoted 0 times
...
Barabara
3 months ago
Definitely going with Option A, it makes sense!
upvoted 0 times
...
Tenesha
3 months ago
Not so sure about Option C, seems off.
upvoted 0 times
...
Glory
4 months ago
I think Option B looks right!
upvoted 0 times
...
Luis
4 months ago
RMSE is calculated using predictions and actual values.
upvoted 0 times
...
Jamal
4 months ago
I'm a bit confused about the syntax in these options. I hope one of them correctly uses the `preds_df` DataFrame for the RMSE calculation.
upvoted 0 times
...
Laura
4 months ago
I feel like Option C might be the right choice, but I can't recall if we need to specify the metric name explicitly.
upvoted 0 times
...
William
4 months ago
I remember practicing a similar question where we had to compute RMSE, and I think it involved using the `evaluate` method on the evaluator.
upvoted 0 times
...
Pilar
5 months ago
I think we need to use the `RegressionEvaluator` to calculate RMSE, but I'm not sure which option shows that correctly.
upvoted 0 times
...
Karon
5 months ago
I'm a little confused by the different code options. I'll need to double-check the Spark ML documentation to make sure I'm selecting the right approach to calculate the RMSE for this linear regression model.
upvoted 0 times
...
Aileen
5 months ago
Okay, I've got this. Option B looks like the correct way to compute the RMSE. It's using the built-in Spark ML evaluator to calculate the metric directly from the prediction and actual columns in the DataFrame.
upvoted 0 times
...
Vicky
5 months ago
Hmm, I'm a bit unsure about this one. I need to make sure I understand how to properly calculate RMSE using the data in the Spark DataFrame. Let me think through the different code options carefully.
upvoted 0 times
...
Hyman
5 months ago
This looks like a straightforward question to calculate the RMSE of a linear regression model. I'll carefully review the code options and choose the one that correctly computes the RMSE.
upvoted 0 times
...
Loise
5 months ago
I remember learning about this in class. The counter variable is local, so any changes outside the For Each scope won't impact the next iteration. I'm feeling good about this one.
upvoted 0 times
...
Theodora
5 months ago
Isn't a process decision program chart meant for planning responses? I could confuse that with what they're looking for here.
upvoted 0 times
...
Lisha
5 months ago
Hmm, I'm a bit confused on this one. I know OSPF has different router types, but I can't remember which one generates the type 2 LSAs specifically.
upvoted 0 times
...
Rashida
2 years ago
Hold up, did anyone else notice the typo in option A? 'preds_df' should be 'predictions_df'. That's a dead giveaway that it's not the right answer. Gotta be on the lookout for those details, folks!
upvoted 0 times
...
Mike
2 years ago
Hmm, this is a tough one. I'd say Option D, just to mix things up a bit. Who needs RMSE anyway? We should be focusing on accuracy, not root-mean-squared-error.
upvoted 0 times
Gail
2 years ago
I agree, let's go with Option D.
upvoted 0 times
...
Markus
2 years ago
RMSE is important for evaluating the model, so Option D seems like the right choice.
upvoted 0 times
...
Luke
2 years ago
I agree, let's go with Option D.
upvoted 0 times
...
Linette
2 years ago
I think Option D is the correct choice.
upvoted 0 times
...
Svetlana
2 years ago
I agree, let's go with Option D.
upvoted 0 times
...
Ronnie
2 years ago
I think Option D is the correct choice.
upvoted 0 times
...
Lynda
2 years ago
I think Option D is the way to go.
upvoted 0 times
...
...
Becky
2 years ago
I'm torn between B and C, but I think C is the way to go. The math checks out, and it's a straightforward implementation.
upvoted 0 times
Temeka
2 years ago
User2
upvoted 0 times
...
Slyvia
2 years ago
User1
upvoted 0 times
...
...
Lilli
2 years ago
Option C looks promising, let's go with that. The formula seems spot on for computing the RMSE.
upvoted 0 times
Angella
2 years ago
I think Option C is the best option for this task.
upvoted 0 times
...
Karan
2 years ago
Great, let's proceed with Option C then.
upvoted 0 times
...
Leigha
2 years ago
Yes, Option C is the best one for calculating the RMSE.
upvoted 0 times
...
Valentin
2 years ago
The formula in Option C looks accurate for calculating RMSE.
upvoted 0 times
...
Laquanda
2 years ago
I agree, Option C is the correct choice for computing the RMSE.
upvoted 0 times
...
Ula
2 years ago
I agree, Option C seems like the right choice for this.
upvoted 0 times
...
Sabina
2 years ago
Let's use Option C to compute the RMSE.
upvoted 0 times
...
...