Databricks Exam Databricks-Machine-Learning-Associate Topic 3 Question 4 Discussion

Actual exam question for Databricks's Databricks-Machine-Learning-Associate exam
Question #: 4
Topic #: 3

A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:

prediction DOUBLE

actual DOUBLE

Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?

A)

B)

C)

D)

Suggested Answer: C

To compute the root mean-squared-error (RMSE) of a linear regression model in Spark ML, use the RegressionEvaluator class from pyspark.ml.evaluation. RegressionEvaluator is designed for regression tasks and can calculate several metrics, including RMSE, from the columns that hold the predictions and the actual values.

The correct code block to compute RMSE from the preds_df DataFrame is:

regression_evaluator = RegressionEvaluator(
    predictionCol='prediction',
    labelCol='actual',
    metricName='rmse'
)
rmse = regression_evaluator.evaluate(preds_df)

This code creates an instance of RegressionEvaluator, specifying the prediction and label columns, as well as the metric to be computed ('rmse'). It then evaluates the predictions in preds_df and assigns the resulting RMSE value to the rmse variable.
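
For intuition about the number the evaluator returns, RMSE can be reproduced by hand: square each difference between prediction and actual, average the squares, and take the square root. The sketch below does this in plain Python rather than Spark. The helper name rmse_by_hand and the sample rows are made up for illustration and are not part of the exam question or of Spark ML.

```python
import math


def rmse_by_hand(rows):
    """Root mean squared error over (prediction, actual) pairs."""
    # Squared error for every row.
    squared_errors = [(prediction - actual) ** 2 for prediction, actual in rows]
    # Mean of the squared errors, then the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical rows standing in for preds_df: (prediction, actual).
sample_rows = [(2.5, 3.0), (0.0, -0.5), (2.0, 2.0), (8.0, 7.0)]

rmse = rmse_by_hand(sample_rows)
print(rmse)  # square root of 0.375, roughly 0.612
```

On a DataFrame with the same four rows, RegressionEvaluator with metricName='rmse' should report the same value up to floating-point rounding.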

Options A, B, and D incorrectly use BinaryClassificationEvaluator, which is meant for binary classification and is not suitable for regression tasks.


PySpark ML Documentation

Contribute your Thoughts:

Rashida
12 months ago
Hold up, did anyone else notice the typo in option A? 'preds_df' should be 'predictions_df'. That's a dead giveaway that it's not the right answer. Gotta be on the lookout for those details, folks!
upvoted 0 times
...
Mike
12 months ago
Hmm, this is a tough one. I'd say Option D, just to mix things up a bit. Who needs RMSE anyway? We should be focusing on accuracy, not root-mean-squared-error.
upvoted 0 times
Gail
11 months ago
I agree, let's go with Option D.
upvoted 0 times
...
Markus
11 months ago
RMSE is important for evaluating the model, so Option D seems like the right choice.
upvoted 0 times
...
Luke
11 months ago
I agree, let's go with Option D.
upvoted 0 times
...
Linette
11 months ago
I think Option D is the correct choice.
upvoted 0 times
...
Svetlana
11 months ago
I agree, let's go with Option D.
upvoted 0 times
...
Ronnie
11 months ago
I think Option D is the correct choice.
upvoted 0 times
...
Lynda
12 months ago
I think Option D is the way to go.
upvoted 0 times
...
...
Becky
1 year ago
I'm torn between B and C, but I think C is the way to go. The math checks out, and it's a straightforward implementation.
upvoted 0 times
Temeka
11 months ago
User2
upvoted 0 times
...
Slyvia
11 months ago
User1
upvoted 0 times
...
...
Lilli
1 year ago
Option C looks promising, let's go with that. The formula seems spot on for computing the RMSE.
upvoted 0 times
Angella
11 months ago
I think Option C is the best option for this task.
upvoted 0 times
...
Karan
11 months ago
Great, let's proceed with Option C then.
upvoted 0 times
...
Leigha
12 months ago
Yes, Option C is the best one for calculating the RMSE.
upvoted 0 times
...
Valentin
12 months ago
The formula in Option C looks accurate for calculating RMSE.
upvoted 0 times
...
Laquanda
12 months ago
I agree, Option C is the correct choice for computing the RMSE.
upvoted 0 times
...
Ula
12 months ago
I agree, Option C seems like the right choice for this.
upvoted 0 times
...
Sabina
1 year ago
Let's use Option C to compute the RMSE.
upvoted 0 times
...
...