Databricks Exam Databricks-Machine-Learning-Associate Topic 3 Question 4 Discussion

Actual exam question for Databricks's Databricks-Machine-Learning-Associate exam

Question #: 4
Topic #: 3


A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:

prediction DOUBLE

actual DOUBLE

Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?

A. Option A

B. Option B

C. Option C

D. Option D


Suggested Answer: C

To compute the root mean-squared-error (RMSE) of a linear regression model using Spark ML, the RegressionEvaluator class is used. The RegressionEvaluator is specifically designed for regression tasks and can calculate various metrics, including RMSE, based on the columns containing predictions and actual values.

The correct code block to compute RMSE from the preds_df DataFrame is:

from pyspark.ml.evaluation import RegressionEvaluator

regression_evaluator = RegressionEvaluator(
    predictionCol='prediction',
    labelCol='actual',
    metricName='rmse'
)
rmse = regression_evaluator.evaluate(preds_df)

This code creates an instance of RegressionEvaluator, specifying the prediction and label columns, as well as the metric to be computed ('rmse'). It then evaluates the predictions in preds_df and assigns the resulting RMSE value to the rmse variable.
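As a way to sanity-check the value returned by the evaluator, here is a minimal plain-Python sketch of the quantity that RMSE denotes: the square root of the mean of the squared differences between prediction and actual. It is not Spark's implementation, and the helper name rmse_from_rows and the sample rows are hypothetical, made up for illustration.

```python
import math


def rmse_from_rows(rows):
    """Return the root mean squared error over (prediction, actual) pairs."""
    if not rows:
        raise ValueError("need at least one (prediction, actual) pair")
    # Squared difference between each prediction and its actual value.
    squared_errors = [(prediction - actual) ** 2 for prediction, actual in rows]
    # Mean of the squared errors, then the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical rows standing in for preds_df, as (prediction, actual) pairs.
sample_rows = [(2.5, 3.0), (0.0, -0.5), (2.0, 2.0), (8.0, 7.0)]
sample_rmse = rmse_from_rows(sample_rows)
print(sample_rmse)
```

On a small DataFrame, collecting the prediction and actual columns into pairs like these and comparing against the evaluator's result should give matching values up to floating-point error.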

Options A, B, and D incorrectly use BinaryClassificationEvaluator, which is not suitable for regression tasks.

PySpark ML Documentation

by Brent at May 18, 2024, 01:20 PM



Rashida

12 months ago

Hold up, did anyone else notice the typo in option A? 'preds_df' should be 'predictions_df'. That's a dead giveaway that it's not the right answer. Gotta be on the lookout for those details, folks!

upvoted 0 times


1 year ago

I'm torn between B and C, but I think C is the way to go. The math checks out, and it's a straightforward implementation.

upvoted 0 times

Temeka

11 months ago

User2

upvoted 0 times


Slyvia

11 months ago

User1

upvoted 0 times


1 year ago

Let's use Option C to compute the RMSE.

upvoted 0 times
