Databricks Exam Databricks Machine Learning Associate Topic 3 Question 23 Discussion

Actual exam question for Databricks's Databricks Machine Learning Associate exam

Question #: 23
Topic #: 3


A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of unique hyperparameter values is being trained on a single compute node. They are performing eight total evaluations across eight total compute nodes. While the accuracy of the model does vary over the eight evaluations, they notice there is no trend of improvement in the accuracy. The data scientist believes this is due to the parallelization of the tuning process.

Which change could the data scientist make to improve their model accuracy over the course of their tuning process?

A. Change the number of compute nodes to be half or less than half of the number of evaluations.

B. Change the number of compute nodes and the number of evaluations to be much larger but equal.

C. Change the iterative optimization algorithm used to facilitate the tuning process.

D. Change the number of compute nodes to be double or more than double the number of evaluations.


Suggested Answer: C

For large datasets, Spark ML uses iterative optimization methods to distribute the training of a linear regression model. Specifically, Spark MLlib employs techniques like Stochastic Gradient Descent (SGD) and Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) optimization to iteratively update the model parameters. These methods are well-suited for distributed computing environments because they can handle large-scale data efficiently by processing mini-batches of data and updating the model incrementally.

Databricks documentation on linear regression: Linear Regression in Spark ML

by Lashandra at Jan 12, 2025, 12:56 PM
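
The concern raised in the question, that running every evaluation at the same time leaves an iterative algorithm no earlier results to learn from, can be sketched with a toy simulation. This is only an illustration under stated assumptions: `objective`, `tune`, and the "sample near the best result so far" rule are hypothetical stand-ins, not the exam's algorithm or any library's API.

```python
import random


def objective(x):
    # Hypothetical stand-in for "train a model and return its accuracy";
    # this toy score peaks at x = 0.7.
    return 1.0 - (x - 0.7) ** 2


def tune(max_evals, parallelism, seed=0):
    """Toy iterative tuner that proposes `parallelism` candidates per batch.

    Every candidate in a batch is proposed before any result from that
    batch is available, which mimics evaluations running on separate nodes.
    Returns the evaluation history and the number of proposals that could
    make use of earlier results.
    """
    rng = random.Random(seed)
    history = []   # (x, accuracy) pairs for completed evaluations
    informed = 0   # proposals made with at least one earlier result

    while len(history) < max_evals:
        batch_size = min(parallelism, max_evals - len(history))
        batch = []
        for _ in range(batch_size):
            if history:
                # Adaptive step: sample near the best value seen so far.
                best_x = max(history, key=lambda r: r[1])[0]
                x = min(1.0, max(0.0, rng.gauss(best_x, 0.05)))
                informed += 1
            else:
                # No feedback yet: fall back to a random guess.
                x = rng.random()
            batch.append(x)
        # Results only become visible once the whole batch has finished.
        history.extend((x, objective(x)) for x in batch)

    return history, informed


if __name__ == "__main__":
    for nodes in (8, 4, 1):
        _, informed = tune(max_evals=8, parallelism=nodes)
        print(f"nodes={nodes}: informed proposals={informed}")
```

In this sketch, eight evaluations on eight nodes yield zero informed proposals, so the search behaves like random guessing, while four nodes leave four proposals that can build on earlier results. Real tuning libraries use more sophisticated proposal rules, but the trade-off between parallelism and adaptivity has the same shape.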



Paris

2 days ago

I agree, option C seems like the best choice!

upvoted 0 times

...

Taryn

8 days ago

Sounds like they need a better algorithm for tuning.

upvoted 0 times

...

Starr

13 days ago

I feel like option D might just complicate things further. More compute nodes don’t always mean better results, right?

upvoted 0 times

...

Diego

19 days ago

This question reminds me of a practice problem where we had to balance compute resources. I wonder if option B could work if we scale everything up together?

upvoted 0 times

...

Edelmira

24 days ago

I’m not entirely sure, but I think changing the optimization algorithm, like in option C, might help if the current one isn’t performing well.

upvoted 0 times

...

Chun

1 month ago

I remember reading that sometimes using too many compute nodes can lead to less effective tuning, so maybe option A could be a good choice?

upvoted 0 times

...

Adolph

1 month ago

I'm not totally sure about this one. The question mentions that the data scientist believes the lack of improvement is due to the parallelization, so I'm wondering if changing the optimization algorithm might be a better approach than just adjusting the number of nodes.

upvoted 0 times

...

Beckie

1 month ago

Okay, I think I've got it. The question is asking us to identify a change that could improve the model accuracy over the course of the tuning process. Based on the information provided, option D seems like the best choice - changing the number of compute nodes to be more than double the number of evaluations.

upvoted 0 times

...

Myong

1 month ago

Hmm, I think the key here is that the accuracy isn't improving over the eight evaluations, even though they're using eight compute nodes. Maybe changing the number of nodes could help with that?

upvoted 0 times

...

Amber

1 month ago

I'm a bit confused by this question. It seems like the issue is with the parallelization of the tuning process, but I'm not sure which change would be the best solution.

upvoted 0 times

...

Cherry

6 months ago

Option D, for sure! Gotta go big or go home, right? Double the nodes, double the fun!

upvoted 0 times

Whitney

5 months ago

Yeah, it's worth a try. Let's go big and see if it pays off in the accuracy of the model.

upvoted 0 times

...

Candra

5 months ago

I agree, doubling the number of compute nodes could make a difference in improving accuracy.

upvoted 0 times

...

Virgie

5 months ago

I think option D is the way to go. More compute nodes can definitely help.

upvoted 0 times

...

Danica

7 months ago

I'd go with option A. Fewer compute nodes than evaluations might introduce some serial processing, but it could help the data scientist identify a more consistent trend in the results.

upvoted 0 times

Ronny

5 months ago

It seems like a logical approach to improve the model accuracy during the tuning process.

upvoted 0 times

...

Ciara

5 months ago

Changing the number of compute nodes to be half or less than half of the evaluations could be beneficial.

upvoted 0 times

...

Felicidad

5 months ago

I agree, having fewer compute nodes might make it easier to track the trend in accuracy.

upvoted 0 times

...

Valentin

5 months ago

I think option A is a good choice. It could help with consistency in the results.

upvoted 0 times

...

Herman

7 months ago

Hmm, I'm not sure. The question mentions no trend of improvement, so maybe option B could work to explore a larger hyperparameter space. Worth a shot!

upvoted 0 times

...

Lilli

7 months ago

I think changing the iterative optimization algorithm used could also help improve the model accuracy.

upvoted 0 times

...

Maia

7 months ago

I disagree, I believe they should change the number of compute nodes to be double or more than double the number of evaluations.

upvoted 0 times

...

Avery

7 months ago

Option D seems like the way to go. Doubling the number of compute nodes should allow for more parallel evaluations and potentially improve the model accuracy.

upvoted 0 times

Gregoria

6 months ago

But wouldn't changing the iterative optimization algorithm also make a difference in improving the model accuracy?

upvoted 0 times

...

Leatha

6 months ago

I agree, increasing the number of compute nodes could help with parallel evaluations.

upvoted 0 times

...

Nydia

7 months ago

I think option C is the best choice. Changing the optimization algorithm could help the data scientist explore the hyperparameter space more effectively and potentially find a better model.

upvoted 0 times