A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of a unique set of hyperparameter values is trained on its own compute node, with eight total evaluations running across eight compute nodes. Although the model's accuracy varies across the eight evaluations, there is no trend of improvement in accuracy. The data scientist believes this is due to the parallelization of the tuning process.
Which change could the data scientist make to improve their model accuracy over the course of the tuning process?
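The intuition behind the scenario: if all eight evaluations run at once, every candidate must be chosen before any result is known, so an iterative optimizer degenerates into random search. Running with parallelism lower than the number of evaluations lets later trials exploit earlier results. The toy sketch below is illustrative only: the one-dimensional `objective` and the simple interval-shrinking rule are assumptions standing in for adaptive methods such as Bayesian optimization (e.g., Hyperopt's TPE).

```python
import random

def objective(lr):
    # Hypothetical validation loss for a single hyperparameter;
    # its minimum at lr = 0.1 is an assumption for illustration.
    return (lr - 0.1) ** 2

def fully_parallel_search(n_evals, seed=0):
    # All candidates must be fixed up front so they can run at once on
    # n_evals nodes -- no evaluation can learn from any other.
    rng = random.Random(seed)
    candidates = [rng.uniform(0.0, 1.0) for _ in range(n_evals)]
    return [objective(c) for c in candidates]

def sequential_adaptive_search(n_evals):
    # With parallelism below n_evals, later candidates can exploit
    # earlier results. A ternary-search-style interval-shrinking rule
    # stands in here for a real adaptive tuning algorithm.
    lo, hi = 0.0, 1.0
    losses = []
    while len(losses) + 2 <= n_evals:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        f1, f2 = objective(m1), objective(m2)
        losses += [f1, f2]
        if f1 < f2:
            hi = m2   # the better half is [lo, m2]; discard the rest
        else:
            lo = m1   # the better half is [m1, hi]
    return losses

print(min(fully_parallel_search(8)), min(sequential_adaptive_search(8)))
```

On Databricks, this trade-off is typically controlled by running Hyperopt with `SparkTrials` and setting its `parallelism` argument below `max_evals`, so that batches of trials can still be informed by completed ones.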
For large datasets, Spark ML uses iterative optimization methods to distribute the training of a linear regression model. Specifically, Spark MLlib employs techniques such as Stochastic Gradient Descent (SGD) and Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization to iteratively update the model parameters. These methods are well suited to distributed computing environments because they process mini-batches of data and update the model incrementally, which lets them handle large-scale data efficiently.
Databricks documentation on linear regression: Linear Regression in Spark ML
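The incremental, mini-batch updates described above can be sketched on a single machine. This is a minimal illustration, not Spark's implementation: in a cluster, each batch's gradient would be computed across executors and aggregated. The toy one-feature dataset generated from y = 2x + 1, and the learning rate and epoch count, are assumptions for the example.

```python
import random

def minibatch_sgd(xs, ys, lr=0.5, epochs=300, batch_size=4, seed=0):
    # Mini-batch SGD for one-feature least-squares linear regression.
    # Each step moves the parameters along the mean gradient of one
    # small batch -- the incremental update pattern described above.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)                       # new mini-batches each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            gw = gb = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]  # residual on one example
                gw += err * xs[i]
                gb += err
            w -= lr * gw / len(batch)          # gradient step on batch MSE
            b -= lr * gb / len(batch)
    return w, b

# Toy data from y = 2x + 1 (an assumption for illustration).
xs = [i / 10 for i in range(8)]
ys = [2 * x + 1 for x in xs]
w, b = minibatch_sgd(xs, ys)
print(w, b)
```

Because each update touches only one mini-batch, the full dataset never needs to fit in a single worker's memory, which is what makes this family of methods practical at cluster scale.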