A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each unique set of hyperparameter values is evaluated by training a model on a single compute node. They are running eight evaluations in total across eight compute nodes. While the accuracy of the model does vary across the eight evaluations, they notice no trend of improvement in accuracy. The data scientist believes this is due to the parallelization of the tuning process.
Which change could the data scientist make to improve their model accuracy over the course of their tuning process?
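To make the scenario concrete, here is a minimal sketch of an iterative search in plain Python. It is a toy and not any tuning library's actual algorithm: `objective`, `propose`, `tune`, and the `parallelism` argument are all hypothetical names chosen for illustration. It counts how many trials are proposed with access to results from earlier trials.

```python
import random

def objective(x):
    # Hypothetical validation accuracy that peaks at x = 0.7.
    return 1.0 - (x - 0.7) ** 2

def propose(history, rng):
    # With no finished trials, sample blindly; otherwise sample near the best so far.
    if not history:
        return rng.uniform(0.0, 1.0)
    best_x, _ = max(history, key=lambda h: h[1])
    return min(1.0, max(0.0, rng.gauss(best_x, 0.1)))

def tune(total_evals, parallelism, seed=0):
    rng = random.Random(seed)
    history = []   # finished trials as (x, accuracy) pairs
    informed = 0   # trials proposed with at least one earlier result available
    while len(history) < total_evals:
        batch_size = min(parallelism, total_evals - len(history))
        # Every trial in a batch is proposed before any trial in that batch finishes.
        batch = [propose(history, rng) for _ in range(batch_size)]
        informed += batch_size if history else 0
        history.extend((x, objective(x)) for x in batch)
    return history, informed

_, informed_all_at_once = tune(total_evals=8, parallelism=8)
_, informed_in_pairs = tune(total_evals=8, parallelism=2)
print(informed_all_at_once, informed_in_pairs)
```

In this toy, running all eight trials in one batch means none of them is proposed with knowledge of an earlier result, while batches of two leave six of the eight trials informed by earlier ones.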
For large datasets, Spark ML uses iterative optimization methods to distribute the training of a linear regression model. In the DataFrame-based API, the iterative solver for linear regression is Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS); Stochastic Gradient Descent (SGD) was the approach used by the older RDD-based MLlib API. These methods are well suited to distributed computing environments because the gradient of the loss can be computed on each data partition in parallel and then aggregated, so the model parameters are updated incrementally at each iteration without moving the full dataset to a single node.
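The idea of computing gradients per partition and aggregating them can be sketched in plain Python. This is an illustration under simplifying assumptions, not Spark's implementation: it uses ordinary gradient descent as a simpler stand-in for L-BFGS, the "partitions" are just lists in one process, and `partial_gradient` and `fit` are hypothetical names.

```python
def partial_gradient(partition, w, b):
    # Sum the squared-error gradients over one partition of (x, y) rows.
    gw = gb = 0.0
    for x, y in partition:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb, len(partition)

def fit(partitions, lr=0.1, iterations=1000):
    w = b = 0.0
    for _ in range(iterations):
        # "Map" step: each partition computes its gradient independently.
        parts = [partial_gradient(p, w, b) for p in partitions]
        # "Reduce" step: aggregate the partial sums into one global gradient.
        gw = sum(p[0] for p in parts)
        gb = sum(p[1] for p in parts)
        n = sum(p[2] for p in parts)
        # Update the shared parameters once per iteration.
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Toy data generated from y = 2x + 1, split across two partitions.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]
w, b = fit(partitions)
print(round(w, 4), round(b, 4))
```

Only the small per-partition sums need to be combined at each iteration, which is why this pattern scales to data that does not fit on one node.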
Databricks documentation on linear regression: Linear Regression in Spark ML