A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of a unique set of hyperparameter values is trained on its own compute node. They perform eight total evaluations across eight compute nodes. Although the model's accuracy varies across the eight evaluations, there is no trend of improvement in accuracy. The data scientist believes this is due to the parallelization of the tuning process.
Which change could the data scientist make to improve their model accuracy over the course of their tuning process?
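The premise behind the question is that an iterative optimizer (such as Bayesian optimization or TPE) only improves over time when later trials can learn from earlier results; if all eight evaluations launch simultaneously on eight nodes, every trial is effectively an uninformed random sample. A minimal pure-Python sketch (a toy objective and search loop, not any specific tuning library) contrasts the two regimes:

```python
import random

def objective(lr):
    # Toy "accuracy" that peaks at lr = 0.1 (hypothetical objective)
    return 1.0 - (lr - 0.1) ** 2

def parallel_search(n_trials, seed=0):
    # All trials sampled up front: no trial can learn from another,
    # which mirrors eight evaluations on eight nodes at once.
    rng = random.Random(seed)
    candidates = [rng.uniform(0.0, 1.0) for _ in range(n_trials)]
    return max(objective(c) for c in candidates)

def sequential_search(n_trials, seed=0):
    # Each trial samples near the best point found so far, so later
    # evaluations are informed by earlier results.
    rng = random.Random(seed)
    best_x = rng.uniform(0.0, 1.0)
    best_score = objective(best_x)
    width = 0.5
    for _ in range(n_trials - 1):
        x = min(1.0, max(0.0, best_x + rng.uniform(-width, width)))
        score = objective(x)
        if score > best_score:
            best_x, best_score = x, score
        width *= 0.7  # narrow the search as evidence accumulates
    return best_score
```

In the sequential version the best score is monotonically non-decreasing over trials, which is exactly the trend of improvement the data scientist expects but does not see. In practice the same effect is achieved by reducing the tuning parallelism below the total number of evaluations (for example, running eight evaluations across fewer nodes), so the optimizer can incorporate completed results before proposing new candidates.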
For large datasets, Spark ML uses iterative optimization methods to distribute the training of a linear regression model. Specifically, Spark MLlib employs techniques such as Stochastic Gradient Descent (SGD) and Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization to iteratively update the model parameters. These methods are well suited to distributed computing environments because they handle large-scale data efficiently by processing mini-batches of data and updating the model incrementally.
Databricks documentation on linear regression: Linear Regression in Spark ML
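The incremental update described above can be illustrated locally. The following is a minimal pure-Python sketch of mini-batch SGD for linear regression, not Spark MLlib code; in Spark, each mini-batch gradient would instead be computed as a distributed aggregation across partitions:

```python
import random

def sgd_linear_regression(data, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y ~ w*x + b by mini-batch SGD (illustrative sketch).

    Each step computes the mean-squared-error gradient over a small
    mini-batch and nudges the parameters, so the model is updated
    incrementally rather than from a full pass over the data.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Gradients of mean squared error over this mini-batch
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Noiseless data generated from y = 2x + 1
points = [(x / 10, 2 * (x / 10) + 1) for x in range(20)]
w, b = sgd_linear_regression(points)
```

Because each update touches only one mini-batch, the same rule scales to data that does not fit on a single machine, which is why SGD-style solvers suit distributed training.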