A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of a unique set of hyperparameter values is trained on its own compute node, for eight total evaluations across eight compute nodes. While model accuracy varies across the eight evaluations, there is no trend of improvement in accuracy. The data scientist believes this is due to the parallelization of the tuning process.
Which change could the data scientist make to improve their model accuracy over the course of their tuning process?
For large datasets, Spark ML uses iterative optimization methods to distribute the training of a linear regression model. Specifically, Spark MLlib employs techniques such as Stochastic Gradient Descent (SGD) and Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization to iteratively update the model parameters. These methods are well suited to distributed computing environments because they handle large-scale data efficiently by processing mini-batches of data and updating the model incrementally.
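To make the "mini-batches and incremental updates" idea concrete, here is a minimal single-machine sketch of the mini-batch SGD update that these distributed optimizers build on. This is plain numpy for illustration only, not Spark MLlib's actual implementation; the function name, learning rate, and batch size are all illustrative choices.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.1, epochs=200, batch_size=32, seed=0):
    """Fit y ~ X @ w + b with mini-batch SGD (single-machine sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        idx = rng.permutation(n)  # shuffle each pass over the data
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient of mean squared error on this mini-batch only;
            # the model is updated incrementally, batch by batch.
            err = Xb @ w + b - yb
            w -= lr * (Xb.T @ err) / len(batch)
            b -= lr * err.mean()
    return w, b

# Recover known coefficients from synthetic data
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = X @ np.array([2.0, -3.0]) + 1.0 + 0.01 * rng.normal(size=500)
w, b = sgd_linear_regression(X, y)
```

In Spark MLlib the same idea is distributed: each worker computes the gradient on its partition of the data, and the partial gradients are aggregated to update the shared parameters.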
Databricks documentation on linear regression: Linear Regression in Spark ML