A data scientist has developed a machine learning pipeline in Spark ML that operates on a static input data set, but the pipeline is taking too long to run. They increase the number of workers in the cluster to make the pipeline run more efficiently. They then notice that the number of rows in the training set after reconfiguring the cluster is different from the number of rows in the training set before the reconfiguration.
Which of the following approaches will guarantee a reproducible training and test set for each model?
Spark's randomSplit assigns rows to splits partition by partition, so the same seed can produce different training and test sets when the cluster, and therefore the default partitioning of the data, changes. Writing the split data sets to persistent storage is a reliable way to guarantee reproducibility: every model run loads exactly the same training and test rows, regardless of cluster reconfiguration or other changes in the environment.
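The underlying issue can be seen without resizing a cluster: repartitioning alone changes the split. A minimal sketch, assuming an active SparkSession (the exact counts will vary from run to run):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(10000)

# Same seed, different partitioning: randomSplit samples per
# partition, so the resulting "train" sets usually differ.
train_a, _ = df.randomSplit([0.8, 0.2], seed=42)
train_b, _ = df.repartition(16).randomSplit([0.8, 0.2], seed=42)

print(train_a.count(), train_b.count())
print(train_a.subtract(train_b).count())  # typically > 0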
Correct approach:
1. Split the data.
2. Write the split data to persistent storage (e.g., HDFS, S3).
3. Load the data from storage for each model training session.
# Split once, with a fixed seed.
train_df, test_df = spark_df.randomSplit([0.8, 0.2], seed=42)

# Persist the splits; mode('overwrite') lets the job be rerun safely.
train_df.write.mode('overwrite').parquet('path/to/train_df.parquet')
test_df.write.mode('overwrite').parquet('path/to/test_df.parquet')

# Later, load the identical data for every training session.
train_df = spark.read.parquet('path/to/train_df.parquet')
test_df = spark.read.parquet('path/to/test_df.parquet')
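For completeness, a minimal sketch of step 3 in practice, loading the persisted splits and training a model on them. This assumes a binary-classification task; the 'features' and 'label' column names are hypothetical stand-ins for whatever your own feature-engineering stages produce:

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Every run reads the identical, frozen splits.
train_df = spark.read.parquet('path/to/train_df.parquet')
test_df = spark.read.parquet('path/to/test_df.parquet')

# 'features' and 'label' are assumed column names; substitute the
# columns produced by your pipeline's feature stages.
lr = LogisticRegression(featuresCol='features', labelCol='label')
model = lr.fit(train_df)

predictions = model.transform(test_df)
evaluator = BinaryClassificationEvaluator(labelCol='label')
print(evaluator.evaluate(predictions))  # areaUnderROC by default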