A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process.
Which of the following feature engineering tasks will be the least efficient to distribute?
To display visual histograms and summary statistics for the numeric features in a Spark DataFrame, use the Databricks utility function dbutils.data.summarize. It renders a summary of each column that includes a histogram alongside the statistics.
Correct code:
dbutils.data.summarize(spark_df)
Other options like spark_df.describe() and spark_df.summary() provide textual statistical summaries but do not include visual histograms.
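The difference between the visual and the textual summaries can be sketched as a small helper. This is a minimal sketch, not part of the original answer: the function name summarize_numeric_features is hypothetical, df is assumed to be a Spark DataFrame, and dbutils is assumed to be the object that Databricks notebooks provide (pass None when running elsewhere).

```python
def summarize_numeric_features(df, dbutils=None):
    """Show a summary of `df`, preferring the visual Databricks summary.

    Hypothetical helper for illustration only.
    `df` is assumed to be a Spark DataFrame; `dbutils` is assumed to be the
    Databricks utilities object, or None outside a Databricks notebook.
    Returns the name of the method that was used.
    """
    if dbutils is not None:
        # Inside a Databricks notebook: renders statistics with histograms.
        dbutils.data.summarize(df)
        return "dbutils.data.summarize"

    # Outside Databricks: text-only statistics, no histograms are drawn.
    df.summary().show()
    return "DataFrame.summary"
```

In a Databricks notebook the call would look like summarize_numeric_features(spark_df, dbutils); without dbutils it falls back to the textual spark_df.summary() output.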
Databricks Utilities Documentation