A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process.
Which of the following feature engineering tasks will be the least efficient to distribute?
To display summary statistics together with visual histograms of the numeric features in a Spark DataFrame, use the Databricks utility function dbutils.data.summarize. It renders a per-column summary of the DataFrame that includes histograms, rather than only a table of numbers.
Correct code:
dbutils.data.summarize(spark_df)
Other options, such as spark_df.describe() and spark_df.summary(), return textual statistical summaries as DataFrames but do not produce visual histograms.
Databricks Utilities Documentation