A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process.
Which of the following feature engineering tasks will be the least efficient to distribute?
To display visual histograms and summary statistics for the numeric features of a Spark DataFrame, use the Databricks utility function dbutils.data.summarize. It renders a summary of each column together with visual histograms.
Correct code:
dbutils.data.summarize(spark_df)
The other options, spark_df.describe() and spark_df.summary(), return DataFrames of textual summary statistics (count, mean, stddev, min and max, plus approximate percentiles for summary()) but do not render visual histograms.
Databricks Utilities Documentation