A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process.
Which of the following feature engineering tasks will be the least efficient to distribute?
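How well a feature engineering step distributes depends on whether its statistic can be merged from small per-partition partial results. The plain-Python sketch below simulates this without Spark. The three lists are hypothetical stand-ins for data partitions, and the helper names are illustrative only. A mean needs just a (sum, count) pair from each partition. An exact median cannot be combined from per-partition medians, so every value must be gathered or globally sorted first.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical data split across three "partitions" (stand-ins for Spark partitions).
partitions: List[List[float]] = [[1.0, 2.0, 9.0], [4.0, 5.0], [3.0, 7.0, 8.0, 6.0]]


def partial_mean(part: List[float]) -> Tuple[float, int]:
    """Each worker returns only a (sum, count) pair, a constant-size result."""
    return sum(part), len(part)


def combine_means(partials: List[Tuple[float, int]]) -> float:
    """The driver merges the small partial results into the global mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


def exact_median(parts: List[List[float]]) -> float:
    """An exact median needs every value in one place (or a global sort)."""
    all_values = [x for part in parts for x in part]
    return median(all_values)


global_mean = combine_means([partial_mean(p) for p in partitions])
global_median = exact_median(partitions)

# Taking the median of per-partition medians is generally NOT the true median.
median_of_medians = median([median(p) for p in partitions])

print(global_mean)        # 5.0
print(global_median)      # 5.0
print(median_of_medians)  # 4.5
```

This gap is why Spark offers approximate quantile functions such as `DataFrame.approxQuantile`, which trade a bounded error for avoiding a full shuffle of the data.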
To display visual histograms and summary statistics for the numeric features of a Spark DataFrame, use the Databricks utility function dbutils.data.summarize. It computes summary statistics for each column and renders them in the notebook output, including histograms for numeric columns.
Correct code:
dbutils.data.summarize(spark_df)
The other options, spark_df.describe() and spark_df.summary(), return DataFrames of summary statistics (count, mean, stddev, min, and max, plus approximate percentiles for summary()) but do not render histograms.
Databricks Utilities Documentation