A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process.
Which of the following feature engineering tasks will be the least efficient to distribute?
To display visual histograms and summaries of the numeric features in a Spark DataFrame, use the Databricks utility function dbutils.data.summarize. It renders a summary of the DataFrame that includes visual histograms.
Correct code:
dbutils.data.summarize(spark_df)
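Other options such as spark_df.describe() and spark_df.summary() return textual statistical summaries (count, mean, standard deviation, minimum and maximum, plus approximate quartiles in the case of summary()) but do not include visual histograms.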
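The difference between the calls above can be sketched as two small helpers. This is a minimal illustration, not a definitive implementation: the helper names show_text_summaries and show_visual_summary are hypothetical, spark_df is assumed to be an existing Spark DataFrame, and dbutils is assumed to be the object that Databricks notebooks provide (it is not available in plain PySpark).

```python
def show_text_summaries(spark_df):
    """Print the two textual summaries; neither call draws a histogram."""
    # describe(): count, mean, stddev, min and max for each column
    spark_df.describe().show()
    # summary(): the same statistics plus approximate 25%, 50% and 75% percentiles
    spark_df.summary().show()


def show_visual_summary(spark_df, dbutils):
    """Render the visual summary, histograms included.

    Assumption: `dbutils` is the utility object a Databricks notebook
    exposes; it is passed in explicitly to make that dependency visible.
    """
    dbutils.data.summarize(spark_df)
```

In a Databricks notebook, calling show_visual_summary(spark_df, dbutils) would be equivalent to running dbutils.data.summarize(spark_df) directly.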
Databricks Utilities Documentation