A data scientist receives an update on a business case about a machine that has thousands of error codes. The data scientist creates the following summary statistics profile while reviewing the logs for each machine:
Which of the following is the most likely concern with respect to data design for model ingestion?
With 19,000 possible error-code features and each machine reporting only a handful (median of 7), your feature matrix will be extremely sparse (most entries zero) which can negatively impact both storage and model performance unless you address it (e.g., via sparse data structures or dimensionality reduction).
Crista
1 days agoEladia
4 days agoSvetlana
5 days agoVi
8 days agoHaydee
12 days agoBernardine
13 days agoEdna
14 days ago