A data scientist needs to create a model for predictive maintenance. The model will be based on historical data to identify rare anomalies in the data.
The historical data is stored in an Amazon S3 bucket. The data scientist needs to use Amazon SageMaker Data Wrangler to ingest the data. The data scientist also needs to perform exploratory data analysis (EDA) to understand the statistical properties of the data.
Which solution will meet these requirements with the LEAST amount of compute resources?
To perform exploratory data analysis (EDA) on a large dataset for anomaly detection with the least compute, the First K sampling option in SageMaker Data Wrangler is the best fit. This option imports only the first K rows of the dataset, so the amount of data loaded into memory is bounded by K, which conserves compute resources.
Because the data scientist can choose K based on domain knowledge, this approach can yield a sample that is adequate for initial EDA without extensive compute resources, provided the leading rows are reasonably typical of the rest of the data. Other options such as randomized sampling draw rows from across the dataset and do not preserve row order, so they may be less useful for initial analysis of time-series or sequential data.
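The compute difference between the two sampling styles can be illustrated outside of Data Wrangler. The sketch below is not Data Wrangler's implementation; it is a plain-Python analogy using only the standard library, with hypothetical column names (sensor_id, reading) and synthetic data. It shows that taking the first K rows stops reading after K rows, while a uniform random sample (done here with reservoir sampling, one common technique) has to scan every row.

```python
import csv
import random
from itertools import islice


class CountingLines:
    """Wrap an iterable of text lines and count how many were consumed."""

    def __init__(self, lines):
        self._it = iter(lines)
        self.consumed = 0

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self._it)
        self.consumed += 1
        return line


def make_lines(n_rows):
    """Build a synthetic CSV (header plus n_rows rows); columns are hypothetical."""
    lines = ["sensor_id,reading\n"]
    lines.extend(f"{i},{i * 0.5}\n" for i in range(n_rows))
    return lines


def first_k_rows(lines, k):
    """Return the first k data rows; reading stops once k rows are collected."""
    reader = csv.DictReader(lines)
    return list(islice(reader, k))


def reservoir_sample(lines, k, seed=0):
    """Return k uniformly sampled rows; every row must be read once."""
    rng = random.Random(seed)
    reader = csv.DictReader(lines)
    sample = []
    for i, row in enumerate(reader):
        if i < k:
            sample.append(row)
        else:
            # Replace an existing element with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


if __name__ == "__main__":
    data = make_lines(1000)

    head_source = CountingLines(data)
    head = first_k_rows(head_source, 10)
    print("first K: rows kept", len(head), "lines read", head_source.consumed)

    rand_source = CountingLines(data)
    rand = reservoir_sample(rand_source, 10)
    print("random:  rows kept", len(rand), "lines read", rand_source.consumed)
```

Both functions return the same number of rows, but the lines-read counters differ: the first-K path touches only the header and K rows, whereas the random path touches the whole file. The trade-off is that the first K rows reflect only the start of the dataset, so rare anomalies that occur later will not appear in the sample.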