Amazon MLA-C01 Exam - Topic 4 Question 18 Discussion

Actual exam question for Amazon's MLA-C01 exam
Question #: 18
Topic #: 4

An ML engineer is training an ML model to identify medical patients for disease screening. The tabular dataset for training contains 50,000 patient records: 1,000 with the disease and 49,000 without the disease.

The ML engineer splits the dataset into a training dataset, a validation dataset, and a test dataset.

What should the ML engineer do to transform the data and make the data suitable for training?

Suggested Answer: B

This dataset is severely imbalanced: only 1,000 of the 50,000 records (2%) represent patients with the disease. AWS ML best practices recommend correcting the imbalance only in the training dataset and leaving the validation and test datasets at the real-world class distribution, so that evaluation metrics reflect how the model will behave on real patients.
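To make the 2% figure and the "representative validation and test sets" point concrete, here is a minimal sketch of a stratified split. The question does not say how the engineer splits the data, so the 70/15/15 proportions, the stratification itself, and the helper names `stratified_split` and `prevalence` are assumptions made for illustration.

```python
import random


def stratified_split(labels, percents=(70, 15, 15), seed=0):
    """Return (train, val, test) index lists that preserve the class ratio.

    `percents` is a hypothetical 70/15/15 split; integer percentages are
    used so the split sizes are exact.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = len(indices) * percents[0] // 100
        n_val = len(indices) * percents[1] // 100
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])
    return train, val, test


# 1 = disease, 0 = no disease, matching the counts in the question.
labels = [1] * 1000 + [0] * 49000
train_idx, val_idx, test_idx = stratified_split(labels)


def prevalence(indices):
    """Fraction of the given records that are disease-positive."""
    return sum(labels[i] for i in indices) / len(indices)
```

With this split, each of the three datasets keeps the original 2% disease prevalence. Any rebalancing would then be applied to the records in `train_idx` only, leaving `val_idx` and `test_idx` untouched.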

Synthetic Minority Oversampling Technique (SMOTE) generates synthetic samples of the minority class by interpolating between an existing minority example and one of its nearest minority-class neighbors. This gives the model more disease-positive examples to learn from without discarding any majority-class records.
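The interpolation step can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not a production implementation: the function name `smote_sketch`, the default `k=5`, and the toy points are assumptions, and the sketch assumes every feature is numeric.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by interpolating between neighbors.

    `minority` is a list of equal-length numeric tuples (hypothetical
    feature vectors for disease-positive patients in the training set).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbors of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: four points at the corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority_points, n_synthetic=10, k=2)
```

Every generated point lies on a line segment between two real minority examples, so it stays inside the region the minority class already occupies. In practice this would be run on the training dataset's minority records only.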

PCA is a dimensionality reduction method, not an oversampling technique. Oversampling the majority class worsens imbalance. Altering the test dataset would invalidate evaluation results.

Therefore, applying SMOTE to the training dataset is the correct approach.

