An ML engineer is training a model to identify patients for disease screening. The tabular training dataset contains 50,000 patient records: 1,000 with the disease and 49,000 without the disease.
The ML engineer splits the dataset into a training dataset, a validation dataset, and a test dataset.
What should the ML engineer do to transform the data and make the data suitable for training?
This dataset shows severe class imbalance, with only 2% of records representing patients with the disease. AWS ML best practices recommend correcting imbalance only in the training dataset, while keeping validation and test sets representative of real-world distributions.
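The split-first-then-resample order can be sketched as a stratified split that preserves the 2% prevalence in every partition. This is a minimal NumPy illustration (the function name `stratified_split` and the 80/10/10 fractions are assumptions for the example, not from the question):

```python
import numpy as np

def stratified_split(y, fractions=(0.8, 0.1, 0.1), rng=None):
    """Return index arrays for train/val/test, preserving class ratios."""
    rng = rng or np.random.default_rng(0)
    splits = [[], [], []]
    for cls in np.unique(y):
        # shuffle the indices of this class, then cut by the fractions
        idx = rng.permutation(np.flatnonzero(y == cls))
        n = len(idx)
        cut1 = int(fractions[0] * n)
        cut2 = cut1 + int(fractions[1] * n)
        splits[0].append(idx[:cut1])
        splits[1].append(idx[cut1:cut2])
        splits[2].append(idx[cut2:])
    return [np.concatenate(s) for s in splits]

# Toy labels mirroring the question's 2% positive rate
y = np.array([1] * 1000 + [0] * 49000)
train_idx, val_idx, test_idx = stratified_split(y)

# Each split keeps roughly the 2% disease prevalence; only the
# training indices would then be passed to an oversampler.
print(y[train_idx].mean(), y[val_idx].mean(), y[test_idx].mean())
```

Because the validation and test partitions retain the real-world 2% prevalence, metrics computed on them remain meaningful after the training set is rebalanced.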
Synthetic Minority Oversampling Technique (SMOTE) generates synthetic samples of the minority class by interpolating between existing minority examples and their nearest minority-class neighbors. This improves the model's ability to learn disease-related patterns without discarding data, as undersampling the majority class would.
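The interpolation idea behind SMOTE can be sketched in a few lines of NumPy. This is a simplified illustration of the technique, not a production implementation (in practice a library such as imbalanced-learn would be used); the function name `smote_oversample` and the toy data are assumptions for the example:

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating
    between a random minority sample and one of its k nearest
    minority-class neighbors."""
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Euclidean distances from sample i to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]  # skip the sample itself
        j = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

rng = np.random.default_rng(42)
minority = rng.normal(loc=2.0, size=(100, 4))  # toy minority class
new_samples = smote_oversample(minority, n_new=400, rng=rng)
print(new_samples.shape)  # 400 synthetic rows, same feature count
```

Because each synthetic point lies on the segment between two real minority samples, it stays inside the region the minority class already occupies, rather than duplicating records as naive random oversampling does.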
The other options fail for clear reasons: principal component analysis (PCA) is a dimensionality reduction method, not an oversampling technique; oversampling the majority class would worsen the imbalance; and altering the test dataset would invalidate the evaluation results.
Therefore, applying SMOTE to the training dataset is the correct approach.