Amazon Exam MLS-C01 Topic 2 Question 95 Discussion

Actual exam question for Amazon's MLS-C01 exam

Question #: 95
Topic #: 2

A machine learning specialist works for a fruit processing company and needs to build a system that categorizes apples into three types. The specialist has collected a dataset that contains 150 images for each type of apple and applied transfer learning on a neural network that was pretrained on ImageNet with this dataset.

The company requires at least 85% accuracy to make use of the model.

After an exhaustive grid search, the optimal hyperparameters produced the following:

68% accuracy on the training set

67% accuracy on the validation set

What can the machine learning specialist do to improve the system's accuracy?

A. Upload the model to an Amazon SageMaker notebook instance and use the Amazon SageMaker HPO feature to optimize the model's hyperparameters.

B. Add more data to the training set and retrain the model using transfer learning to reduce the bias.

C. Use a neural network model with more layers that are pretrained on ImageNet and apply transfer learning to increase the variance.

D. Train a new model using the current neural network architecture.
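The accuracies reported in the question can be read as a bias/variance diagnostic, which is what options B and C refer to. The sketch below is a minimal illustration, not part of the exam material: the function name `diagnose_fit` and the `gap_tolerance` threshold of 0.05 are assumptions chosen for this example.

```python
def diagnose_fit(train_acc, val_acc, target_acc, gap_tolerance=0.05):
    """Roughly classify a model's fit from its accuracies.

    gap_tolerance is a hypothetical threshold picked for illustration;
    there is no standard value.
    """
    # The validation score already satisfies the requirement.
    if val_acc >= target_acc:
        return "meets target"
    # A large train/validation gap suggests the model memorized the training set.
    gap = train_acc - val_acc
    if gap > gap_tolerance:
        return "overfitting (high variance)"
    # Both scores are low and close together: the model cannot fit the data well.
    return "underfitting (high bias)"


# Figures from the question: 68% training, 67% validation, 85% required.
print(diagnose_fit(0.68, 0.67, 0.85))
```

With the figures from the question, both scores sit well below the 85% target and differ by only one point, so the sketch reports underfitting. High bias is conventionally addressed by increasing model capacity, while high variance is conventionally addressed with more data or regularization.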

Suggested Answer: A

SageMaker Data Wrangler is a feature of SageMaker Studio that provides an end-to-end solution for importing, preparing, transforming, featurizing, and analyzing data. Data Wrangler includes built-in analyses that generate visualizations and data insights in a few clicks. One of these is the Quick Model visualization, which quickly evaluates the data and produces an importance score for each feature.

A feature importance score indicates how useful a feature is at predicting a target label. The score lies in the range [0, 1], and a higher number indicates that the feature is more important to the whole dataset. The Quick Model visualization uses a random forest model to calculate the importance of each feature with the Gini importance method. This method measures the total reduction in node impurity (a measure of how well a node separates the classes) that is attributed to splitting on a particular feature.

The ML developer can use the Quick Model visualization to obtain the importance score of each feature in the dataset and use those scores to feature engineer the dataset. This solution requires the least development effort compared to the other options.
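The node impurity reduction that the Gini importance method accumulates can be sketched in a few lines. This is a generic illustration of the calculation for a single split, not Data Wrangler's implementation; the function names `gini_impurity` and `impurity_reduction` and the toy labels are assumptions made for the example.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_reduction(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Toy node with two classes (hypothetical labels).
parent = ["a", "a", "b", "b"]

# A split that separates the classes perfectly removes all impurity.
print(impurity_reduction(parent, ["a", "a"], ["b", "b"]))

# A split that leaves both children as mixed as the parent removes none.
print(impurity_reduction(parent, ["a", "b"], ["a", "b"]))
```

A feature whose splits produce large reductions across the trees of the forest ends up with a high importance score, while a feature whose splits barely change the class mix ends up near zero.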

References:

* Analyze and Visualize

* Detect multicollinearity, target leakage, and feature correlation with Amazon SageMaker Data Wrangler

by Lavina at Jun 22, 2024, 11:17 PM

Contribute your Thoughts:

2 months ago

I disagree; I believe option C is the better choice. Using a more complex neural network with more layers, pretrained on ImageNet, will increase the variance and potentially lead to higher accuracy.

upvoted 0 times

Ling

16 days ago

I think trying out both options could be the best approach.

upvoted 0 times

Natalie

19 days ago

But adding more data to the training set might also be beneficial.

upvoted 0 times

Carman

23 days ago

Option C is a good choice; it could help increase accuracy.

upvoted 0 times

3 months ago

I think we should add more data to the training set.

upvoted 0 times
