Amazon MLS-C01 Exam - Topic 4 Question 51 Discussion

Actual exam question for Amazon's MLS-C01 exam
Question #: 51
Topic #: 4

A global financial company is using machine learning to automate its loan approval process. The company has a dataset of customer information. The dataset contains some categorical fields, such as customer location by city and housing status. The dataset also includes financial fields in different units, such as account balances in US dollars and monthly interest in US cents.

The company's data scientists are using a gradient boosting regression model to infer the credit score for each customer. The model has a training accuracy of 99% and a testing accuracy of 75%. The data scientists want to improve the model's testing accuracy.

Which process will improve the testing accuracy the MOST?

Suggested Answer: B

Contribute your Thoughts:

Clemencia
4 months ago
Binning financial fields might not be the best approach here.
upvoted 0 times
...
Kimberlie
4 months ago
Not sure if tokenization is the right move for categorical data.
upvoted 0 times
...
Terrilyn
4 months ago
Wait, 99% training accuracy but only 75% testing? That’s kinda sketchy!
upvoted 0 times
...
Dalene
4 months ago
I think L1 regularization could really help with overfitting.
upvoted 0 times
...
Noemi
5 months ago
Sounds like one-hot encoding is a must for those categorical fields!
upvoted 0 times
...
Tiera
5 months ago
I feel pretty confident about this. The Scrum framework provides boundaries like timeboxing and clear team responsibilities that enable teams to self-manage effectively.
upvoted 0 times
...
Rutha
5 months ago
Alright, time to put my knowledge to the test. I'm pretty sure an "Independent" value set doesn't have a predefined list of values, so I'll go with option D. Now I just need to find one more.
upvoted 0 times
...
yak22
4 years ago
The correct answer is 'A'. First, it encodes the categorical data with one-hot encoding. Second, the financial input variables are in different units (US dollars and US cents), so they need to be on the same scale, and standardization does that. Third, the gap between training and testing accuracy shows the model is overfitted, so it should be regularized (constrained) using L1.
upvoted 1 times
...
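The three steps named in the comment above (one-hot encoding, standardization, L1 regularization) can be sketched in plain Python. This is only an illustration under stated assumptions: the sample rows, the function names, and the `alpha`/`lam` parameters are hypothetical and not taken from the exam question, and the leaf-weight formula follows the soft-thresholding form used by XGBoost-style boosters, which the thread does not name.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Return (sorted categories, one 0/1 indicator row per input value)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def standardize(values):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def l1_leaf_weight(grad_sum, hess_sum, alpha, lam=1.0):
    """Leaf weight of a boosted tree with an L1 penalty `alpha` (assumed form).

    The gradient sum is soft-thresholded by alpha, so leaves with weak
    evidence get a weight of exactly zero; `lam` is an L2 term.
    """
    if grad_sum > alpha:
        shrunk = grad_sum - alpha
    elif grad_sum < -alpha:
        shrunk = grad_sum + alpha
    else:
        return 0.0
    return -shrunk / (hess_sum + lam)


# Hypothetical sample data: the same amounts in dollars and in cents-scale units.
housing = ["own", "rent", "own", "mortgage"]
balance_usd = [1000.0, 2000.0, 3000.0, 4000.0]
interest_cents = [100.0, 200.0, 300.0, 400.0]

categories, encoded = one_hot(housing)
print(categories, encoded)
print(standardize(balance_usd))
print(standardize(interest_cents))
print(l1_leaf_weight(grad_sum=0.5, hess_sum=2.0, alpha=1.0))
```

After standardization the two financial columns land on the same scale regardless of their original units, and the L1 threshold zeroes out the weakly supported leaf, which is the constraining effect the comment relies on to reduce overfitting.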
