Amazon MLS-C01 Exam - Topic 4 Question 80 Discussion

Actual exam question for Amazon's MLS-C01 exam
Question #: 80
Topic #: 4
[All MLS-C01 Questions]

A Data Scientist is working on an application that performs sentiment analysis. The validation accuracy is poor, and the Data Scientist thinks that the cause may be a rich vocabulary and a low average frequency of words in the dataset.

Which tool should be used to improve the validation accuracy?

A) Amazon Comprehend
B) Amazon SageMaker BlazingText
C) Natural Language Toolkit (NLTK) stemming and stop word removal
D) Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers

Suggested Answer: A

Contribute your Thoughts:

Mireya
3 months ago
I’d go with Scikit-learn for sure, it’s reliable for text data!
upvoted 0 times
...
Malcolm
3 months ago
NLTK is great for preprocessing, but might not solve the accuracy problem.
upvoted 0 times
...
Colene
4 months ago
Wait, can TF-IDF really help with rich vocab issues?
upvoted 0 times
...
Thurman
4 months ago
Totally agree, BlazingText is powerful for text classification!
upvoted 0 times
...
Felicidad
4 months ago
I think option B is the best choice for improving accuracy.
upvoted 0 times
...
Arlette
4 months ago
I think Amazon Comprehend is more for entity recognition, not really for improving validation accuracy directly.
upvoted 0 times
...
Juan
4 months ago
I feel like we practiced a similar question where using BlazingText improved accuracy. Could that be the right choice here?
upvoted 0 times
...
Brianne
5 months ago
I'm not entirely sure, but I think stemming and stop word removal might help reduce the complexity of the dataset.
upvoted 0 times
...
Eun
5 months ago
I remember we discussed how rich vocabulary can lead to overfitting, so maybe using TF-IDF could help with that.
upvoted 0 times
...
Dahlia
5 months ago
I'm a bit confused on the details of the different options. Maybe I'll start by reviewing the key features of each tool and see which one seems most tailored to address the specific challenges mentioned in the question.
upvoted 0 times
...
Dudley
5 months ago
Okay, I've seen this type of problem before. I'm pretty confident the Amazon SageMaker BlazingText tool is designed to handle text data with a diverse vocabulary, so that seems like the best choice here.
upvoted 0 times
...
Peggie
5 months ago
I'm a bit unsure on this one. The rich vocabulary and low word frequency makes me think the TF-IDF vectorizers from Scikit-learn could be a useful tool to try. But I'll need to double-check the details on how that works.
upvoted 0 times
...
Susy
5 months ago
Hmm, this seems like a tricky one. I'm thinking the NLTK toolkit with stemming and stop word removal might be a good option to try and simplify the vocabulary and improve the validation accuracy.
upvoted 0 times
...
Nieves
5 months ago
This is a tricky one. I'm leaning towards C, but I'm not 100% sure. I'll make my best guess and move on to the next question.
upvoted 0 times
...
Chaya
10 months ago
Stemming and stop word removal? Sounds like my high school English teacher's dream tool. Maybe we can throw in some thesaurus action too, just for fun.
upvoted 0 times
Yvette
8 months ago
Good idea! We can combine both approaches to improve the validation accuracy.
upvoted 0 times
...
Ira
8 months ago
That could work, but I also think we should consider using Natural Language Toolkit (NLTK) stemming and stop word removal.
upvoted 0 times
...
Linn
8 months ago
I think we should try using Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers.
upvoted 0 times
...
...
Vallie
10 months ago
B) Amazon SageMaker BlazingText? Sounds like a made-up answer. I'd stick to the more well-known NLP tools and techniques.
upvoted 0 times
Novella
9 months ago
Natural Language Toolkit (NLTK) stemming and stop word removal could also be a good option to try.
upvoted 0 times
...
Josphine
9 months ago
I think using Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers could help improve the validation accuracy.
upvoted 0 times
...
...
Sharika
10 months ago
A) Amazon Comprehend is probably overkill for this task. It's more suited for enterprise-level NLP tasks, not a simple sentiment analysis problem.
upvoted 0 times
Nicolette
9 months ago
D) Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers
upvoted 0 times
...
Tyra
9 months ago
C) Natural Language Toolkit (NLTK) stemming and stop word removal
upvoted 0 times
...
...
Domitila
10 months ago
D) Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers could also be a good choice. TF-IDF can help identify the most important words in the dataset and reduce the impact of common words.
upvoted 0 times
Coral
10 months ago
I agree, removing stop words can help focus on the more important words in the dataset.
upvoted 0 times
...
Joanne
10 months ago
I think we should use C) Natural Language Toolkit (NLTK) stemming and stop word removal to improve the validation accuracy.
upvoted 0 times
...
...
Jamal
10 months ago
I prefer using NLTK for stemming and stop word removal to improve accuracy.
upvoted 0 times
...
Chauncey
10 months ago
C) Natural Language Toolkit (NLTK) stemming and stop word removal seems like the best option to handle the issue of rich vocabulary and low average word frequency. Removing common words and reducing words to their base form can help improve the model's performance.
upvoted 0 times
Gianna
9 months ago
I agree, it can definitely help in handling the rich vocabulary and low word frequency.
upvoted 0 times
...
Delisa
9 months ago
I think using NLTK stemming and stop word removal could really help improve the accuracy.
upvoted 0 times
...
...
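Option C, as described in the thread above, shrinks the vocabulary by dropping common words and collapsing inflected forms to a shared stem. A hedged sketch, assuming NLTK is installed: `PorterStemmer` needs no corpus download, and the stop-word set below is a small hand-rolled stand-in for `nltk.corpus.stopwords.words("english")` (which requires `nltk.download("stopwords")`):

```python
from nltk.stem import PorterStemmer

# Small illustrative stop-word list; in practice use
# nltk.corpus.stopwords.words("english") after nltk.download("stopwords")
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "were", "of", "in", "it"}

stemmer = PorterStemmer()

def normalize(text):
    """Lowercase, drop stop words, and stem the remaining tokens."""
    tokens = text.lower().split()
    return [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]

print(normalize("The acting was entertaining and the plots were entertaining"))
```

With this normalization, "acting" becomes "act" and both "entertaining" occurrences map to the same stem "entertain", so distinct surface forms are merged and each remaining term's frequency rises.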
Kris
11 months ago
I agree with Florinda, TF-IDF can help with the low frequency of words issue.
upvoted 0 times
...
Florinda
11 months ago
I think we should use Scikit-learn TF-IDF vectorizers.
upvoted 0 times
...
