
Amazon MLS-C01 Exam - Topic 2 Question 61 Discussion

Actual exam question for Amazon's MLS-C01 exam
Question #: 61
Topic #: 2

A data scientist is using the Amazon SageMaker Neural Topic Model (NTM) algorithm to build a model that recommends tags for blog posts. The raw blog post data is stored in an Amazon S3 bucket in JSON format. During model evaluation, the data scientist found that the model recommends certain stopwords, such as "a," "an," and "the," as tags for certain blog posts, along with a few rare words that appear only in certain blog entries. After a few iterations of tag review with the content team, the data scientist notices that the rare words are unusual but feasible. The data scientist must also ensure that the tag recommendations of the generated model do not include the stopwords.

What should the data scientist do to meet these requirements?

Suggested Answer: D
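The suggested answer points at removing the stopwords during preprocessing, before the documents are vectorized for NTM training (the rare words stay, since the content team confirmed they are feasible tags). Below is a minimal, stdlib-only sketch of that cleaning step; the stopword list and helper names are illustrative, not part of the exam question:

```python
# Sketch: strip stopwords from blog-post text before building the
# bag-of-words counts that SageMaker NTM consumes. Rare words are
# deliberately kept, since they were confirmed as valid tags.
import re
from collections import Counter

# Illustrative stopword list, not an exhaustive one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def preprocess(docs):
    """Return stopword-free token counts per document (rare words kept)."""
    return [Counter(t for t in tokenize(d) if t not in STOPWORDS)
            for d in docs]

docs = [
    "The quick overview of an Amazon SageMaker pipeline",
    "A rare word like teratogenicity is a feasible tag",
]
counts = preprocess(docs)
print(counts[0])  # "the", "of", and "an" are filtered out
```

In practice the same filtering can be done with scikit-learn's `CountVectorizer` (its `stop_words` parameter) before writing the bag-of-words matrix to S3 for NTM training, which is the approach a couple of commenters below allude to.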

Contribute your Thoughts:

Marti
4 months ago
Definitely need to remove those stopwords, D is the way to go!
Kati
4 months ago
Rare words as tags? That's surprising!
Dalene
4 months ago
Wait, why would you use PCA for this? B doesn't make sense.
Lezlie
4 months ago
I disagree, A could work too if you want to use entity recognition.
Ena
4 months ago
Sounds like D is the best option to handle those stopwords!
Frederic
5 months ago
I don't think PCA is relevant here since we're dealing with text data, so option B seems off. I’d lean towards option D for removing stopwords.
Latanya
5 months ago
I practiced a similar question where we had to clean text data. I feel like the Count Vectorizer is a solid choice, but I wonder if there are other methods we didn't cover.
Mary
5 months ago
I'm not entirely sure, but I think using Amazon Comprehend could help with entity recognition. Maybe option A could work too?
Lucy
5 months ago
I remember we discussed the importance of preprocessing text data, especially removing stopwords, so option D seems like the right approach.
Flo
5 months ago
This looks like a straightforward cloud computing question. I'll review the key characteristics and eliminate the ones that don't fit.
