
Amazon AIF-C01 Exam - Topic 4 Question 1 Discussion

Actual exam question for Amazon's AIF-C01 exam
Question #: 1
Topic #: 4
[All AIF-C01 Questions]

A company has built a solution by using generative AI. The solution uses large language models (LLMs) to translate training manuals from English into other languages. The company wants to evaluate the accuracy of the solution by examining the text generated for the manuals.

Which model evaluation strategy meets these requirements?

A. Bilingual Evaluation Understudy (BLEU)
B. Root mean squared error (RMSE)
C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
D. F1 score

Suggested Answer: A
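
For context on the suggested answer: BLEU scores a machine translation by its n-gram overlap with one or more human reference translations, which is exactly the kind of text-to-text accuracy check the question describes. Below is a minimal sketch using NLTK; the Spanish sentences are invented for illustration, not taken from any real manual.

```python
# Minimal BLEU sketch with NLTK. The sentences below are invented;
# a real evaluation would pair LLM output with professional human
# reference translations of the same manual text.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# One or more tokenized human reference translations.
references = [
    "apriete el perno con una llave dinamometrica".split(),
]
# The tokenized LLM-generated translation to evaluate.
candidate = "apriete el perno usando una llave dinamometrica".split()

# Smoothing avoids zero scores on short sentences that miss an n-gram.
smooth = SmoothingFunction().method1
score = sentence_bleu(references, candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")  # a value in [0, 1]; higher = closer match
```

In practice, corpus-level BLEU over many sentence pairs is more reliable than a single sentence score, since n-gram statistics are noisy on short segments.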

Contribute your Thoughts:

Krissy
3 months ago
F1 score seems off for this kind of evaluation.
upvoted 0 times
...
Tori
3 months ago
Wait, can LLMs really translate accurately?
upvoted 0 times
...
Yoko
3 months ago
RMSE? Not really suited for text evaluation.
upvoted 0 times
...
Tambra
4 months ago
I think ROUGE might be better for this.
upvoted 0 times
...
Rodrigo
4 months ago
BLEU is the go-to for translation tasks!
upvoted 0 times
...
Alysa
4 months ago
I thought RMSE was more for regression tasks, so it doesn't seem like it fits this scenario at all.
upvoted 0 times
...
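
Alysa's point is easy to demonstrate: RMSE measures the distance between numeric predictions and numeric targets, so it has nothing meaningful to compare in free-form generated text. A minimal sketch with NumPy, using invented numbers:

```python
# RMSE applies to numeric predictions, e.g. a regression model's output.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # actual numeric targets
y_pred = np.array([2.8, 5.4, 2.9, 6.5])  # model's numeric predictions

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(f"RMSE: {rmse:.3f}")
# There is no analogous subtraction for two translated sentences,
# which is why RMSE does not fit a text-generation task.
```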
Kyoko
4 months ago
I practiced a similar question where BLEU was the correct answer for translation accuracy, but I’m still a bit confused about the other options.
upvoted 0 times
...
Kerry
4 months ago
I'm not entirely sure, but I feel like ROUGE is more for summarization tasks rather than translation.
upvoted 0 times
...
Tora
5 months ago
I remember we discussed BLEU scores in class as a way to evaluate translation quality, so I think that might be the right choice here.
upvoted 0 times
...
Fatima
5 months ago
RMSE might work, but it's more commonly used for regression tasks. For this translation problem, I think BLEU or ROUGE would be the better options. I'll have to review the differences between those two metrics to decide.
upvoted 0 times
...
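
For anyone reviewing the difference Fatima mentions: ROUGE is recall-oriented and is the standard metric for summarization (did the output keep the reference's content?), while BLEU is precision-oriented and is the standard for translation. A minimal ROUGE sketch using the rouge-score package, with invented sentences:

```python
# ROUGE measures n-gram recall against a reference, which suits
# summarization rather than translation.
from rouge_score import rouge_scorer

reference = "tighten the bolt with a torque wrench before assembly"
generated = "tighten the bolt with a wrench"

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)
print(scores["rouge1"].recall)    # fraction of reference unigrams recovered
print(scores["rougeL"].fmeasure)  # longest-common-subsequence F-measure
```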
Rosendo
5 months ago
BLEU is definitely the way to go here. It's specifically designed to evaluate the accuracy of machine translation, which is exactly what this company needs to do. I feel confident in this answer.
upvoted 0 times
...
Ligia
5 months ago
This seems like a straightforward translation evaluation problem, so I'd go with BLEU. It's a well-established metric for assessing the quality of machine translation output.
upvoted 0 times
...
Hershel
5 months ago
I'm a bit unsure here. BLEU seems like the obvious choice, but I'm wondering if ROUGE might be a better fit since it's more focused on evaluating text summarization. Hmm, I'll have to think this through a bit more.
upvoted 0 times
...
Wilda
5 months ago
This seems like a straightforward question about evaluation metrics for generated text. I'll think through the pros and cons of each option.
upvoted 0 times
...
Alease
1 year ago
RMSE? Really? That's more for measuring numerical accuracy, not text quality. I don't think that's what the company is looking for here.
upvoted 0 times
...
Colton
1 year ago
I think F1 score could also be useful in evaluating the accuracy of the solution, as it considers both precision and recall.
upvoted 0 times
...
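
On Colton's suggestion: F1 does combine precision and recall, but it is defined over discrete class labels, not over generated sentences, so it fits classification tasks rather than translation quality. A minimal sketch with scikit-learn, using invented labels:

```python
# F1 compares discrete predicted labels against true labels; there is
# no natural label set for judging a whole translated manual.
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 1]  # true class labels
y_pred = [1, 0, 0, 1, 0, 1]  # predicted class labels

print(f"F1: {f1_score(y_true, y_pred):.3f}")
```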
Ma
1 year ago
I'm not convinced BLEU is the best option. Shouldn't we also consider ROUGE, which is better for evaluating text summarization? Hmm, decisions, decisions.
upvoted 0 times
Bernardine
1 year ago
Good idea! Using both evaluation strategies will give us a more well-rounded assessment of the solution's accuracy.
upvoted 0 times
...
Amos
1 year ago
That's true, BLEU does focus on translation accuracy. Maybe we can use both BLEU and ROUGE for a comprehensive evaluation.
upvoted 0 times
...
Edda
1 year ago
But BLEU is specifically designed for translation tasks, so it might be more appropriate in this case.
upvoted 0 times
...
Kirby
1 year ago
I think we should consider ROUGE as well, it's better for text summarization.
upvoted 0 times
...
...
Marvel
1 year ago
I'm not sure, but I think C) Recall-Oriented Understudy for Gisting Evaluation (ROUGE) could also be a good option for evaluating text generation.
upvoted 0 times
...
Margurite
1 year ago
BLEU seems like the obvious choice here. It's designed specifically for evaluating machine translation, which is exactly what this company is trying to do.
upvoted 0 times
Xochitl
1 year ago
Yes, BLEU is widely used in the field for assessing the quality of generated text.
upvoted 0 times
...
Leontine
1 year ago
I agree, BLEU is the best choice for evaluating machine translation.
upvoted 0 times
...
...
Ashley
1 year ago
I agree with Dierdre, BLEU is commonly used for evaluating machine translation.
upvoted 0 times
...
Dierdre
1 year ago
I think the best model evaluation strategy for this scenario is A) Bilingual Evaluation Understudy (BLEU).
upvoted 0 times
...
