
Amazon AIF-C01 Exam - Topic 4 Question 1 Discussion

Actual exam question for Amazon's AIF-C01 exam
Question #: 1
Topic #: 4
[All AIF-C01 Questions]

A company has built a solution by using generative AI. The solution uses large language models (LLMs) to translate training manuals from English into other languages. The company wants to evaluate the accuracy of the solution by examining the text generated for the manuals.

Which model evaluation strategy meets these requirements?

A. Bilingual Evaluation Understudy (BLEU)
B. Root mean squared error (RMSE)
C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
D. F1 score

Suggested Answer: A
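
For context on the suggested answer: BLEU scores a machine translation by its n-gram overlap with one or more human reference translations, which is exactly the kind of text-to-text accuracy check the question describes. Below is a minimal sketch using NLTK; the Spanish sentences are invented for illustration, not taken from any real manual.

```python
# Minimal BLEU sketch with NLTK. The sentences below are invented;
# a real evaluation would pair LLM output with professional human
# reference translations of the same manual text.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# One or more tokenized human reference translations.
references = [
    "apriete el perno con una llave dinamometrica".split(),
]
# The tokenized LLM-generated translation to evaluate.
candidate = "apriete el perno usando una llave dinamometrica".split()

# Smoothing avoids zero scores on short sentences that miss an n-gram.
smooth = SmoothingFunction().method1
score = sentence_bleu(references, candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")  # a value in [0, 1]; higher = closer match
```

In practice, corpus-level BLEU over many sentence pairs is more reliable than a single sentence score, since n-gram statistics are noisy on short segments.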

Contribute your Thoughts:

Krissy
3 months ago
F1 score seems off for this kind of evaluation.
upvoted 0 times
...
Tori
3 months ago
Wait, can LLMs really translate accurately?
upvoted 0 times
...
Yoko
3 months ago
RMSE? Not really suited for text evaluation.
upvoted 0 times
...
Tambra
4 months ago
I think ROUGE might be better for this.
upvoted 0 times
...
Rodrigo
4 months ago
BLEU is the go-to for translation tasks!
upvoted 0 times
...
Alysa
4 months ago
I thought RMSE was more for regression tasks, so it doesn't seem like it fits this scenario at all.
upvoted 0 times
...
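
Alysa's point is easy to demonstrate: RMSE measures the distance between numeric predictions and numeric targets, so it has nothing meaningful to compare in free-form generated text. A minimal sketch with NumPy, using invented numbers:

```python
# RMSE applies to numeric predictions, e.g. a regression model's output.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # actual numeric targets
y_pred = np.array([2.8, 5.4, 2.9, 6.5])  # model's numeric predictions

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(f"RMSE: {rmse:.3f}")
# There is no analogous subtraction for two translated sentences,
# which is why RMSE does not fit a text-generation task.
```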
Kyoko
4 months ago
I practiced a similar question where BLEU was the correct answer for translation accuracy, but I’m still a bit confused about the other options.
upvoted 0 times
...
Kerry
4 months ago
I'm not entirely sure, but I feel like ROUGE is more for summarization tasks rather than translation.
upvoted 0 times
...
Tora
5 months ago
I remember we discussed BLEU scores in class as a way to evaluate translation quality, so I think that might be the right choice here.
upvoted 0 times
...
Fatima
5 months ago
RMSE might work, but it's more commonly used for regression tasks. For this translation problem, I think BLEU or ROUGE would be the better options. I'll have to review the differences between those two metrics to decide.
upvoted 0 times
...
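
For anyone reviewing the difference Fatima mentions: ROUGE is recall-oriented and is the standard metric for summarization (did the output keep the reference's content?), while BLEU is precision-oriented and is the standard for translation. A minimal ROUGE sketch using the rouge-score package, with invented sentences:

```python
# ROUGE measures n-gram recall against a reference, which suits
# summarization rather than translation.
from rouge_score import rouge_scorer

reference = "tighten the bolt with a torque wrench before assembly"
generated = "tighten the bolt with a wrench"

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)
print(scores["rouge1"].recall)    # fraction of reference unigrams recovered
print(scores["rougeL"].fmeasure)  # longest-common-subsequence F-measure
```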
Rosendo
5 months ago
BLEU is definitely the way to go here. It's specifically designed to evaluate the accuracy of machine translation, which is exactly what this company needs to do. I feel confident in this answer.
upvoted 0 times
...
Ligia
5 months ago
This seems like a straightforward translation evaluation problem, so I'd go with BLEU. It's a well-established metric for assessing the quality of machine translation output.
upvoted 0 times
...
Hershel
5 months ago
I'm a bit unsure here. BLEU seems like the obvious choice, but I'm wondering if ROUGE might be a better fit since it's more focused on evaluating text summarization. Hmm, I'll have to think this through a bit more.
upvoted 0 times
...
Wilda
5 months ago
This seems like a straightforward question about evaluation metrics for generated text. I'll think through the pros and cons of each option.
upvoted 0 times
...
Alease
1 year ago
RMSE? Really? That's more for measuring numerical accuracy, not text quality. I don't think that's what the company is looking for here.
upvoted 0 times
...
Colton
1 year ago
I think F1 score could also be useful in evaluating the accuracy of the solution, as it considers both precision and recall.
upvoted 0 times
...
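
On Colton's suggestion: F1 does combine precision and recall, but it is defined over discrete class labels, not over generated sentences, so it fits classification tasks rather than translation quality. A minimal sketch with scikit-learn, using invented labels:

```python
# F1 compares discrete predicted labels against true labels; there is
# no natural label set for judging a whole translated manual.
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 1]  # true class labels
y_pred = [1, 0, 0, 1, 0, 1]  # predicted class labels

print(f"F1: {f1_score(y_true, y_pred):.3f}")
```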
Ma
1 year ago
I'm not convinced BLEU is the best option. Shouldn't we also consider ROUGE, which is better for evaluating text summarization? Hmm, decisions, decisions.
upvoted 0 times
Bernardine
1 year ago
Good idea! Using both evaluation strategies will give us a more well-rounded assessment of the solution's accuracy.
upvoted 0 times
...
Amos
1 year ago
That's true, BLEU does focus on translation accuracy. Maybe we can use both BLEU and ROUGE for a comprehensive evaluation.
upvoted 0 times
...
Edda
1 year ago
But BLEU is specifically designed for translation tasks, so it might be more appropriate in this case.
upvoted 0 times
...
Kirby
1 year ago
I think we should consider ROUGE as well, it's better for text summarization.
upvoted 0 times
...
...
Marvel
1 year ago
I'm not sure, but I think C) Recall-Oriented Understudy for Gisting Evaluation (ROUGE) could also be a good option for evaluating text generation.
upvoted 0 times
...
Margurite
1 year ago
BLEU seems like the obvious choice here. It's designed specifically for evaluating machine translation, which is exactly what this company is trying to do.
upvoted 0 times
Xochitl
1 year ago
Yes, BLEU is widely used in the field for assessing the quality of generated text.
upvoted 0 times
...
Leontine
1 year ago
I agree, BLEU is the best choice for evaluating machine translation.
upvoted 0 times
...
...
Ashley
1 year ago
I agree with Dierdre, BLEU is commonly used for evaluating machine translation.
upvoted 0 times
...
Dierdre
1 year ago
I think the best model evaluation strategy for this scenario is A) Bilingual Evaluation Understudy (BLEU).
upvoted 0 times
...
