
NVIDIA Exam NCA-GENL Topic 7 Question 4 Discussion

Actual exam question for NVIDIA's NCA-GENL exam
Question #: 4
Topic #: 7

[Experimentation]

What distinguishes BLEU scores from ROUGE scores when evaluating natural language processing models?

A. BLEU scores determine the fluency of text generation, while ROUGE scores rate the uniqueness of generated text.
B. BLEU scores analyze syntactic structures, while ROUGE scores evaluate semantic accuracy.
C. BLEU scores evaluate the precision of translations, while ROUGE scores focus on the recall of summarized text.
D. BLEU scores measure model efficiency, while ROUGE scores assess computational complexity.

Suggested Answer: C

BLEU (Bilingual Evaluation Understudy) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) are metrics used to evaluate natural language processing (NLP) models, particularly for machine translation and text summarization. According to NVIDIA's NeMo documentation on NLP evaluation metrics, BLEU primarily measures the precision of n-gram overlaps between generated and reference translations, making it suitable for assessing translation quality. ROUGE, by contrast, is recall-oriented: it measures the overlap of n-grams, longest common subsequences, or skip-bigrams between generated and reference summaries, making it well suited to summarization tasks.

Option A is incorrect because neither metric directly measures fluency or uniqueness. Option B is incorrect because both metrics are based on n-gram overlap, not syntactic or semantic analysis. Option D is incorrect because neither metric evaluates model efficiency or computational complexity.


NVIDIA NeMo Documentation: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/intro.html

Papineni, K., et al. (2002). "BLEU: a Method for Automatic Evaluation of Machine Translation."

Lin, C.-Y. (2004). "ROUGE: A Package for Automatic Evaluation of Summaries."
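
To make the precision-versus-recall distinction concrete, here is a minimal, self-contained Python sketch of simplified unigram versions of the two metrics. It is not the NeMo or reference implementation: real BLEU combines clipped precisions over several n-gram orders with a brevity penalty, and real ROUGE includes variants such as ROUGE-L; the function names here are illustrative only.

```python
# Simplified sketch contrasting BLEU-style precision with ROUGE-style recall.
# Both count clipped n-gram overlap; they differ only in the denominator.

from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_precision(candidate, reference, n=1):
    """BLEU-style modified precision: clipped overlap / candidate n-grams."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

def rouge_recall(candidate, reference, n=1):
    """ROUGE-N-style recall: clipped overlap / reference n-grams."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    candidate = "the cat sat".split()

    print(f"BLEU-style unigram precision: {bleu_precision(candidate, reference):.2f}")
    print(f"ROUGE-1-style recall:         {rouge_recall(candidate, reference):.2f}")
```

Because every candidate unigram also appears in the reference, the precision score is a perfect 1.00, yet the recall score is only 0.50 since the candidate covers half of the reference. That asymmetry, precision over what the model generated versus recall over what the reference contains, is exactly the distinction option C describes.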

Contribute your Thoughts:

Roselle
22 days ago
I'm feeling a bit BLUE about this question, but I think the ROUGE answer is C. Gotta keep those evaluation metrics straight, you know?
upvoted 0 times
Gaston
3 days ago
That's correct! BLEU is more focused on precision, while ROUGE looks at recall. Good job keeping them straight!
upvoted 0 times
Edmond
10 days ago
I think BLEU scores focus more on n-gram precision, while ROUGE scores measure recall of content overlap.
upvoted 0 times
Aja
26 days ago
This is a tough one, but I'm going to have to go with C. Translating text and summarizing text are two very different tasks, so it makes sense that the evaluation metrics would focus on different aspects.
upvoted 0 times
Fatima
9 days ago
That's right. BLEU scores are based on n-gram precision and recall, while ROUGE scores are based on overlap of n-grams in the generated summary and reference summary.
upvoted 0 times
Tuyet
10 days ago
I agree, BLEU scores are more focused on translation accuracy while ROUGE scores are more focused on summarization quality.
upvoted 0 times
Jina
1 month ago
Hmm, I'm torn between B and C. But I think I'll go with C. The precision vs. recall distinction seems like the clearest way to differentiate BLEU and ROUGE scores.
upvoted 0 times
Hoa
1 month ago
I'm going with B. BLEU scores analyze syntactic structures, while ROUGE scores evaluate semantic accuracy. That's a more nuanced difference that I think captures the essence of these evaluation methods.
upvoted 0 times
Latosha
5 days ago
I think B is the correct answer. BLEU scores focus on syntactic structures, while ROUGE scores look at semantic accuracy.
upvoted 0 times
Aron
1 month ago
So, BLEU focuses on fluency, while ROUGE focuses on uniqueness. Got it.
upvoted 0 times
Mammie
1 month ago
I believe ROUGE scores rate the uniqueness of generated text.
upvoted 0 times
Josphine
1 month ago
D seems like the right answer to me. BLEU scores measure model efficiency, and ROUGE scores assess computational complexity. That's a key distinction between the two.
upvoted 0 times
Felicidad
1 month ago
I think the correct answer is C. BLEU scores evaluate the precision of translations, while ROUGE scores focus on the recall of summarized text. This makes sense to me based on my understanding of these evaluation metrics.
upvoted 0 times
Corazon
13 days ago
Actually, B is the correct answer. BLEU scores analyze syntactic structures, while ROUGE scores evaluate semantic accuracy.
upvoted 0 times
Myra
14 days ago
I think it's A. BLEU scores determine fluency, while ROUGE scores rate uniqueness.
upvoted 0 times
Yuette
22 days ago
I agree with you, C is the correct answer. BLEU scores focus on precision, while ROUGE scores focus on recall.
upvoted 0 times
Aron
2 months ago
I think BLEU scores determine the fluency of text generation.
upvoted 0 times
