NVIDIA NCA-GENL Exam - Topic 5 Question 18 Discussion

Actual exam question for NVIDIA's NCA-GENL exam
Question #: 18
Topic #: 5

When designing an experiment to compare the performance of two LLMs on a question-answering task, which statistical test is most appropriate to determine if the difference in their accuracy is significant, assuming the data follows a normal distribution?

A. Chi-squared test
B. Paired t-test
C. Mann-Whitney U test
D. ANOVA

Suggested Answer: B

The paired t-test is the most appropriate statistical test for comparing the performance (e.g., accuracy) of two large language models (LLMs) evaluated on the same question-answering dataset, assuming the data (strictly, the paired differences) follows a normal distribution. Because both models answer the same questions, their scores are paired and their errors are correlated; the test checks whether the mean of the paired differences (e.g., the score difference between the two models on each question or evaluation split) is significantly different from zero. NVIDIA's documentation on model evaluation in NeMo suggests using paired statistical tests for comparing model performance on identical datasets to account for correlated errors. Option A (Chi-squared test) applies to categorical count data, not continuous metrics like accuracy. Option C (Mann-Whitney U test) is a non-parametric test for two independent samples, typically used when normality cannot be assumed. Option D (ANOVA) is designed for comparing more than two groups, not two models.


NVIDIA NeMo Documentation: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/model_finetuning.html
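
To make the suggested answer concrete, here is a minimal sketch of a paired t-test computed by hand with the Python standard library. The accuracy values are made up for illustration (two hypothetical models scored on the same 10 evaluation splits), and the function name paired_t_statistic is an assumption, not part of any NVIDIA tool. The critical value 2.262 is the standard two-sided 5% threshold from a t-table for 9 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired test needs the same number of scores per model")
    # Pair the scores split by split and work only with the differences.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    # t = mean difference divided by its standard error.
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracies of two models on the same 10 evaluation splits.
model_a = [0.80, 0.82, 0.78, 0.85, 0.81, 0.79, 0.83, 0.80, 0.84, 0.82]
model_b = [0.78, 0.80, 0.77, 0.82, 0.80, 0.78, 0.80, 0.79, 0.81, 0.80]

t_stat, df = paired_t_statistic(model_a, model_b)

# Two-sided critical value at alpha = 0.05 for df = 9, taken from a t-table.
T_CRITICAL_DF9 = 2.262
significant = abs(t_stat) > T_CRITICAL_DF9

print(f"t = {t_stat:.3f}, df = {df}, significant at 5%: {significant}")
```

If SciPy is installed, scipy.stats.ttest_rel(model_a, model_b) runs the same test and also returns a p-value, which avoids the table lookup.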

Contribute your Thoughts:

Meaghan
13 days ago
I feel like the Mann-Whitney U test is a non-parametric test, so it probably isn't the best choice here since we're assuming a normal distribution.
Kattie
18 days ago
I'm not entirely sure, but I remember something about ANOVA being used for comparing more than two groups, so that might not be it.
Justine
23 days ago
I think we might need to use the paired t-test since we're evaluating both LLMs on the same set of questions, right?
