NVIDIA NCA-GENL Exam - Topic 2 Question 15 Discussion

Actual exam question for NVIDIA's NCA-GENL exam

Question #: 15
Topic #: 2

In the context of fine-tuning LLMs, which of the following metrics is most commonly used to assess the performance of a fine-tuned model?

A. Model size

B. Accuracy on a validation set

C. Training duration

D. Number of layers

Suggested Answer: B

Fine-tuning a large language model (LLM) aims to improve its performance on a specific task, so the model is assessed with a metric that measures task performance. Of the options listed, only accuracy on a validation set does this: the validation set is held out from training, so accuracy on it estimates how well the model generalizes to unseen data. NVIDIA's NeMo framework documentation for fine-tuning LLMs emphasizes validation metrics such as accuracy, F1 score, or task-specific metrics (e.g., BLEU for translation) for evaluating a model during and after fine-tuning. Options A, C, and D are not performance metrics. Model size and number of layers are architectural characteristics, and training duration is a training cost; none of them reflects how well the model performs on the target task.

NVIDIA NeMo Documentation: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/model_finetuning.html
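
To make the explanation above concrete, here is a minimal sketch of computing accuracy and F1 on a held-out validation set. It is written in plain Python and is not NeMo's API; the function names, the label values, and the toy predictions are assumptions made up for illustration.

```python
def accuracy(predictions, labels):
    """Fraction of validation examples where the prediction matches the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("validation set is empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def f1_score(predictions, labels, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(predictions, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical validation labels and fine-tuned model outputs (toy data).
val_labels = ["pos", "neg", "pos", "neg", "pos"]
val_preds = ["pos", "neg", "neg", "neg", "pos"]

print(f"validation accuracy: {accuracy(val_preds, val_labels):.2f}")
print(f"validation F1 (pos): {f1_score(val_preds, val_labels, 'pos'):.2f}")
```

Neither function takes model size, layer count, or training time as input, which is why options A, C, and D cannot serve as performance metrics.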

by Salena at Mar 05, 2026, 03:18 AM

2 months ago

I think accuracy on a validation set is the most common metric, but I'm not entirely sure.

upvoted 0 times
