In the context of evaluating a fine-tuned LLM for a text classification task, which experimental design technique ensures robust performance estimation when dealing with imbalanced datasets?
Stratified k-fold cross-validation is the experimental design technique that ensures robust performance estimation on imbalanced datasets. It partitions the dataset into k folds while preserving the class distribution in each fold, so the model is trained and evaluated on representative samples of every class. NVIDIA's NeMo documentation on model evaluation recommends stratified cross-validation for tasks like text classification to obtain reliable performance estimates, particularly when classes are unevenly distributed (e.g., sentiment analysis with few negative samples). Option A (single hold-out) is less robust, because a single split may not reflect the class imbalance. Option C (bootstrapping) introduces sampling variability and can under-represent minority classes. Option D (grid search) is a hyperparameter-tuning method, not a performance-estimation technique.
NVIDIA NeMo Documentation: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/model_finetuning.html
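As a rough illustration, the sketch below runs stratified k-fold evaluation with scikit-learn's StratifiedKFold on a synthetic imbalanced dataset. The LogisticRegression classifier and the generated data are placeholders, not part of the original question; in practice the model would be the fine-tuned LLM classifier. The pattern that carries over is the stratified splitting and the per-fold macro-F1 scoring, which weights the minority class equally with the majority class.

```python
# Minimal sketch: stratified k-fold evaluation on an imbalanced dataset.
# LogisticRegression stands in for the fine-tuned LLM classifier (assumption).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

# Synthetic imbalanced dataset: roughly 90% class 0, 10% class 1.
X, y = make_classification(
    n_samples=1000, n_classes=2, weights=[0.9, 0.1], random_state=0
)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in skf.split(X, y):
    # Each fold preserves the ~90/10 class ratio of the full dataset.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    # Macro F1 averages per-class F1, so the minority class counts equally.
    scores.append(f1_score(y[test_idx], preds, average="macro"))

print(f"Macro F1: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

Reporting the mean and standard deviation across folds, rather than a single hold-out score, is what makes the estimate robust: it shows how much performance varies with the particular split.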