CompTIA CY0-001 Exam - Topic 2 Question 1 Discussion

Actual exam question for CompTIA's CY0-001 exam
Question #: 1
Topic #: 2

During the selection of a machine learning (ML)-based threat classification model, a cybersecurity administrator verifies that label distribution is highly unbalanced.

Which of the following processing techniques should the administrator use to balance the model?

A. Data lineage
B. Data augmentation
C. Data provenance
D. Data verification

Suggested Answer: B

Basic Concept: Class imbalance in training data, where some categories have significantly more examples than others, biases ML models toward the majority class and produces poor detection of minority-class threats. Overall accuracy can still look high in this situation, because the model is rewarded for predicting the common class. Addressing the imbalance before training is therefore critical for threat classification. CompTIA SecAI+ covers data preparation techniques under basic AI concepts.
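To make the majority-class bias concrete, here is a minimal sketch that scores a trivial classifier which always predicts the most common label. The label names ("benign", "malware"), the 950/50 split, and the function name `majority_baseline_metrics` are illustrative assumptions, not part of the exam question.

```python
from collections import Counter


def majority_baseline_metrics(labels, minority_label):
    """Score a classifier that always predicts the most common label.

    Returns (accuracy, minority_recall).
    """
    majority_label, _ = Counter(labels).most_common(1)[0]
    predictions = [majority_label] * len(labels)

    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)

    minority_total = sum(y == minority_label for y in labels)
    minority_hits = sum(
        p == y == minority_label for p, y in zip(predictions, labels)
    )
    recall = minority_hits / minority_total if minority_total else 0.0
    return accuracy, recall


# Hypothetical, highly unbalanced label set: 95% benign, 5% malware.
labels = ["benign"] * 950 + ["malware"] * 50
accuracy, recall = majority_baseline_metrics(labels, "malware")
print(f"accuracy={accuracy:.2f}, minority recall={recall:.2f}")
```

In this made-up example the baseline is right 95% of the time yet never flags a single malware sample, which is why accuracy alone is a misleading metric on unbalanced threat data.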

Why B is Correct: Data augmentation addresses class imbalance by artificially increasing the number of training samples in under-represented classes. Techniques include duplicating minority-class samples (random oversampling) and creating synthetic examples with methods like SMOTE (Synthetic Minority Over-sampling Technique). Undersampling the majority class is a related resampling option, but it removes data rather than augmenting it. Augmentation balances the label distribution and helps the model learn decision boundaries that classify all threat categories, not just the dominant ones.
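The synthetic-example idea can be sketched in a few lines of standard-library Python. This is a simplified, SMOTE-style illustration rather than the reference algorithm or any library's API: it picks a minority sample, picks one of its nearest minority neighbours, and interpolates a new point between them until the minority class matches the largest class. The function name `smote_like_oversample`, the parameter `k`, and the toy feature values are all assumptions made for the example.

```python
import random
from collections import Counter


def smote_like_oversample(features, labels, minority_label, k=3, seed=0):
    """Add interpolated minority samples until the minority class
    is as large as the largest class (simplified SMOTE-style sketch)."""
    rng = random.Random(seed)
    minority = [x for x, y in zip(features, labels) if y == minority_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")

    needed = max(Counter(labels).values()) - len(minority)
    new_features, new_labels = list(features), list(labels)

    for _ in range(needed):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic = [a + gap * (b - a) for a, b in zip(base, neighbour)]
        new_features.append(synthetic)
        new_labels.append(minority_label)

    return new_features, new_labels


# Hypothetical two-feature dataset: 8 benign samples, 3 malware samples.
X = [[0.1 * i, 0.2 * i] for i in range(8)] + [[5.0, 5.0], [5.5, 4.8], [4.9, 5.3]]
y = ["benign"] * 8 + ["malware"] * 3

balanced_X, balanced_y = smote_like_oversample(X, y, "malware")
print(Counter(balanced_y))
```

Resampling like this is normally applied to the training split only, so that evaluation data keeps the real-world class distribution.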

Why A is Wrong: Data lineage documents the origin, movement, and transformation of data throughout its lifecycle. It provides traceability and auditability but does not address class imbalance in training data distribution.

Why C is Wrong: Data provenance records the history and context of data origins. Like lineage, it is a governance and tracking concept; it does not change the class distribution of the training data.

Why D is Wrong: Data verification confirms that data is correct and consistent with expected formats and values. It checks data quality and integrity but does not address the statistical distribution imbalance between threat classes in training datasets.

