CompTIA CY0-001 Exam - Topic 2 Question 1 Discussion

Actual exam question for CompTIA's CY0-001 exam

Question #: 1
Topic #: 2

During the selection of a machine learning (ML)-based threat classification model, a cybersecurity administrator verifies that label distribution is highly unbalanced.

Which of the following processing techniques should the administrator use to balance the model?

A. Data lineage

B. Data augmentation

C. Data provenance

D. Data verification


Suggested Answer: B

Basic Concept: Class imbalance in training data, where some categories have significantly more examples than others, biases ML models toward the majority class and leads to poor detection of minority-class threats. A model can report high overall accuracy while missing most of the rare threats, so the imbalance should be addressed before training. CompTIA SecAI+ covers data preparation techniques under basic AI concepts.
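To make the bias concrete, here is a minimal Python sketch. The label counts (990 benign, 10 malicious) and the function name `majority_baseline_metrics` are made up for illustration and do not come from the exam material. It scores a trivial classifier that always predicts the most common label:

```python
from collections import Counter

def majority_baseline_metrics(labels, minority_label):
    """Score a trivial classifier that always predicts the most common label."""
    counts = Counter(labels)
    majority_label = counts.most_common(1)[0][0]
    predictions = [majority_label] * len(labels)

    # Overall accuracy: fraction of predictions that match the true label.
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)

    # Minority recall: fraction of true minority samples that were detected.
    minority_total = counts[minority_label]
    minority_hits = sum(
        p == y == minority_label for p, y in zip(predictions, labels)
    )
    recall = minority_hits / minority_total if minority_total else 0.0
    return accuracy, recall

# Hypothetical, heavily imbalanced label set (illustrative numbers only).
labels = ["benign"] * 990 + ["malicious"] * 10
accuracy, recall = majority_baseline_metrics(labels, "malicious")
print(f"accuracy={accuracy:.2f}, minority recall={recall:.2f}")
# prints: accuracy=0.99, minority recall=0.00
```

The 99% accuracy hides the fact that no malicious sample is detected, which is why accuracy alone is a poor metric on unbalanced threat data.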

Why B is Correct: Data augmentation addresses class imbalance by artificially increasing the number of training samples in under-represented classes. A common approach is to oversample minority classes by creating synthetic examples with methods such as SMOTE (Synthetic Minority Over-sampling Technique). Undersampling the majority class is a related resampling technique that also balances labels, but it removes data rather than augmenting it. Balancing the label distribution enables the model to learn decision boundaries that accurately classify all threat categories, not just the dominant ones.
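As a rough illustration of synthetic oversampling, the sketch below interpolates between a minority sample and one of its nearest minority neighbors, which is the core idea behind SMOTE. It is a simplified, standard-library-only approximation, not the reference algorithm or any library's API; the function name `smote_like_oversample`, the parameter `k`, and the two-feature sample data are all hypothetical.

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    Simplified illustration of the SMOTE idea, not a reference implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbor))
        )
    return synthetic

# Hypothetical two-feature samples (illustrative values only).
majority = [(float(i), float(i % 7)) for i in range(50)]
minority = [(1.0, 9.0), (1.5, 9.5), (2.0, 9.2), (1.2, 8.8), (1.8, 9.9)]

new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 50 50
```

Because each synthetic point lies between two real minority samples, the added data stays inside the region the minority class already occupies. In practice, resampling is normally applied to the training split only, so that evaluation data keeps its original distribution.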

Why A is Wrong: Data lineage documents the origin, movement, and transformation of data throughout its lifecycle. It provides traceability and auditability but does not address class imbalance in training data distribution.

Why C is Wrong: Data provenance records the history and context of data origins. Like lineage, it is a governance and tracking concept that does not alter data distribution for model training balance.

Why D is Wrong: Data verification confirms that data is correct and consistent with expected formats and values. It checks data quality and integrity but does not address the statistical distribution imbalance between threat classes in training datasets.

by Valentin at May 30, 2026, 06:02 PM
