CompTIA DY0-001 Exam - Topic 5 Question 4 Discussion

Actual exam question for CompTIA's DY0-001 exam

Question #: 4
Topic #: 5

A data scientist is developing a model to predict the outcome of a vote for a national mascot. The choice is between tigers and lions. The full data set contains feedback from individuals in 17 professions and 12 different locations. The following rank aggregation represents 80% of the data set:

Which of the following is the most likely concern about the model's ability to predict the outcome of the vote?

A. Interpolated data

B. Extrapolated data

C. In-sample data

D. Out-of-sample data

Suggested Answer: D

The aggregated feedback covers only 80% of respondents, mostly from a few professions and locations, so the model has not "seen" the remaining 20% (or the underrepresented groups). Its performance on those unseen subsets (out-of-sample data) is therefore the primary concern for how well it will predict the actual vote.
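To make the in-sample vs. out-of-sample distinction concrete, here is a minimal Python sketch. Everything in it is hypothetical: the respondents, the group labels (standing in for profession/location combinations), and the simple "majority vote per group" model are invented for illustration and do not come from the exam's data.

```python
from collections import Counter, defaultdict

# Hypothetical toy respondents: (group, vote). Groups stand in for the
# profession/location combinations in the question; values are invented.
train = [("teacher", "tigers"), ("teacher", "tigers"), ("nurse", "tigers"),
         ("nurse", "lions"), ("nurse", "tigers"), ("farmer", "tigers")]
held_out = [("pilot", "lions"), ("pilot", "lions"), ("chef", "lions"),
            ("teacher", "tigers")]

def fit_majority_by_group(rows):
    """Learn the most common vote per group, plus an overall fallback vote."""
    by_group = defaultdict(Counter)
    overall = Counter()
    for group, vote in rows:
        by_group[group][vote] += 1
        overall[vote] += 1
    table = {g: c.most_common(1)[0][0] for g, c in by_group.items()}
    return table, overall.most_common(1)[0][0]

def predict(model, group):
    table, fallback = model
    # Groups never seen during fitting get the overall majority vote.
    return table.get(group, fallback)

def accuracy(model, rows):
    hits = sum(predict(model, g) == vote for g, vote in rows)
    return hits / len(rows)

model = fit_majority_by_group(train)
in_sample = accuracy(model, train)         # scored on rows the model was fit on
out_of_sample = accuracy(model, held_out)  # scored on rows it never saw
print(f"in-sample accuracy:     {in_sample:.2f}")
print(f"out-of-sample accuracy: {out_of_sample:.2f}")
```

With this toy data the model scores 5/6 on the rows it was fit on but only 1/4 on the held-out rows, because it never saw the `pilot` or `chef` groups and falls back to the overall majority. That gap is the kind of out-of-sample risk the suggested answer describes.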

by Ruth at Jun 11, 2025, 06:48 PM

Alberto

7 months ago

Surprised nobody mentioned the impact of location on the results!


Leatha

7 months ago

I think extrapolated data might be the real concern.


Jodi

7 months ago

Wait, why would out-of-sample data matter in this case?


Arlette

7 months ago

Totally agree, in-sample data can be misleading!


Laine

8 months ago

Looks like in-sample data could be a big issue here.


Dean

8 months ago

I keep mixing up interpolated and extrapolated data. I guess I need to think about how the data was collected and if it represents the whole population.


Judy

8 months ago

This question reminds me of a practice problem we did about model validation. I feel like out-of-sample data is usually a big concern for generalization.


Nan

8 months ago

I think extrapolated data could be a problem if we're trying to predict beyond the range of our data, but I'm not entirely confident.
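A minimal sketch of the interpolation/extrapolation distinction, assuming a made-up relationship y = x^2 and a straight-line least-squares fit using only x from 0 to 10 (none of this comes from the question's data):

```python
# Hypothetical illustration: the "true" relationship is y = x**2,
# but we fit a straight line using only x values from 0 to 10.
xs = list(range(11))
ys = [x ** 2 for x in xs]

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)

def error_at(x):
    """Absolute gap between the fitted line and the true curve at x."""
    return abs((slope * x + intercept) - x ** 2)

interp_error = error_at(5.5)  # inside the observed range: interpolation
extrap_error = error_at(20)   # far outside the observed range: extrapolation
print(f"fit: y = {slope:.1f}x + ({intercept:.1f})")
print(f"error at x=5.5: {interp_error:.2f}, error at x=20: {extrap_error:.2f}")
```

The fitted line is y = 10x - 15; its error is 9.75 at x = 5.5 (inside the fitted range) but 215 at x = 20 (outside it), which is why predicting beyond the range of the data is riskier than predicting within it.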


Layla

9 months ago

I remember discussing how in-sample data can sometimes lead to overfitting, but I'm not sure if that's the main concern here.


Francene

9 months ago

This is a good one. The fact that the rank aggregation only represents 80% of the full data set makes me wonder whether the concern is the missing 20% and how it might affect the model's predictions.


Bette

9 months ago

Okay, let's see. The data set represents feedback from 17 professions and 12 locations, and the question is asking about the most likely concern. I'm thinking it might be about extrapolating beyond the data we have.


Idella

9 months ago

The question is asking about the model's ability to predict the outcome, so I'm guessing the concern is related to the data used to train the model. Maybe something about the representativeness of the data?


Glory

9 months ago

Hmm, this looks like a tricky one. I'll need to think carefully about the data and the question to figure out the most likely concern.


Malcom

12 months ago

Ha! This is like a real-life version of the age-old debate: tigers vs. lions. I bet the data scientists are having a field day with this one.

Tamra

10 months ago

B: Yeah, that makes sense. It's always tricky when you're predicting something based on data outside of what you already have.


Benton

11 months ago

A: I think the concern might be extrapolated data.


Tyisha

1 year ago

But what about in-sample data? Could that also be a concern for the model's prediction?


Kaitlyn

1 year ago

I agree with Alease: a model applied to data outside the range it was built on may not accurately predict the outcome.


Janine

1 year ago

I'm not sure, but I'd be worried about the potential for bias in the data. Tigers and lions are both pretty exciting mascots, but I wonder if certain regions or professions might have a preference.

Alba

1 year ago

A: That's a good point. They might need to consider collecting more data from a wider range of sources to improve the model's accuracy.


Talia

1 year ago

B: Yeah, it could be biased towards those specific groups. Maybe they should try to get more diverse data.


Lashawnda

1 year ago

A: I think the concern might be that the model is only based on data from certain professions and locations.


Shizue

1 year ago

Out-of-sample data seems like the most likely issue. The model is trained on only 80% of the data, so it might not accurately reflect the full population.

Luisa

12 months ago

Yes, the model might not generalize well to the entire population with only 80% of the data.


Lazaro

1 year ago

I agree, out-of-sample data could lead to inaccurate predictions.


Alease

1 year ago

I think the concern could be extrapolated data.


Leana

1 year ago

You know, I bet the model would be a lot more accurate if they just had a vote-off between a tiger and a lion mascot. That would give us the true pulse of the nation!
