CompTIA DY0-001 Exam - Topic 5 Question 4 Discussion

Actual exam question for CompTIA's DY0-001 exam
Question #: 4
Topic #: 5

A data scientist is developing a model to predict the outcome of a vote for a national mascot. The choice is between tigers and lions. The full data set represents feedback from individuals representing 17 professions and 12 different locations. The following rank aggregation represents 80% of the data set:

Which of the following is the most likely concern about the model's ability to predict the outcome of the vote?

Suggested Answer: D

The aggregated feedback covers only 80% of respondents, mostly from a few professions and locations, so the model hasn't "seen" the remaining 20% (and those underrepresented groups). Its performance on those unseen subsets (out-of-sample data) is therefore the primary concern for how well it will predict the actual vote.
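
To make the in-sample versus out-of-sample distinction concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration, not part of the exam question: the respondents are synthetic, the preference rule inside `make_respondents` is made up, and the "model" is just a majority-vote lookup per (profession, location) group. It fits on 80% of the rows and then scores both the rows it has seen and the 20% it has not.

```python
import random
from collections import Counter, defaultdict


def make_respondents(n, seed=0):
    """Generate synthetic (profession, location, vote) rows. Purely hypothetical data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        profession = rng.randrange(17)  # 17 professions, as in the question
        location = rng.randrange(12)    # 12 locations, as in the question
        # Made-up rule: the chance of voting "tiger" varies a little by group.
        p_tiger = 0.3 + 0.4 * ((profession * 7 + location * 3) % 10) / 9
        vote = "tiger" if rng.random() < p_tiger else "lion"
        rows.append((profession, location, vote))
    return rows


def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and return (80% in-sample, 20% out-of-sample)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_majority_by_group(train):
    """Memorize the majority vote for each (profession, location) group."""
    by_group = defaultdict(Counter)
    overall = Counter()
    for profession, location, vote in train:
        by_group[(profession, location)][vote] += 1
        overall[vote] += 1
    fallback = overall.most_common(1)[0][0]  # used for groups never seen in training
    table = {group: votes.most_common(1)[0][0] for group, votes in by_group.items()}
    return table, fallback


def accuracy(model, rows):
    """Fraction of rows whose vote matches the model's prediction."""
    table, fallback = model
    hits = sum(table.get((p, l), fallback) == v for p, l, v in rows)
    return hits / len(rows)


rows = make_respondents(1000)
in_sample, out_of_sample = split_80_20(rows)
model = fit_majority_by_group(in_sample)
in_acc = accuracy(model, in_sample)
out_acc = accuracy(model, out_of_sample)
print(f"in-sample accuracy:     {in_acc:.3f}")
print(f"out-of-sample accuracy: {out_acc:.3f}")
```

A lookup model like this tends to score well on the rows it memorized, and that score says little about the held-out 20%. Only the second number hints at how it might do on the actual vote.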


Alberto
2 months ago
Surprised nobody mentioned the impact of location on the results!
upvoted 0 times
...
Leatha
2 months ago
I think extrapolated data might be the real concern.
upvoted 0 times
...
Jodi
3 months ago
Wait, why would out-of-sample data matter in this case?
upvoted 0 times
...
Arlette
3 months ago
Totally agree, in-sample data can be misleading!
upvoted 0 times
...
Laine
3 months ago
Looks like in-sample data could be a big issue here.
upvoted 0 times
...
Dean
3 months ago
I keep mixing up interpolated and extrapolated data. I guess I need to think about how the data was collected and if it represents the whole population.
upvoted 0 times
...
Judy
4 months ago
This question reminds me of a practice problem we did about model validation. I feel like out-of-sample data is usually a big concern for generalization.
upvoted 0 times
...
Nan
4 months ago
I think extrapolated data could be a problem if we're trying to predict beyond the range of our data, but I'm not entirely confident.
upvoted 0 times
...
Layla
4 months ago
I remember discussing how in-sample data can sometimes lead to overfitting, but I'm not sure if that's the main concern here.
upvoted 0 times
...
Francene
4 months ago
This is a good one. The fact that the data set only represents 80% of the full data set makes me wonder if the concern could be about the missing 20% and how that might impact the model's predictions.
upvoted 0 times
...
Bette
4 months ago
Okay, let's see. The data set represents feedback from 17 professions and 12 locations, and the question is asking about the most likely concern. I'm thinking it might be about extrapolating beyond the data we have.
upvoted 0 times
...
Idella
5 months ago
The question is asking about the model's ability to predict the outcome, so I'm guessing the concern is related to the data used to train the model. Maybe something about the representativeness of the data?
upvoted 0 times
...
Glory
5 months ago
Hmm, this looks like a tricky one. I'll need to think carefully about the data and the question to figure out the most likely concern.
upvoted 0 times
...
Malcom
7 months ago
Ha! This is like a real-life version of the age-old debate: tigers vs. lions. I bet the data scientists are having a field day with this one.
upvoted 0 times
Tamra
5 months ago
B: Yeah, that makes sense. It's always tricky when you're predicting something based on data outside of what you already have.
upvoted 0 times
...
Benton
7 months ago
A: I think the concern might be extrapolated data.
upvoted 0 times
...
...
Tyisha
8 months ago
But what about in-sample data? Could that also be a concern for the model's prediction?
upvoted 0 times
...
Kaitlyn
8 months ago
I agree with Alease: predictions based on data outside the observed range may not be accurate.
upvoted 0 times
...
Janine
8 months ago
I'm not sure, but I'd be worried about the potential for bias in the data. Tigers and lions are both pretty exciting mascots, but I wonder if certain regions or professions might have a preference.
upvoted 0 times
Alba
8 months ago
A: That's a good point. They might need to consider collecting more data from a wider range of sources to improve the model's accuracy.
upvoted 0 times
...
Talia
8 months ago
B: Yeah, it could be biased towards those specific groups. Maybe they should try to get more diverse data.
upvoted 0 times
...
Lashawnda
8 months ago
A: I think the concern might be that the model is only based on data from certain professions and locations.
upvoted 0 times
...
...
Shizue
8 months ago
Out-of-sample data seems like the most likely issue. The model is trained on only 80% of the data, so it might not accurately reflect the full population.
upvoted 0 times
Luisa
7 months ago
Yes, the model might not generalize well to the entire population with only 80% of the data.
upvoted 0 times
...
Lazaro
8 months ago
I agree, out-of-sample data could lead to inaccurate predictions.
upvoted 0 times
...
...
Alease
8 months ago
I think the concern could be extrapolated data.
upvoted 0 times
...
Leana
8 months ago
You know, I bet the model would be a lot more accurate if they just had a vote-off between a tiger and a lion mascot. That would give us the true pulse of the nation!
upvoted 0 times
Chantell
7 months ago
D: I agree, out-of-sample data could introduce bias into the model.
upvoted 0 times
...
Lou
7 months ago
C: Maybe they should stick to in-sample data for a more reliable prediction.
upvoted 0 times
...
Cordelia
7 months ago
B: Yeah, using data outside of what was collected could affect the model's accuracy.
upvoted 0 times
...
Maurine
8 months ago
A: I think the concern might be extrapolated data.
upvoted 0 times
...
...
