CompTIA DY0-001 Exam - Topic 5 Question 4 Discussion

Actual exam question for CompTIA's DY0-001 exam
Question #: 4
Topic #: 5

A data scientist is developing a model to predict the outcome of a vote for a national mascot. The choice is between tigers and lions. The full data set represents feedback from individuals representing 17 professions and 12 different locations. The following rank aggregation represents 80% of the data set:

Which of the following is the most likely concern about the model's ability to predict the outcome of the vote?

Suggested Answer: D

The aggregated feedback covers only 80% of respondents, mostly from a few professions and locations, so the model hasn't "seen" the remaining 20% (or the underrepresented groups). Its performance on those unseen subsets (out-of-sample data) is therefore the primary concern for how well it will predict the actual vote.
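
To make the in-sample vs. out-of-sample distinction concrete, here is a minimal Python sketch. The records, profession names, and vote counts are all invented for illustration (the original rank aggregation is not reproduced on this page), and the "model" is just a majority-vote rule, which is an assumption rather than anything the question specifies.

```python
from collections import Counter

# Hypothetical records (profession, vote); NOT taken from the exam question.
# The in-sample 80% is dominated by two professions.
in_sample = (
    [("engineer", "tigers")] * 50
    + [("teacher", "tigers")] * 10
    + [("teacher", "lions")] * 20
)
# The held-out 20% comes from professions the model never saw.
out_of_sample = (
    [("farmer", "lions")] * 15
    + [("nurse", "lions")] * 3
    + [("nurse", "tigers")] * 2
)


def tiger_share(records):
    """Fraction of records that voted for tigers."""
    votes = Counter(vote for _, vote in records)
    return votes["tigers"] / len(records)


def majority_vote(records):
    """Toy 'model': always predict the most common vote in the data it saw."""
    return Counter(vote for _, vote in records).most_common(1)[0][0]


in_share = tiger_share(in_sample)                     # 60 of 80 records
out_share = tiger_share(out_of_sample)                # 2 of 20 records
full_share = tiger_share(in_sample + out_of_sample)   # 62 of 100 records

predicted = majority_vote(in_sample)
# Accuracy of the in-sample prediction on the unseen 20%.
out_accuracy = sum(v == predicted for _, v in out_of_sample) / len(out_of_sample)

# Professions that appear only in the held-out portion.
unseen_professions = {p for p, _ in out_of_sample} - {p for p, _ in in_sample}

print(f"in-sample tiger share:     {in_share:.2f}")
print(f"out-of-sample tiger share: {out_share:.2f}")
print(f"full-data tiger share:     {full_share:.2f}")
print(f"prediction from in-sample: {predicted}")
print(f"accuracy on held-out 20%:  {out_accuracy:.2f}")
print(f"unseen professions:        {sorted(unseen_professions)}")
```

In this made-up data the in-sample view overstates tiger support relative to the full data set, and the rule learned from it does poorly on the held-out groups. That gap is the kind of out-of-sample risk the suggested answer points to.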


Contribute your Thoughts:

Alberto
5 months ago
Surprised nobody mentioned the impact of location on the results!
upvoted 0 times
...
Leatha
5 months ago
I think extrapolated data might be the real concern.
upvoted 0 times
...
Jodi
6 months ago
Wait, why would out-of-sample data matter in this case?
upvoted 0 times
...
Arlette
6 months ago
Totally agree, in-sample data can be misleading!
upvoted 0 times
...
Laine
6 months ago
Looks like in-sample data could be a big issue here.
upvoted 0 times
...
Dean
6 months ago
I keep mixing up interpolated and extrapolated data. I guess I need to think about how the data was collected and if it represents the whole population.
upvoted 0 times
...
Judy
7 months ago
This question reminds me of a practice problem we did about model validation. I feel like out-of-sample data is usually a big concern for generalization.
upvoted 0 times
...
Nan
7 months ago
I think extrapolated data could be a problem if we're trying to predict beyond the range of our data, but I'm not entirely confident.
upvoted 0 times
...
Layla
7 months ago
I remember discussing how in-sample data can sometimes lead to overfitting, but I'm not sure if that's the main concern here.
upvoted 0 times
...
Francene
7 months ago
This is a good one. The fact that the data set only represents 80% of the full data set makes me wonder if the concern could be about the missing 20% and how that might impact the model's predictions.
upvoted 0 times
...
Bette
7 months ago
Okay, let's see. The data set represents feedback from 17 professions and 12 locations, and the question is asking about the most likely concern. I'm thinking it might be about extrapolating beyond the data we have.
upvoted 0 times
...
Idella
8 months ago
The question is asking about the model's ability to predict the outcome, so I'm guessing the concern is related to the data used to train the model. Maybe something about the representativeness of the data?
upvoted 0 times
...
Glory
8 months ago
Hmm, this looks like a tricky one. I'll need to think carefully about the data and the question to figure out the most likely concern.
upvoted 0 times
...
Malcom
10 months ago
Ha! This is like a real-life version of the age-old debate: tigers vs. lions. I bet the data scientists are having a field day with this one.
upvoted 0 times
Tamra
8 months ago
B: Yeah, that makes sense. It's always tricky when you're predicting something based on data outside of what you already have.
upvoted 0 times
...
Benton
10 months ago
A: I think the concern might be extrapolated data.
upvoted 0 times
...
...
Tyisha
11 months ago
But what about in-sample data? Could that also be a concern for the model's prediction?
upvoted 0 times
...
Kaitlyn
11 months ago
I agree with Alease: predictions based on data outside the observed range may not be accurate.
upvoted 0 times
...
Janine
11 months ago
I'm not sure, but I'd be worried about the potential for bias in the data. Tigers and lions are both pretty exciting mascots, but I wonder if certain regions or professions might have a preference.
upvoted 0 times
Alba
11 months ago
A: That's a good point. They might need to consider collecting more data from a wider range of sources to improve the model's accuracy.
upvoted 0 times
...
Talia
11 months ago
B: Yeah, it could be biased towards those specific groups. Maybe they should try to get more diverse data.
upvoted 0 times
...
Lashawnda
11 months ago
A: I think the concern might be that the model is only based on data from certain professions and locations.
upvoted 0 times
...
...
Shizue
11 months ago
Out-of-sample data seems like the most likely issue. The model is trained on only 80% of the data, so it might not accurately reflect the full population.
upvoted 0 times
Luisa
10 months ago
Yes, the model might not generalize well to the entire population with only 80% of the data.
upvoted 0 times
...
Lazaro
11 months ago
I agree, out-of-sample data could lead to inaccurate predictions.
upvoted 0 times
...
...
Alease
11 months ago
I think the concern could be extrapolated data.
upvoted 0 times
...
Leana
11 months ago
You know, I bet the model would be a lot more accurate if they just had a vote-off between a tiger and a lion mascot. That would give us the true pulse of the nation!
upvoted 0 times
Chantell
10 months ago
D: I agree, out-of-sample data could introduce bias into the model.
upvoted 0 times
...
Lou
10 months ago
C: Maybe they should stick to in-sample data for a more reliable prediction.
upvoted 0 times
...
Cordelia
10 months ago
B: Yeah, using data outside of what was collected could affect the model's accuracy.
upvoted 0 times
...
Maurine
11 months ago
A: I think the concern might be extrapolated data.
upvoted 0 times
...
...
