The term "greedy algorithms" refers to machine-learning algorithms that:
Greedy algorithms build the solution iteratively by choosing at each step the option that appears best at that moment, without reconsidering earlier choices.
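As an illustration of the "best choice now, never revisit" idea, here is a minimal sketch using the classic activity-selection problem. The function name `select_activities` and the sample intervals are assumptions made for the example, not part of the question.

```python
def select_activities(intervals):
    """Greedy activity selection over (start, end) pairs.

    At each step, take the compatible interval that finishes earliest
    and never reconsider an earlier choice.
    """
    chosen = []
    last_end = float("-inf")
    # Sorting by end time makes "finishes earliest" the locally best option.
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:  # compatible with everything chosen so far
            chosen.append((start, end))
            last_end = end
    return chosen


if __name__ == "__main__":
    meetings = [(1, 4), (3, 5), (0, 6), (5, 7), (8, 11), (12, 16)]
    print(select_activities(meetings))
```

For this particular problem the greedy choice happens to give an optimal result; in general, a greedy strategy offers no such guarantee.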
Which of the following methods should a data scientist use just before switching to a potential replacement model?
A/B testing lets you compare the current model against the candidate in parallel, measuring performance on live data, before fully switching to the new model.
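One common way to judge an A/B comparison is a two-proportion z-test on a success metric such as conversions. The sketch below assumes that setup; the function name `two_proportion_z_test` and the counts are hypothetical, and a real rollout would also consider sample-size planning and the choice of metric.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare success rates of model A (current) and model B (candidate).

    Returns (z, p_value) for a two-sided test using the pooled proportion.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 1,000 requests routed to each model.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the two models is unlikely to be chance alone, which supports (but does not by itself justify) the switch.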
Which of the following measures would a data scientist most likely use to calculate the similarity of two text strings?
Edit distance quantifies how many single-character insertions, deletions, or substitutions are needed to transform one string into another, making it a direct measure of their similarity.
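The definition above corresponds to the Levenshtein distance, which can be computed with dynamic programming. This is a minimal sketch; the function name `levenshtein` is chosen for the example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means the strings are more similar; a distance of 0 means they are identical.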
A data scientist is clustering a data set but does not want to specify the number of clusters present. Which of the following algorithms should the data scientist use?
DBSCAN discovers clusters based on density without requiring you to predefine the number of clusters, automatically finding arbitrarily shaped groups and identifying noise points.
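To make the density idea concrete, here is a simplified, unoptimized sketch of DBSCAN in plain Python. It is an illustration under stated assumptions, not a reference implementation: the parameter names `eps` and `min_samples` follow scikit-learn's `DBSCAN`, a point counts as its own neighbor, and neighbor search is brute force. Production code would normally use a library implementation.

```python
import math


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Brute-force search: every point within eps of points[i], itself included.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is also core, so keep expanding
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(data, eps=1.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Note that the number of clusters (two here) is never passed in; it falls out of `eps` and `min_samples`, and the isolated point is labeled as noise.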
A data scientist is developing a model to predict the outcome of a vote for a national mascot. The choice is between tigers and lions. The full data set represents feedback from individuals representing 17 professions and 12 different locations. The following rank aggregation represents 80% of the data set:
Which of the following is the most likely concern about the model's ability to predict the outcome of the vote?
The aggregated feedback covers only 80% of respondents, mostly from a few professions and locations, so the model hasn't "seen" the remaining 20% (and those underrepresented groups). Its performance on those unseen subsets (out-of-sample data) is therefore the primary concern for how well it will predict the actual vote.