Databricks Exam Databricks Certified Professional Data Scientist Topic 6 Question 82 Discussion

Actual exam question for Databricks's Databricks Certified Professional Data Scientist exam
Question #: 82
Topic #: 6

Refer to exhibit

You are asked to write a report on how specific variables impact your client's sales using a data set provided to you by the client. The data includes 15 variables that the client views as directly related to sales, and you are restricted to these variables only. After a preliminary analysis of the data, the following findings were made:

1. Multicollinearity is not an issue among the variables
2. Only three variables (A, B, and C) have significant correlation with sales

You build a linear regression model on the dependent variable of sales with the independent variables of A, B, and C. The results of the regression are seen in the exhibit. You cannot request additional data. What is a way that you could try to increase the R2 of the model without artificially inflating it?

Suggested Answer: A

Contribute your Thoughts:

Carolynn
2 days ago
I think creating interaction variables based on A, B, and C could be a solid option. We practiced a similar question where interaction terms improved the model fit.
upvoted 0 times
...
Ty
8 days ago
I'm a bit unsure about the best approach here. Forcing all 15 variables into the model seems risky, but I think creating clusters could also complicate things.
upvoted 0 times
...
Silvana
13 days ago
I remember we discussed how interaction variables can sometimes help capture relationships that aren't obvious with just the main effects. That might be a good way to increase R2 without adding noise.
upvoted 0 times
...
Elbert
19 days ago
This seems pretty straightforward to me. I'd just go with option C and create the interaction variables. That's a solid way to try to improve the model without violating any of the restrictions. As long as the interactions are meaningful, that should give us a nice boost in R-squared.
upvoted 0 times
...
Gerald
24 days ago
Okay, I think I've got a strategy here. Since we can't add more variables, I would focus on creating interaction terms between A, B, and C. That could help capture some of the more complex relationships that might be driving sales, without artificially inflating the R-squared.
upvoted 0 times
...
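For anyone who wants to see the interaction-term idea from the comments above in numbers, here is a minimal sketch. It does not use the client's data or the exhibit; the synthetic `sales` series, its coefficients, and the helper `r_squared` are all assumptions made up for illustration. It fits one model on A, B, and C alone and one that adds the pairwise products, then reports R2 and adjusted R2 for each.

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares with an intercept; return (R2, adjusted R2)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    p = X.shape[1]  # number of predictors, intercept excluded
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

rng = np.random.default_rng(0)
n = 500
A, B, C = rng.normal(size=(3, n))

# Hypothetical data-generating process: sales depends on an A*B interaction.
sales = 2.0 * A + 1.0 * B - 1.5 * C + 3.0 * A * B + rng.normal(size=n)

main_effects = np.column_stack([A, B, C])
with_interactions = np.column_stack([A, B, C, A * B, A * C, B * C])

r2_main, adj_main = r_squared(main_effects, sales)
r2_inter, adj_inter = r_squared(with_interactions, sales)

print(f"main effects only : R2={r2_main:.3f}, adjusted R2={adj_main:.3f}")
print(f"with interactions : R2={r2_inter:.3f}, adjusted R2={adj_inter:.3f}")
```

Adding columns can never lower plain R2, so adjusted R2 is the more useful number here: it only rises when the new terms explain more than their cost in degrees of freedom. Whether interactions help on real data depends on whether such effects actually exist.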
Alaine
30 days ago
I'm a bit confused by the restriction to only using the 15 variables provided. Wouldn't it be better to try adding some additional variables that could be relevant, even if they aren't directly correlated with sales? That might help increase the model's explanatory power.
upvoted 0 times
...
Socorro
1 month ago
Hmm, this is an interesting question. I think I would start by looking at the variables A, B, and C in more detail to see if there are any interactions or nonlinear relationships that could be captured to improve the model's R-squared.
upvoted 0 times
...
Vivan
5 months ago
Why don't we just throw in the kitchen sink and see what happens? I bet that would really boost the R2. What could go wrong?
upvoted 0 times
...
Dorethea
5 months ago
I'm with Hildegarde on the interaction variables. That's the way to go if we want to get the most out of the data without cheating.
upvoted 0 times
...
Leonora
5 months ago
Forcing all 15 variables into the model is a bad idea. That's just going to lead to overfitting and won't give us any meaningful insights.
upvoted 0 times
...
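A rough sketch of why forcing every variable in tends to inflate R2, under assumed synthetic data: only the first three of 15 columns (standing in for A, B, and C) affect the made-up `sales` series, and the other 12 are pure noise. The sample size, coefficients, and the helper `fit_r2` are assumptions for illustration only.

```python
import numpy as np

def fit_r2(X, y):
    """Fit ordinary least squares with an intercept; return (R2, adjusted R2)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    p = X.shape[1]  # number of predictors, intercept excluded
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

rng = np.random.default_rng(1)
n = 60
X_all = rng.normal(size=(n, 15))  # columns 0..2 play the role of A, B, C

# Hypothetical process: only the first three columns drive sales.
sales = (2.0 * X_all[:, 0] + 1.0 * X_all[:, 1] - 1.5 * X_all[:, 2]
         + rng.normal(scale=3.0, size=n))

r2_three, adj_three = fit_r2(X_all[:, :3], sales)
r2_all, adj_all = fit_r2(X_all, sales)

print(f"A, B, C only     : R2={r2_three:.3f}, adjusted R2={adj_three:.3f}")
print(f"all 15 variables : R2={r2_all:.3f}, adjusted R2={adj_all:.3f}")
```

Plain R2 goes up for the 15-variable model even though the extra columns carry no signal, which is the "artificial inflation" the question warns about. Adjusted R2 penalizes the extra predictors, so it gains less than plain R2 does.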
Eden
5 months ago
Breaking the variables into their own univariate models doesn't seem like it would improve the overall R2. We need to look at the combined impact of the key variables.
upvoted 0 times
Billy
5 months ago
Breaking the variables into their own univariate models doesn't seem like it would improve the overall R2. We need to look at the combined impact of the key variables.
upvoted 0 times
...
Ramonita
5 months ago
C) Create interaction variables based only on variables A, B, and C
upvoted 0 times
...
Tamie
5 months ago
A) Create clusters based on the data and use them as model inputs
upvoted 0 times
...
...
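The point about univariate models can be checked with a small sketch, again on assumed synthetic data rather than the client's. Each single-variable model is nested inside the combined A, B, C model, so its in-sample R2 cannot exceed the combined model's. The `sales` series and the helper `r2_of` are hypothetical.

```python
import numpy as np

def r2_of(X, y):
    """Fit ordinary least squares with an intercept; return in-sample R2."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

rng = np.random.default_rng(2)
n = 300
A, B, C = rng.normal(size=(3, n))

# Hypothetical process in which all three variables contribute to sales.
sales = 2.0 * A + 1.0 * B - 1.5 * C + rng.normal(size=n)

# One model per variable, each with a single predictor column.
univariate = {
    name: r2_of(col[:, None], sales)
    for name, col in {"A": A, "B": B, "C": C}.items()
}
combined = r2_of(np.column_stack([A, B, C]), sales)

for name, value in univariate.items():
    print(f"sales ~ {name}       : R2={value:.3f}")
print(f"sales ~ A + B + C : R2={combined:.3f}")
```

Splitting the model up gives three separate, smaller R2 values rather than one larger one, which matches the comment above that the combined impact is what matters.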
Destiny
7 months ago
That's an interesting point, but I still think breaking variables A, B, and C into their own univariate models could also be a valid strategy.
upvoted 0 times
...
Hildegarde
7 months ago
I think creating interaction variables based on A, B, and C could be a good way to increase the R2 without artificially inflating it. That would allow us to capture any non-linear relationships between the variables.
upvoted 0 times
Lashaunda
5 months ago
D) Break variables A, B, and C into their own univariate models
upvoted 0 times
...
Effie
5 months ago
That sounds like a good idea. It could help capture more complex relationships.
upvoted 0 times
...
Kip
6 months ago
C) Create interaction variables based only on variables A, B, and C
upvoted 0 times
...
Nilsa
6 months ago
A) Create clusters based on the data and use them as model inputs
upvoted 0 times
...
...
Novella
7 months ago
I disagree, I believe creating interaction variables based only on variables A, B, and C would be a better approach to improve the model.
upvoted 0 times
...
Destiny
7 months ago
I think creating clusters based on the data could help increase the R2 without artificially inflating it.
upvoted 0 times
...
