Databricks Certified Professional Data Scientist Exam - Topic 5 Question 10 Discussion

Actual exam question from the Databricks Certified Professional Data Scientist exam
Question #: 10
Topic #: 5

Refer to exhibit

You are asked to write a report on how specific variables impact your client's sales using a data set provided to you by the client. The data includes 15 variables that the client views as directly related to sales, and you are restricted to these variables only. After a preliminary analysis of the data, the following findings were made:

1. Multicollinearity is not an issue among the variables.
2. Only three variables (A, B, and C) have significant correlation with sales.

You build a linear regression model on the dependent variable of sales with the independent variables of A, B, and C. The results of the regression are seen in the exhibit. You cannot request additional data. What is a way that you could try to increase the R2 of the model without artificially inflating it?

Suggested Answer: A

Contribute your Thoughts:

Hortencia
4 months ago
Clustering sounds interesting, but not sure it’ll boost R2 effectively.
upvoted 0 times
...
Theola
4 months ago
Forcing all 15 variables in seems risky, not a good idea.
upvoted 0 times
...
Curt
4 months ago
Wait, can you really increase R2 without adding more variables?
upvoted 0 times
...
Dan
4 months ago
Definitely agree with C, interaction variables can help!
upvoted 0 times
...
Eden
5 months ago
I think option C is the best choice!
upvoted 0 times
...
Anabel
5 months ago
I recall a practice question where clustering improved model performance, so maybe option A could work too, but I’m not completely confident about that.
upvoted 0 times
...
Quinn
5 months ago
I think breaking A, B, and C into univariate models might not really help increase R2 effectively. It feels like it would just complicate things.
upvoted 0 times
...
Xochitl
5 months ago
I'm not entirely sure, but forcing all 15 variables into the model seems risky. It could lead to overfitting, right?
upvoted 0 times
...
Bernardine
5 months ago
I remember we discussed how creating interaction variables could help capture more complexity in the relationships, so I think option C might be a good choice.
upvoted 0 times
...
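A rough sketch of the interaction-variable idea, on synthetic data only. The coefficients, the noise level, and the assumption that A and B interact are all invented for illustration; the client's data and the exhibit are not reproduced here. The helpers `solve` and `r_squared` are hypothetical names for a small hand-rolled least-squares fit.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(rows, y):
    """Fit ordinary least squares with an intercept (normal equations) and return R^2."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(bj * v for bj, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


random.seed(0)
n = 500
A = [random.gauss(0, 1) for _ in range(n)]
B = [random.gauss(0, 1) for _ in range(n)]
C = [random.gauss(0, 1) for _ in range(n)]
# Assumed data-generating process: sales depends on A, B, C and on the product A*B.
sales = [2 * a + 1.5 * b + c + 2 * a * b + random.gauss(0, 1)
         for a, b, c in zip(A, B, C)]

main_effects = list(zip(A, B, C))
with_interactions = [(a, b, c, a * b, a * c, b * c) for a, b, c in zip(A, B, C)]

r2_main = r_squared(main_effects, sales)
r2_inter = r_squared(with_interactions, sales)
print(f"R2 with A, B, C only:        {r2_main:.3f}")
print(f"R2 with pairwise products:   {r2_inter:.3f}")
```

The interaction terms are built only from A, B, and C, so no new data is needed. Because in-sample R2 cannot decrease when terms are added, a gain like this is worth confirming with adjusted R2 or a held-out split before reporting it.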
Vernice
5 months ago
Ah, I see. This is all about scaling the hardware for the VMs based on the user group requirements. I'll need to do some calculations to determine the optimal core count.
upvoted 0 times
...
Carmela
5 months ago
This seems like a straightforward question. I'd go with option B and submit the file to verify if it's infected or not.
upvoted 0 times
...
Leonie
5 months ago
Okay, let me see here. I know active transformations change the order or number of rows, while passive ones don't. So A and D seem like the right answers. But I'm a bit unsure about C - can transformations really be both active and passive at the same time? I'll have to double-check that.
upvoted 0 times
...
Barney
5 months ago
I'm pretty sure this is about identifying potential revenue manipulation. Strange sales patterns are always a red flag.
upvoted 0 times
...
