Databricks Certified Professional Data Scientist Exam - Topic 5 Question 10 Discussion

Actual exam question for Databricks's Databricks Certified Professional Data Scientist exam

Question #: 10
Topic #: 5

[All Databricks Certified Professional Data Scientist Questions]

Refer to exhibit

You are asked to write a report on how specific variables impact your client's sales using a data set provided to you by the client. The data includes 15 variables that the client views as directly related to sales, and you are restricted to these variables only. After a preliminary analysis of the data, the following findings were made: 1. Multicollinearity is not an issue among the variables 2. Only three variables-A, B, and C-have significant correlation with sales You build a linear regression model on the dependent variable of sales with the independent variables of A, B, and C. The results of the regression are seen in the exhibit. You cannot request additional dat

a. what is a way that you could try to increase the R2 of the model without artificially inflating it?

ACreate clusters based on the data and use them as model inputs

BForce all 15 variables into the model as independent variables

CCreate interaction variables based only on variables A, B, and C

DBreak variables A, B, and C into their own univariate models
In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variable) denoted X. The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression. (This term should be distinguished from multivariate linear regression^ where multiple correlated dependent variables are predicted, rather than a single scalar variable.) In linear regression data are modeled using linear predictor functions, and unknown model parameters are estimated from the data. Such models are called linear models. Most commonly, linear regression refers to a model in which the conditional mean of y given the value of X is an affine function of X. Less commonly: linear regression could refer to a model in which the median, or some other quantile of the conditional distribution of y given X is expressed as a linear function of X. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of y given X, rather than on the joint probability distribution of y and X: which is the domain of multivariate analysis.

Show Suggested Answer

Suggested Answer: A

by Lucy at May 03, 2022, 06:12 PM

Limited Time Offer

25%

6 months ago

I'm pretty sure this is about identifying potential revenue manipulation. Strange sales patterns are always a red flag.

upvoted 0 times

...