Databricks Certified Professional Data Scientist Exam - Topic 5 Question 55 Discussion

Actual exam question for Databricks's Databricks Certified Professional Data Scientist exam

Question #: 55
Topic #: 5

Your company ran an online campaign to collect feedback on product quality, and you have all the product review responses. The response form has check boxes as well as a text field. Responses in which the text field is left empty or contains non-dictionary words are not considered valid feedback; responses in which the text field contains proper English words are considered valid. Which of the following methods should you not use to identify whether a response is valid?

A. Naive Bayes

B. Logistic Regression

C. Random Decision Forests

D. Any one of the above
In this problem you are given high-dimensional independent variables (such as yes/no check-box answers, whether the text contains English words, test results, etc.) and you have to predict one of two classes: valid or not valid. So all of the techniques below can be applied to this problem:
* Support vector machines
* Naive Bayes
* Logistic regression
* Random decision forests
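
As a rough illustration of the binary-classification framing above, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the tiny `DICTIONARY` stands in for a real English word list, the toy responses and labels are invented, and the feature set (check box, has text, fraction of dictionary words) is just one possible encoding. Logistic regression is shown only because it is short to write without dependencies; the same feature vectors could be fed to any of the classifiers listed above.

```python
import math
import re

# Hypothetical stand-in for a real English word list (assumption for this sketch).
DICTIONARY = {"the", "product", "is", "good", "bad", "quality", "works", "well",
              "not", "very", "i", "like", "it", "broke", "after", "a", "week"}

def features(checkbox, text):
    """Encode one response as [bias, checkbox, has_text, dictionary_word_fraction]."""
    tokens = re.findall(r"[a-z]+", text.lower())
    in_dict = sum(t in DICTIONARY for t in tokens)
    frac = in_dict / len(tokens) if tokens else 0.0
    return [1.0, float(checkbox), float(bool(tokens)), frac]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with plain stochastic gradient ascent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            w = [wj + lr * (yi - p) * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, checkbox, text):
    """Return True if the response is classified as valid."""
    x = features(checkbox, text)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x))) >= 0.5

# Invented toy data: (checkbox ticked, text field, label) with 1 = valid, 0 = not valid.
responses = [
    (True, "the product is good", 1),
    (False, "quality is very bad", 1),
    (True, "it broke after a week", 1),
    (False, "i like it", 1),
    (True, "", 0),
    (False, "", 0),
    (True, "asdf qwer zxcv", 0),
    (False, "xkcd jjjj", 0),
]

X = [features(cb, text) for cb, text, _ in responses]
y = [label for _, _, label in responses]
w = train(X, y)

print(predict(w, True, "the product works well"))
print(predict(w, False, "qqq zzz"))
print(predict(w, True, ""))
```

With real data the dictionary lookup would use a full word list, and the model would be evaluated on held-out responses rather than on examples that mirror the training set.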

Suggested Answer: B

by Helaine at Mar 17, 2024, 10:14 AM

Contribute your Thoughts:

Dick

4 months ago

All of them can be used, so D seems right.

upvoted 0 times

Mignon

5 months ago

Surprised that any of these methods could work!

upvoted 0 times

Clorinda

5 months ago

Random Decision Forests? Really?

upvoted 0 times

Nieves

5 months ago

I think Logistic Regression is better for this.

upvoted 0 times

Dwight

5 months ago

Naive Bayes is great for text classification!

upvoted 0 times

Lucille

6 months ago

I’m a bit confused about the "Any one of the above" option. I thought all methods could be applied, but maybe one is just not as effective?

upvoted 0 times

Nathan

6 months ago

I practiced a similar question where we had to choose the right model for binary classification, and I feel like any of these methods could work, but I’m leaning towards Random Decision Forests being less suitable.

upvoted 0 times

Aron

6 months ago

I think Logistic Regression might not capture the complexity of the text data as well as the others, but I need to double-check that.

upvoted 0 times

Wade

6 months ago

I remember we discussed how Naive Bayes is good for text classification, but I'm not sure if it's the best choice here.

upvoted 0 times

Jacki

6 months ago

I think the key here is to focus on preprocessing the text data properly. We'll need to handle things like stop words, stemming/lemmatization, and maybe even sentiment analysis to identify the "non-dictionary words" mentioned in the prompt.

upvoted 0 times

Nu

6 months ago

Hmm, I'm not sure Naive Bayes is the best choice here. Since we're dealing with text data, a model like Random Forests or Support Vector Machines might be more appropriate.

upvoted 0 times

Josefa

6 months ago

I'm a bit confused on the best approach here. Should we be using a supervised learning algorithm since we have labeled data, or would an unsupervised method like clustering work better?

upvoted 0 times

Allene

6 months ago

This seems like a straightforward text classification problem. I'd probably start with a simple logistic regression model and see how that performs.

upvoted 0 times

Mari

6 months ago

I'm a little confused by the wording of the question. Is the relational algebra expression supposed to be provided, or is the goal to just match the SQL statements to the correct operation? I'll need to clarify that before I can confidently answer.

upvoted 0 times

Lazaro

6 months ago

I'm pretty confident that Express Indexing is the right answer. It's designed to optimize performance by reducing the overhead of indexing data.

upvoted 0 times

Alita

6 months ago

Okay, I've got this. The correct answer is option C - the training loss decreases while the validation loss increases. That means the model is performing well on the training data but not generalizing well to the validation data, which is a clear sign of overfitting.

upvoted 0 times
