
Databricks Exam Databricks Machine Learning Associate Topic 2 Question 20 Discussion

Actual exam question from the Databricks Machine Learning Associate exam
Question #: 20
Topic #: 2
[All Databricks Machine Learning Associate Questions]

An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.

Which of the following explanations justifies this suggestion?

Suggested Answer: A

In Spark ML, a transformer is an algorithm that can transform one DataFrame into another DataFrame. It takes a DataFrame as input and produces a new DataFrame as output. This transformation can involve adding new columns, modifying existing ones, or applying feature transformations. Examples of transformers in Spark MLlib include feature transformers like StringIndexer, VectorAssembler, and StandardScaler.


Databricks documentation on transformers: Transformers in Spark ML
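As a rough illustration of what transformers like StringIndexer and OneHotEncoder do conceptually, the sketch below mimics the two steps in plain Python: map each category string to an integer index (most frequent category first, as StringIndexer does by default), then expand each index into a 0/1 vector. This is a simplified stand-in for intuition only, not the Spark ML API, and the function names are made up for this example.

```python
from collections import Counter

def string_indexer(values):
    """Assign an integer index to each distinct category,
    most frequent category first (mirrors StringIndexer's default)."""
    order = [cat for cat, _ in Counter(values).most_common()]
    index = {cat: i for i, cat in enumerate(order)}
    return [index[v] for v in values], index

def one_hot(indices, num_categories):
    """Expand each index into a 0/1 vector of length num_categories
    (mirrors what OneHotEncoder produces, minus the sparse storage)."""
    return [[1 if i == idx else 0 for i in range(num_categories)]
            for idx in indices]

colors = ["red", "blue", "red", "green", "red"]
indices, mapping = string_indexer(colors)
vectors = one_hot(indices, len(mapping))
# "red" is most frequent, so it maps to index 0 and vector [1, 0, 0]
```

Note that the width of the one-hot vectors depends on how many distinct categories the encoder has seen, which is one reason encoding is usually applied per training pipeline rather than baked into shared feature data.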

Contribute your Thoughts:

Alex
2 days ago
Totally agree, it can lead to high dimensionality issues!
upvoted 0 times
...
Asuncion
8 days ago
One-hot encoding can be tricky for some algorithms.
upvoted 0 times
...
Tamar
13 days ago
I’m a bit confused about option D; I thought one-hot encoding was actually a common method for handling categorical variables.
upvoted 0 times
...
Herman
19 days ago
I feel like we practiced a question similar to this, and I think one-hot encoding can be computationally heavy, which might relate to option C.
upvoted 0 times
...
Yuonne
24 days ago
I think option A makes sense because some algorithms, like decision trees, might not need one-hot encoding at all.
upvoted 0 times
...
Lelia
1 month ago
I remember discussing one-hot encoding in class, but I’m not sure if it’s always the best choice for every algorithm.
upvoted 0 times
...
Jesusita
1 month ago
I'm a little confused by this question. The options don't seem to clearly explain why we shouldn't one-hot encode the categorical variables. I'll have to review my notes on feature engineering to make sure I understand the tradeoffs here.
upvoted 0 times
...
Flo
1 month ago
Okay, let me see. I think the key here is that the data scientist is suggesting we shouldn't one-hot encode the categorical variables in the feature repository. That makes me think option A is the best explanation - one-hot encoding can be problematic for some algorithms.
upvoted 0 times
...
Leeann
1 month ago
Hmm, I'm a bit unsure here. I know one-hot encoding is a common way to handle categorical variables, but the question is suggesting we shouldn't do it. I'll have to think this through carefully.
upvoted 0 times
...
Alida
1 month ago
I'm pretty confident about this one. I think the answer is A - one-hot encoding can be problematic for some machine learning algorithms.
upvoted 0 times
...
Emily
6 months ago
B) Yep, that makes the most sense. The target variable can change, so one-hot encoding shouldn't be in the feature repo.
upvoted 0 times
...
Felice
6 months ago
Ha! 'Not a common strategy', that's a funny way to put it. I wonder what the 'uncommon' strategies are.
upvoted 0 times
Galen
4 months ago
Yeah, it's not supported by most machine learning libraries either.
upvoted 0 times
...
Layla
4 months ago
It's computationally intensive and should only be used on small samples of training sets.
upvoted 0 times
...
Tony
5 months ago
I agree, it can be problematic for some machine learning algorithms.
upvoted 0 times
...
Hermila
5 months ago
One-hot encoding is not a common strategy for representing categorical variables numerically.
upvoted 0 times
...
...
India
6 months ago
C) Computationally intensive, huh? I guess one-hot encoding can be a bit heavy for the training set, so it's better to leave it for individual problems.
upvoted 0 times
...
Jettie
6 months ago
E) Ah, I see. Some machine learning algorithms may not play well with one-hot encoding, so it's a good idea to avoid it in the feature repository.
upvoted 0 times
Stephanie
5 months ago
C) One-hot encoding is computationally intensive and should only be performed on small samples of training sets for individual machine learning problems.
upvoted 0 times
...
Maurine
5 months ago
E) That's a good point. It's important to consider how the target variable values can impact the effectiveness of one-hot encoding.
upvoted 0 times
...
Gaston
5 months ago
B) One-hot encoding is dependent on the target variable's values which differ for each application.
upvoted 0 times
...
...
Glenna
7 months ago
B) Hmm, that makes sense. The target variable can vary across different applications, so one-hot encoding shouldn't be done at the feature repository level.
upvoted 0 times
Chaya
5 months ago
E) One-hot encoding is a potentially problematic categorical variable strategy for some machine learning algorithms.
upvoted 0 times
...
Mitsue
5 months ago
C) One-hot encoding is computationally intensive and should only be performed on small samples of training sets for individual machine learning problems.
upvoted 0 times
...
Fernanda
5 months ago
A) One-hot encoding is dependent on the target variable's values which differ for each application.
upvoted 0 times
...
...
Joaquin
7 months ago
I disagree. One-hot encoding is necessary for certain algorithms.
upvoted 0 times
...
Shawna
7 months ago
I agree with Arlette. It can be computationally intensive.
upvoted 0 times
...
Arlette
7 months ago
I think one-hot encoding is not always the best approach.
upvoted 0 times
...
