Welcome to Pass4Success


Databricks Machine Learning Associate Exam - Topic 2 Question 20 Discussion

Actual exam question from the Databricks Machine Learning Associate exam
Question #: 20
Topic #: 2

An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.

Which of the following explanations justifies this suggestion?

Suggested Answer: A

In Spark ML, a transformer is an algorithm that can transform one DataFrame into another DataFrame. It takes a DataFrame as input and produces a new DataFrame as output. This transformation can involve adding new columns, modifying existing ones, or applying feature transformations. Examples of transformers in Spark MLlib include feature transformers like StringIndexer, VectorAssembler, and StandardScaler.


Databricks documentation on transformers: Transformers in Spark ML
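The transformer pattern described above (a DataFrame goes in, a new DataFrame comes out) can be sketched in plain Python, using a list of row dicts as a stand-in for a Spark DataFrame. The class below is illustrative only; it is not the real pyspark.ml API, just the shape of a fit/transform feature transformer such as StringIndexer.

```python
# Illustrative sketch of the fit/transform transformer pattern: an object
# that maps one "DataFrame" (here, a list of row dicts) to a NEW one with
# an added column. NOT the pyspark.ml API, just the idea behind it.

class ToyStringIndexer:
    def __init__(self, input_col, output_col):
        self.input_col = input_col
        self.output_col = output_col

    def fit(self, rows):
        # Learn a category -> index mapping from the data. (The real
        # StringIndexer orders by descending frequency by default; we
        # sort alphabetically here for determinism.)
        cats = sorted({row[self.input_col] for row in rows})
        self.mapping = {cat: float(i) for i, cat in enumerate(cats)}
        return self

    def transform(self, rows):
        # Produce a new dataset with an added column; input rows are untouched.
        return [{**row, self.output_col: self.mapping[row[self.input_col]]}
                for row in rows]

df = [{"color": "red"}, {"color": "blue"}, {"color": "red"}]
indexer = ToyStringIndexer("color", "color_idx").fit(df)
print(indexer.transform(df))
```

In Spark ML the same shape appears as `StringIndexer(inputCol=..., outputCol=...).fit(df).transform(df)`, and transformers compose into Pipelines with encoders and scalers.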

Contribute your Thoughts:

Marshall
3 months ago
I thought one-hot was the go-to for categorical variables?
upvoted 0 times
...
Ricarda
3 months ago
It’s not uncommon, but definitely depends on the context.
upvoted 0 times
...
Dorthy
3 months ago
Wait, is one-hot encoding really that bad?
upvoted 0 times
...
Alex
4 months ago
Totally agree, it can lead to high dimensionality issues!
upvoted 0 times
...
Asuncion
4 months ago
One-hot encoding can be tricky for some algorithms.
upvoted 0 times
...
Tamar
4 months ago
I’m a bit confused about option D; I thought one-hot encoding was actually a common method for handling categorical variables.
upvoted 0 times
...
Herman
4 months ago
I feel like we practiced a question similar to this, and I think one-hot encoding can be computationally heavy, which might relate to option C.
upvoted 0 times
...
Yuonne
4 months ago
I think option A makes sense because some algorithms, like decision trees, might not need one-hot encoding at all.
upvoted 0 times
...
Lelia
5 months ago
I remember discussing one-hot encoding in class, but I’m not sure if it’s always the best choice for every algorithm.
upvoted 0 times
...
Jesusita
5 months ago
I'm a little confused by this question. The options don't seem to clearly explain why we shouldn't one-hot encode the categorical variables. I'll have to review my notes on feature engineering to make sure I understand the tradeoffs here.
upvoted 0 times
...
Flo
5 months ago
Okay, let me see. I think the key here is that the data scientist is suggesting we shouldn't one-hot encode the categorical variables in the feature repository. That makes me think option A is the best explanation - one-hot encoding can be problematic for some algorithms.
upvoted 0 times
...
Leeann
5 months ago
Hmm, I'm a bit unsure here. I know one-hot encoding is a common way to handle categorical variables, but the question is suggesting we shouldn't do it. I'll have to think this through carefully.
upvoted 0 times
...
Alida
5 months ago
I'm pretty confident about this one. I think the answer is A - one-hot encoding can be problematic for some machine learning algorithms.
upvoted 0 times
...
Emily
9 months ago
B) Yep, that makes the most sense. The target variable can change, so one-hot encoding shouldn't be in the feature repo.
upvoted 0 times
...
Felice
9 months ago
Ha! 'Not a common strategy', that's a funny way to put it. I wonder what the 'uncommon' strategies are.
upvoted 0 times
Galen
8 months ago
Yeah, it's not supported by most machine learning libraries either.
upvoted 0 times
...
Layla
8 months ago
It's computationally intensive and should only be used on small samples of training sets.
upvoted 0 times
...
Tony
8 months ago
I agree, it can be problematic for some machine learning algorithms.
upvoted 0 times
...
Hermila
8 months ago
One-hot encoding is not a common strategy for representing categorical variables numerically.
upvoted 0 times
...
...
India
10 months ago
C) Computationally intensive, huh? I guess one-hot encoding can be a bit heavy for the training set, so it's better to leave it for individual problems.
upvoted 0 times
...
Jettie
10 months ago
E) Ah, I see. Some machine learning algorithms may not play well with one-hot encoding, so it's a good idea to avoid it in the feature repository.
upvoted 0 times
Stephanie
8 months ago
C) One-hot encoding is computationally intensive and should only be performed on small samples of training sets for individual machine learning problems.
upvoted 0 times
...
Maurine
9 months ago
E) That's a good point. It's important to consider how the target variable values can impact the effectiveness of one-hot encoding.
upvoted 0 times
...
Gaston
9 months ago
B) One-hot encoding is dependent on the target variable's values which differ for each application.
upvoted 0 times
...
...
Glenna
10 months ago
B) Hmm, that makes sense. The target variable can vary across different applications, so one-hot encoding shouldn't be done at the feature repository level.
upvoted 0 times
Chaya
9 months ago
E) One-hot encoding is a potentially problematic categorical variable strategy for some machine learning algorithms.
upvoted 0 times
...
Mitsue
9 months ago
C) One-hot encoding is computationally intensive and should only be performed on small samples of training sets for individual machine learning problems.
upvoted 0 times
...
Fernanda
9 months ago
A) One-hot encoding is dependent on the target variable's values which differ for each application.
upvoted 0 times
...
...
Joaquin
11 months ago
I disagree. One-hot encoding is necessary for certain algorithms.
upvoted 0 times
...
Shawna
11 months ago
I agree with Arlette. It can be computationally intensive.
upvoted 0 times
...
Arlette
11 months ago
I think one-hot encoding is not always the best approach.
upvoted 0 times
...
