Databricks Machine Learning Associate Exam - Topic 3 Question 26 Discussion

Actual exam question for Databricks's Databricks Machine Learning Associate exam

Question #: 26
Topic #: 3

A data scientist wants to use Spark ML to one-hot encode the categorical features in their PySpark DataFrame features_df. A list of the names of the string columns is assigned to the input_columns variable.

They have developed this code block to accomplish this task:

The code block is returning an error.

Which of the following adjustments does the data scientist need to make to accomplish this task?

A. They need to specify the method parameter to the OneHotEncoder.

B. They need to remove the line with the fit operation.

C. They need to use StringIndexer prior to one-hot encoding the features.

D. They need to use VectorAssembler prior to one-hot encoding the features.

Suggested Answer: C

OneHotEncoder in Spark ML maps columns of category indices to columns of binary vectors, so its input columns must be numeric. Passing the string columns listed in input_columns directly to OneHotEncoder therefore raises an error. StringIndexer must first convert each string column into a column of numeric label indices (by default the most frequent label receives index 0.0), and OneHotEncoder is then applied to those index columns. The other options do not fix this: OneHotEncoder has no method parameter (A); in Spark 3.x OneHotEncoder is an Estimator, so it must be fit before it can transform (B); and VectorAssembler only combines existing numeric or vector columns into a single vector column without indexing strings (D).

Apache Spark documentation: Extracting, transforming and selecting features (StringIndexer, OneHotEncoder)

by Brock at Mar 04, 2025, 05:06 AM

Ivette

3 months ago

I thought VectorAssembler was needed for this too!

Francine

3 months ago

Removing the fit line? That doesn't seem right.

Arminda

3 months ago

Wait, can you skip the fit operation? Sounds off.

Luz

3 months ago

Totally agree with that! StringIndexer is essential.

Hailey

4 months ago

They need to use StringIndexer prior to one-hot encoding.

Rebbecca

4 months ago

I feel like VectorAssembler is more for combining features rather than one-hot encoding, so that option seems off to me.

Bonita

4 months ago

I practiced a similar question, and I believe using StringIndexer is crucial for categorical features.

Trina

4 months ago

I think the OneHotEncoder might need a method parameter, but I can't recall the exact syntax.

Tonja

4 months ago

I remember something about needing to use StringIndexer before one-hot encoding, but I'm not entirely sure if that's the only step needed.

Simona

5 months ago

I'm feeling pretty confident about this one. I think the solution is to specify the method parameter in the OneHotEncoder to handle the string columns properly.

Cassi

5 months ago

Based on the options, it seems like we might need to use StringIndexer first to convert the string columns to numeric indices before one-hot encoding. That could be the issue.
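A plain-Python illustration (not Spark's API) of what StringIndexer does under its default stringOrderType="frequencyDesc": each distinct label gets a numeric index, most frequent first. The alphabetical tie-break mirrors what we understand Spark 3.x to do, but treat it as an assumption; string_index is a hypothetical helper.

```python
from collections import Counter


def string_index(values):
    """Hypothetical helper mimicking StringIndexer's default ordering:
    the most frequent label gets index 0.0; ties are broken alphabetically."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    mapping = {label: float(i) for i, label in enumerate(ordered)}
    return [mapping[v] for v in values], mapping


indices, mapping = string_index(["red", "blue", "red", "green"])
print(mapping)  # {'red': 0.0, 'blue': 1.0, 'green': 2.0}
print(indices)  # [0.0, 1.0, 0.0, 2.0]
```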

Melissia

5 months ago

I'm a bit confused by the OneHotEncoder setup. I wonder if we need to do some additional preprocessing on the data before applying it.

Annmarie

5 months ago

Okay, let me think this through step-by-step. I think the key is to understand what the error is telling us and how we can fix it.

Derick

5 months ago

Hmm, this looks like a tricky one. I'll need to carefully review the code and error message to figure out what's going on.

Jill

5 months ago

Interesting, I wonder if we need to use VectorAssembler to combine the features into a single vector before one-hot encoding. That might be the key step we're missing.

Mammie

5 months ago

Ah, I see. We might need to use StringIndexer first to convert the string columns to numeric indices before one-hot encoding them. That could be the issue.

Eladia

5 months ago

Hmm, the fit operation seems unnecessary here since we're just trying to transform the data, not train a model. I'd try removing that line and see if that fixes the problem.

Mel

5 months ago

I'm not entirely sure what the issue is here, but I think I might need to specify the method parameter to the OneHotEncoder to get this working.

Thurman

9 months ago

Maybe the error is because they forgot to add the 'sparkly' parameter to the OneHotEncoder. You know, to make it extra fabulous.

Shannan

9 months ago

I heard the data scientist tried to one-hot encode their socks. Turns out they were just a bunch of ones and zeros!

Kenny

8 months ago

C: They need to use StringIndexer prior to one-hot encoding the features.

Martha

8 months ago

D: They need to use VectorAssembler prior to one-hot encoding the features.

Daniela

9 months ago

D: They need to use VectorAssembler prior to one-hot encoding the features.

Delisa

9 months ago

C: They need to use StringIndexer prior to one-hot encoding the features.

Hannah

9 months ago

A: They need to specify the method parameter to the OneHotEncoder.

Wilburn

9 months ago

A: They need to specify the method parameter to the OneHotEncoder.

Alfred

10 months ago

VectorAssembler? Sounds like a superhero name. Maybe that's the solution, but I'm not sure.

Pura

8 months ago

Maybe they need to use StringIndexer before one-hot encoding the features.

Lonny

8 months ago

No, they should specify the method parameter to the OneHotEncoder.

Dick

9 months ago

I think the data scientist needs to use VectorAssembler before one-hot encoding.

Rosendo

10 months ago

Ah, I see the issue. The method parameter is missing from the OneHotEncoder. We need to specify that.

Essie

9 months ago

Let's add that parameter and see if it works.

Joaquin

9 months ago

Yes, that's correct. That should fix the error.

Eden

10 months ago

I think we need to specify the method parameter to the OneHotEncoder.

Jenelle

10 months ago

Wait, I think we need to use StringIndexer first to convert the string columns to numerical values. Then we can use OneHotEncoder.
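A plain-Python illustration (not Spark's API) of what OneHotEncoder then produces from those indices. one_hot is a hypothetical helper; it mirrors the default dropLast=True behavior, where the last category is dropped so it encodes as all zeros. Spark itself stores the result as a SparseVector rather than a list.

```python
def one_hot(index, num_categories, drop_last=True):
    """Hypothetical helper mimicking OneHotEncoder's output for one index.
    With drop_last=True (Spark's dropLast default) the vector has
    num_categories - 1 slots, so the last category maps to all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[int(index)] = 1.0
    return vec


# Indices as a StringIndexer might produce them for three labels.
for idx in [0.0, 1.0, 2.0]:
    print(idx, one_hot(idx, 3))
# 0.0 [1.0, 0.0]
# 1.0 [0.0, 1.0]
# 2.0 [0.0, 0.0]
```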

Latrice

8 months ago

And OneHotEncoder will then encode those numerical values as binary vectors.

Osvaldo

8 months ago

That makes sense: StringIndexer will convert the strings to numerical values.

Whitley

8 months ago

Then we can use OneHotEncoder to encode the categorical features.

Reita

10 months ago

I think you're right; we should use StringIndexer first.

Ligia

10 months ago

I believe they should also use StringIndexer before one-hot encoding the features to properly encode the categorical values.

Quentin

10 months ago

Hmm, the error is probably due to the fit operation. Let's try removing that line and see if it works.

Lawrence

9 months ago

Yeah, let's try that and see if it fixes the error.

Adaline

10 months ago

I think we should remove the line with the fit operation.

Essie

10 months ago

I agree with Daisy. Without specifying the method parameter, the code won't work properly.

Daisy

11 months ago

I think the data scientist needs to specify the method parameter to the OneHotEncoder.
