Databricks Exam Databricks Machine Learning Associate Topic 3 Question 26 Discussion

Actual exam question for Databricks's Databricks Machine Learning Associate exam
Question #: 26
Topic #: 3

A data scientist wants to use Spark ML to one-hot encode the categorical features in their PySpark DataFrame features_df. A list of the names of the string columns is assigned to the input_columns variable.

They have developed this code block to accomplish this task:

The code block is returning an error.

Which of the following adjustments does the data scientist need to make to accomplish this task?

Suggested Answer: C

Spark ML's OneHotEncoder does not accept string columns: it expects each input column to hold numeric category indices. The string columns named in input_columns therefore have to be converted to indices with StringIndexer first, and the resulting index columns are then passed to OneHotEncoder.

The other adjustments raised in the discussion do not address this. OneHotEncoder has no method parameter. In Spark 3.x it is an estimator, so the fit step is required. VectorAssembler only combines existing columns into a single feature vector and does not encode categories.

Contribute your Thoughts:

Hailey
2 days ago
They need to use StringIndexer prior to one-hot encoding.
upvoted 0 times
...
Rebbecca
8 days ago
I feel like VectorAssembler is more for combining features rather than one-hot encoding, so that option seems off to me.
upvoted 0 times
...
Bonita
13 days ago
I practiced a similar question, and I believe using StringIndexer is crucial for categorical features.
upvoted 0 times
...
Trina
19 days ago
I think the OneHotEncoder might need a method parameter, but I can't recall the exact syntax.
upvoted 0 times
...
Tonja
24 days ago
I remember something about needing to use StringIndexer before one-hot encoding, but I'm not entirely sure if that's the only step needed.
upvoted 0 times
...
Simona
30 days ago
I'm feeling pretty confident about this one. I think the solution is to specify the method parameter in the OneHotEncoder to handle the string columns properly.
upvoted 0 times
...
Cassi
1 month ago
Based on the options, it seems like we might need to use StringIndexer first to convert the string columns to numeric indices before one-hot encoding. That could be the issue.
upvoted 0 times
...
Melissia
1 month ago
I'm a bit confused by the OneHotEncoder setup. I wonder if we need to do some additional preprocessing on the data before applying it.
upvoted 0 times
...
Annmarie
1 month ago
Okay, let me think this through step-by-step. I think the key is to understand what the error is telling us and how we can fix it.
upvoted 0 times
...
Derick
1 month ago
Hmm, this looks like a tricky one. I'll need to carefully review the code and error message to figure out what's going on.
upvoted 0 times
...
Jill
1 month ago
Interesting, I wonder if we need to use VectorAssembler to combine the features into a single vector before one-hot encoding. That might be the key step we're missing.
upvoted 0 times
...
Mammie
1 month ago
Ah, I see. We might need to use StringIndexer first to convert the string columns to numeric indices before one-hot encoding them. That could be the issue.
upvoted 0 times
...
Eladia
1 month ago
Hmm, the fit operation seems unnecessary here since we're just trying to transform the data, not train a model. I'd try removing that line and see if that fixes the problem.
upvoted 0 times
...
Mel
1 month ago
I'm not entirely sure what the issue is here, but I think I might need to specify the method parameter to the OneHotEncoder to get this working.
upvoted 0 times
...
Thurman
6 months ago
Maybe the error is because they forgot to add the 'sparkly' parameter to the OneHotEncoder. You know, to make it extra fabulous.
upvoted 0 times
...
Shannan
6 months ago
I heard the data scientist tried to one-hot encode their socks. Turns out they were just a bunch of ones and zeros!
upvoted 0 times
Kenny
4 months ago
C: They need to use StringIndexer prior to one-hot encoding the features.
upvoted 0 times
...
Martha
4 months ago
B: They need to use VectorAssembler prior to one-hot encoding the features.
upvoted 0 times
...
Daniela
5 months ago
C: They need to use VectorAssembler prior to one-hot encoding the features.
upvoted 0 times
...
Delisa
5 months ago
B: They need to use StringIndexer prior to one-hot encoding the features.
upvoted 0 times
...
Hannah
5 months ago
A: They need to specify the method parameter to the OneHotEncoder.
upvoted 0 times
...
Wilburn
5 months ago
A: They need to specify the method parameter to the OneHotEncoder.
upvoted 0 times
...
...
Alfred
6 months ago
VectorAssembler? Sounds like a superhero name. Maybe that's the solution, but I'm not sure.
upvoted 0 times
Pura
5 months ago
User3: Maybe they need to use StringIndexer before one-hot encoding the features.
upvoted 0 times
...
Lonny
5 months ago
User2: No, they should specify the method parameter to the OneHotEncoder.
upvoted 0 times
...
Dick
5 months ago
User1: I think the data scientist needs to use VectorAssembler before one-hot encoding.
upvoted 0 times
...
...
Rosendo
6 months ago
Ah, I see the issue. The method parameter is missing from the OneHotEncoder. We need to specify that.
upvoted 0 times
Essie
5 months ago
User1: Let's add that parameter and see if it works.
upvoted 0 times
...
Joaquin
6 months ago
User2: Yes, that's correct. That should fix the error.
upvoted 0 times
...
Eden
6 months ago
User1: I think we need to specify the method parameter to the OneHotEncoder.
upvoted 0 times
...
...
Jenelle
7 months ago
Wait, I think we need to use StringIndexer first to convert the string columns to numerical values. Then we can use OneHotEncoder.
upvoted 0 times
Latrice
5 months ago
D: And OneHotEncoder will then encode those numerical values as binary vectors.
upvoted 0 times
...
Osvaldo
5 months ago
C: That makes sense, StringIndexer will convert the strings to numerical values.
upvoted 0 times
...
Whitley
5 months ago
B: Then we can use OneHotEncoder to encode the categorical features.
upvoted 0 times
...
Reita
6 months ago
A: I think you're right, we should use StringIndexer first.
upvoted 0 times
...
...
Ligia
7 months ago
I believe they should also use StringIndexer before one-hot encoding the features to properly encode the categorical values.
upvoted 0 times
...
Quentin
7 months ago
Hmm, the error is probably due to the fit operation. Let's try removing that line and see if it works.
upvoted 0 times
Lawrence
6 months ago
User2: Yeah, let's try that and see if it fixes the error.
upvoted 0 times
...
Adaline
6 months ago
User1: I think we should remove the line with the fit operation.
upvoted 0 times
...
...
Essie
7 months ago
I agree with Daisy. Without specifying the method parameter, the code won't work properly.
upvoted 0 times
...
Daisy
7 months ago
I think the data scientist needs to specify the method parameter to the OneHotEncoder.
upvoted 0 times
...
