
Amazon MLS-C01 Exam - Topic 3 Question 122 Discussion

Actual exam question for Amazon's MLS-C01 exam
Question #: 122
Topic #: 3
[All MLS-C01 Questions]

[Data Engineering]

A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The data is transformed into a numpy.array, which appears to be negatively affecting the speed of the training.

What should the Specialist do to optimize the data for training on SageMaker?

A. Use the SageMaker batch transform feature to transform the training data into a DataFrame.
B. Compress the data into the Apache Parquet format.
C. Transform the dataset into the RecordIO protobuf format.
D. Use the SageMaker hyperparameter optimization feature to automatically optimize the data.

Suggested Answer: C
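
For readers who want to see what the suggested answer looks like in practice: the SageMaker Python SDK provides write_numpy_to_dense_tensor, which serializes a numpy array (with optional labels) into the RecordIO protobuf format that the built-in algorithms consume most efficiently. A minimal sketch, assuming placeholder data shapes and a hypothetical bucket name:

import io
import numpy as np
import boto3
from sagemaker.amazon.common import write_numpy_to_dense_tensor

# Placeholder data standing in for the Specialist's numpy.array.
features = np.random.rand(1000, 10).astype("float32")
labels = np.random.randint(2, size=1000).astype("float32")

# Serialize the array to RecordIO protobuf in memory.
buf = io.BytesIO()
write_numpy_to_dense_tensor(buf, features, labels)
buf.seek(0)

# Upload to S3 for training; "my-bucket" is a placeholder name.
boto3.Session().resource("s3").Bucket("my-bucket").Object("train/data.rio").upload_fileobj(buf)

Built-in algorithms such as Linear Learner can then stream this file in Pipe mode instead of loading the whole array into memory, which is what makes option C the recommended optimization.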

Contribute your Thoughts:

Doretha
14 days ago
I agree, B seems solid. Compression helps speed.
upvoted 0 times
Gerry
19 days ago
I think option B is the best choice. Parquet is efficient.
upvoted 0 times
Eladia
24 days ago
D sounds interesting, but I doubt it helps with data format issues.
upvoted 0 times
Cristy
29 days ago
Totally agree with B, compression is key!
upvoted 0 times
Junita
1 month ago
Wait, isn't using numpy.array common? Why's it slowing things down?
upvoted 0 times
Tamala
1 month ago
Parquet is the way to go, my friends. Compress that data and watch the training speed soar!
upvoted 0 times
Suzi
1 month ago
Recordio protobuf, huh? Sounds like a magical incantation to summon a data-crunching wizard!
upvoted 0 times
Thurman
2 months ago
A could work, but I think transforming the data into the right format is more important than using batch transform.
upvoted 0 times
Christa
2 months ago
D sounds interesting, but I'm not sure if that's the right approach here. Optimizing the data itself might be a better solution.
upvoted 0 times
Katina
2 months ago
I'd go with B. Parquet is a great format for large datasets and can really speed up training.
upvoted 0 times
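
For context on option B, writing a pandas DataFrame to Parquet is a one-liner (the file name below is a placeholder, and the pyarrow package must be installed). Parquet is excellent for columnar analytics, but most SageMaker built-in algorithms expect CSV or RecordIO protobuf as training input, which is why B isn't the suggested answer here:

import pandas as pd

# Placeholder DataFrame standing in for the prepared dataset.
df = pd.DataFrame({"feature_1": [0.1, 0.2, 0.3], "label": [0, 1, 0]})

# Columnar, compressed storage; ideal for Athena/Glue, less so for built-in training.
df.to_parquet("train.parquet", compression="snappy")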
Mona
2 months ago
I'm pretty confident that option C is the way to go here. Transforming the dataset into the Recordio protobuf format is a common technique for optimizing data for SageMaker training, so that's the approach I'd take.
upvoted 0 times
Ashley
2 months ago
I think C might be better for speed.
upvoted 0 times
Audra
3 months ago
Option C seems like the way to go. Recordio protobuf is designed for efficient data handling in SageMaker.
upvoted 0 times
Lashawna
3 months ago
B is the way to go, Parquet is super efficient!
upvoted 0 times
Shasta
3 months ago
Option D seems a bit off-topic. Hyperparameter optimization is more about tuning the model, not optimizing the data itself. I think I'd go with either option B or C to address the data format issue.
upvoted 0 times
Lelia
4 months ago
Hmm, I'm not sure about this one. The question mentions that the numpy array is negatively affecting the speed of the training, so I'd probably try option A and use the SageMaker batch transform feature to transform the data into a DataFrame.
upvoted 0 times
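
One clarification on option A: in the SageMaker Python SDK, batch transform runs offline inference with an already-trained model; it does not reformat training data into a DataFrame or anything else. A minimal sketch (model name, bucket paths, and instance type are all placeholders):

from sagemaker.transformer import Transformer

# Batch transform needs a trained model to exist first, so it cannot help
# with preparing data before training.
transformer = Transformer(
    model_name="my-trained-model",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output",
)
transformer.transform(data="s3://my-bucket/batch-input", content_type="text/csv")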
Rodney
4 months ago
I'm a bit confused by this question. Does the Specialist need to compress the data or just transform it into a different format? I'm not sure if option B is the right approach here.
upvoted 0 times
Staci
4 months ago
I think I'd go with option C. Transforming the dataset into the Recordio protobuf format seems like the best way to optimize the data for training on SageMaker.
upvoted 0 times
Desiree
3 months ago
I think option B is better. Parquet format is efficient for large datasets.
upvoted 0 times
Rima
3 months ago
I’m leaning towards A. DataFrames are easy to work with in SageMaker.
upvoted 0 times
Ashton
4 months ago
I recall that hyperparameter optimization is more about tuning the model rather than optimizing data. So, I don't think option D is the answer.
upvoted 0 times
France
4 months ago
I feel like we went over this in our study group. Using SageMaker's batch transform feature seems like it could be a good option, but I'm not confident.
upvoted 0 times
Vivan
5 months ago
I'm not entirely sure, but I think transforming the dataset into Recordio protobuf format might be the right choice. It sounds familiar from the practice questions.
upvoted 0 times
Celeste
5 months ago
I remember we discussed the importance of data formats in our last class. I think using Apache Parquet could really help with speed.
upvoted 0 times
