
Amazon MLS-C01 Exam - Topic 3 Question 122 Discussion

Actual exam question for Amazon's MLS-C01 exam
Question #: 122
Topic #: 3
[All MLS-C01 Questions]

[Data Engineering]

A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The data is transformed into a numpy.array, which appears to be negatively affecting the speed of the training.

What should the Specialist do to optimize the data for training on SageMaker?

A. Use the SageMaker batch transform feature to transform the training data into a DataFrame.
B. Compress the data into the Apache Parquet format.
C. Transform the dataset into the RecordIO protobuf format.
D. Use the SageMaker hyperparameter optimization feature to automatically optimize the data.

Suggested Answer: C
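
For readers who want to see what the suggested answer looks like in practice: the SageMaker Python SDK provides write_numpy_to_dense_tensor, which serializes a numpy array (with optional labels) into the RecordIO protobuf format that the built-in algorithms consume most efficiently. A minimal sketch, assuming placeholder data shapes and a hypothetical bucket name:

import io
import numpy as np
import boto3
from sagemaker.amazon.common import write_numpy_to_dense_tensor

# Placeholder data standing in for the Specialist's numpy.array.
features = np.random.rand(1000, 10).astype("float32")
labels = np.random.randint(2, size=1000).astype("float32")

# Serialize the array to RecordIO protobuf in memory.
buf = io.BytesIO()
write_numpy_to_dense_tensor(buf, features, labels)
buf.seek(0)

# Upload to S3 for training; "my-bucket" is a placeholder name.
boto3.Session().resource("s3").Bucket("my-bucket").Object("train/data.rio").upload_fileobj(buf)

Built-in algorithms such as Linear Learner can then stream this file in Pipe mode instead of loading the whole array into memory, which is what makes option C the recommended optimization.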

Contribute your Thoughts:

Doretha
14 days ago
I agree, B seems solid. Compression helps speed.
upvoted 0 times
Gerry
19 days ago
I think option B is the best choice. Parquet is efficient.
upvoted 0 times
Eladia
24 days ago
D sounds interesting, but I doubt it helps with data format issues.
upvoted 0 times
Cristy
29 days ago
Totally agree with B, compression is key!
upvoted 0 times
Junita
1 month ago
Wait, isn't using numpy.array common? Why's it slowing things down?
upvoted 0 times
Tamala
1 month ago
Parquet is the way to go, my friends. Compress that data and watch the training speed soar!
upvoted 0 times
Suzi
1 month ago
Recordio protobuf, huh? Sounds like a magical incantation to summon a data-crunching wizard!
upvoted 0 times
Thurman
2 months ago
A could work, but I think transforming the data into the right format is more important than using batch transform.
upvoted 0 times
Christa
2 months ago
D sounds interesting, but I'm not sure if that's the right approach here. Optimizing the data itself might be a better solution.
upvoted 0 times
Katina
2 months ago
I'd go with B. Parquet is a great format for large datasets and can really speed up training.
upvoted 0 times
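
For context on option B, writing a pandas DataFrame to Parquet is a one-liner (the file name below is a placeholder, and the pyarrow package must be installed). Parquet is excellent for columnar analytics, but most SageMaker built-in algorithms expect CSV or RecordIO protobuf as training input, which is why B isn't the suggested answer here:

import pandas as pd

# Placeholder DataFrame standing in for the prepared dataset.
df = pd.DataFrame({"feature_1": [0.1, 0.2, 0.3], "label": [0, 1, 0]})

# Columnar, compressed storage; ideal for Athena/Glue, less so for built-in training.
df.to_parquet("train.parquet", compression="snappy")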
Mona
2 months ago
I'm pretty confident that option C is the way to go here. Transforming the dataset into the Recordio protobuf format is a common technique for optimizing data for SageMaker training, so that's the approach I'd take.
upvoted 0 times
Ashley
2 months ago
I think C might be better for speed.
upvoted 0 times
Audra
3 months ago
Option C seems like the way to go. Recordio protobuf is designed for efficient data handling in SageMaker.
upvoted 0 times
Lashawna
3 months ago
B is the way to go, Parquet is super efficient!
upvoted 0 times
Shasta
3 months ago
Option D seems a bit off-topic. Hyperparameter optimization is more about tuning the model, not optimizing the data itself. I think I'd go with either option B or C to address the data format issue.
upvoted 0 times
Lelia
4 months ago
Hmm, I'm not sure about this one. The question mentions that the numpy array is negatively affecting the speed of the training, so I'd probably try option A and use the SageMaker batch transform feature to transform the data into a DataFrame.
upvoted 0 times
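
One clarification on option A: in the SageMaker Python SDK, batch transform runs offline inference with an already-trained model; it does not reformat training data into a DataFrame or anything else. A minimal sketch (model name, bucket paths, and instance type are all placeholders):

from sagemaker.transformer import Transformer

# Batch transform needs a trained model to exist first, so it cannot help
# with preparing data before training.
transformer = Transformer(
    model_name="my-trained-model",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output",
)
transformer.transform(data="s3://my-bucket/batch-input", content_type="text/csv")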
Rodney
4 months ago
I'm a bit confused by this question. Does the Specialist need to compress the data or just transform it into a different format? I'm not sure if option B is the right approach here.
upvoted 0 times
Staci
4 months ago
I think I'd go with option C. Transforming the dataset into the Recordio protobuf format seems like the best way to optimize the data for training on SageMaker.
upvoted 0 times
Desiree
3 months ago
I think option B is better. Parquet format is efficient for large datasets.
upvoted 0 times
Rima
3 months ago
I’m leaning towards A. DataFrames are easy to work with in SageMaker.
upvoted 0 times
Ashton
4 months ago
I recall that hyperparameter optimization is more about tuning the model rather than optimizing data. So, I don't think option D is the answer.
upvoted 0 times
France
4 months ago
I feel like we went over this in our study group. Using SageMaker's batch transform feature seems like it could be a good option, but I'm not confident.
upvoted 0 times
Vivan
5 months ago
I'm not entirely sure, but I think transforming the dataset into Recordio protobuf format might be the right choice. It sounds familiar from the practice questions.
upvoted 0 times
Celeste
5 months ago
I remember we discussed the importance of data formats in our last class. I think using Apache Parquet could really help with speed.
upvoted 0 times
