Welcome to Pass4Success

Google Professional Machine Learning Engineer Exam - Topic 9 Question 23 Discussion

Actual exam question for Google's Professional Machine Learning Engineer exam
Question #: 23
Topic #: 9

You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?

Suggested Answer: D
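The suggested answer points at rebuilding TensorFlow Serving with CPU-specific optimizations and pinning the GKE serving nodes to a matching baseline CPU platform. A minimal sketch of what that could look like, assuming a Bazel build of TensorFlow Serving from source (the cluster name, pool name, and CPU platform below are illustrative assumptions, not values from the question):

```shell
# Rebuild TensorFlow Serving from source with CPU-specific instruction
# sets enabled for the build machine's CPU (-march=native is one common
# choice; pick flags that match your serving nodes):
bazel build --config=release \
  --copt=-march=native \
  tensorflow_serving/model_servers:tensorflow_model_server

# Pin the GKE serving node pool to a baseline minimum CPU platform so the
# optimized binary never lands on an older CPU that lacks those
# instructions (names here are examples):
gcloud container node-pools create serving-pool \
  --cluster=my-cluster \
  --min-cpu-platform="Intel Skylake"
```

This keeps the existing load balancer, pods, and GKE cluster in place, which is why it fits the "without changing the underlying infrastructure" constraint better than switching serving stacks.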

Contribute your Thoughts:

Rusty
4 months ago
Increasing max_enqueued_batches seems like a solid move!
upvoted 0 times
Lashawnda
4 months ago
I disagree; switching to the universal version might not solve the latency issue.
upvoted 0 times
Roosevelt
4 months ago
Wait, can we really recompile TensorFlow Serving? Sounds complicated!
upvoted 0 times
Cherelle
4 months ago
I think option C is the way to go for better throughput.
upvoted 0 times
Loreen
5 months ago
A higher max_batch_size can really help with latency!
upvoted 0 times
Kimberlie
5 months ago
Recompiling TensorFlow Serving sounds complicated. I feel like that might be overkill for just improving latency without changing infrastructure.
upvoted 0 times
Jutta
5 months ago
I'm a bit confused about the universal version of TensorFlow Serving. Does it really make a difference in latency, or is it more about compatibility?
upvoted 0 times
Viola
5 months ago
I think I came across a similar question where adjusting max_enqueued_batches helped improve throughput. Maybe that's worth considering here?
upvoted 0 times
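For readers weighing the max_batch_size and max_enqueued_batches comments above: TensorFlow Serving's request batching is driven by a text-format protobuf batching parameters file. A minimal sketch for reference (the values are illustrative assumptions, not tuned recommendations):

```
# batching_parameters.txt — TensorFlow Serving batching config
# (text-format protobuf; values below are illustrative only)
max_batch_size { value: 32 }          # requests merged into one batch
batch_timeout_micros { value: 5000 }  # how long to wait to fill a batch
max_enqueued_batches { value: 100 }   # queue depth before rejecting
num_batch_threads { value: 8 }        # parallelism for batch processing
```

Note that raising max_batch_size or max_enqueued_batches trades latency for throughput: larger batches and deeper queues mean individual requests wait longer, which is why the batching options are a weak fit for a latency-focused question.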
Galen
5 months ago
I remember reading about batch sizes in TensorFlow Serving, but I'm not sure if increasing max_batch_size is the best option for latency.
upvoted 0 times
Loreta
5 months ago
I'm a little confused on the difference between the options. I'll have to review my notes to make sure I understand Passive Structure Elements before answering this.
upvoted 0 times
Corrinne
5 months ago
Okay, let's think this through step-by-step. We need to protect PII data, use Cloud DLP, and follow Google's recommended practices with service accounts. I think option D sounds like the best approach.
upvoted 0 times
Leanora
5 months ago
I think using a scheduled task to start the Runtime Resource is definitely one of the options; it sounds familiar from practice questions.
upvoted 0 times
Carissa
5 months ago
Okay, let me see... I know cancer registrars are responsible for collecting and maintaining cancer data, so that's my best guess for this question.
upvoted 0 times
