Welcome to Pass4Success

Google Professional Machine Learning Engineer Exam - Topic 9 Question 23 Discussion

Actual exam question for Google's Professional Machine Learning Engineer exam
Question #: 23
Topic #: 9

You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?

Suggested Answer: D
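The suggested answer points at rebuilding TensorFlow Serving with CPU-specific optimizations and pinning the GKE serving nodes to a matching baseline CPU platform. A minimal sketch of what that could look like, assuming a Bazel build of TensorFlow Serving from source (the cluster name, pool name, and CPU platform below are illustrative assumptions, not values from the question):

```shell
# Rebuild TensorFlow Serving from source with CPU-specific instruction
# sets enabled for the build machine's CPU (-march=native is one common
# choice; pick flags that match your serving nodes):
bazel build --config=release \
  --copt=-march=native \
  tensorflow_serving/model_servers:tensorflow_model_server

# Pin the GKE serving node pool to a baseline minimum CPU platform so the
# optimized binary never lands on an older CPU that lacks those
# instructions (names here are examples):
gcloud container node-pools create serving-pool \
  --cluster=my-cluster \
  --min-cpu-platform="Intel Skylake"
```

This keeps the existing load balancer, pods, and GKE cluster in place, which is why it fits the "without changing the underlying infrastructure" constraint better than switching serving stacks.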

Contribute your Thoughts:

Rusty
4 months ago
Increasing max_enqueued_batches seems like a solid move!
upvoted 0 times
Lashawnda
4 months ago
I disagree; switching to the universal version might not solve the latency issue.
upvoted 0 times
Roosevelt
4 months ago
Wait, can we really recompile TensorFlow Serving? Sounds complicated!
upvoted 0 times
Cherelle
4 months ago
I think option C is the way to go for better throughput.
upvoted 0 times
Loreen
5 months ago
A higher max_batch_size can really help with latency!
upvoted 0 times
Kimberlie
5 months ago
Recompiling TensorFlow Serving sounds complicated. I feel like that might be overkill for just improving latency without changing infrastructure.
upvoted 0 times
Jutta
5 months ago
I'm a bit confused about the universal version of TensorFlow Serving. Does it really make a difference in latency, or is it more about compatibility?
upvoted 0 times
Viola
5 months ago
I think I came across a similar question where adjusting max_enqueued_batches helped improve throughput. Maybe that's worth considering here?
upvoted 0 times
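For readers weighing the max_batch_size and max_enqueued_batches comments above: TensorFlow Serving's request batching is driven by a text-format protobuf batching parameters file. A minimal sketch for reference (the values are illustrative assumptions, not tuned recommendations):

```
# batching_parameters.txt — TensorFlow Serving batching config
# (text-format protobuf; values below are illustrative only)
max_batch_size { value: 32 }          # requests merged into one batch
batch_timeout_micros { value: 5000 }  # how long to wait to fill a batch
max_enqueued_batches { value: 100 }   # queue depth before rejecting
num_batch_threads { value: 8 }        # parallelism for batch processing
```

Note that raising max_batch_size or max_enqueued_batches trades latency for throughput: larger batches and deeper queues mean individual requests wait longer, which is why the batching options are a weak fit for a latency-focused question.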
Galen
5 months ago
I remember reading about batch sizes in TensorFlow Serving, but I'm not sure if increasing max_batch_size is the best option for latency.
upvoted 0 times
Loreta
5 months ago
I'm a little confused on the difference between the options. I'll have to review my notes to make sure I understand Passive Structure Elements before answering this.
upvoted 0 times
Corrinne
5 months ago
Okay, let's think this through step-by-step. We need to protect PII data, use Cloud DLP, and follow Google's recommended practices with service accounts. I think option D sounds like the best approach.
upvoted 0 times
Leanora
5 months ago
I think using a scheduled task to start the Runtime Resource is definitely one of the options; it sounds familiar from practice questions.
upvoted 0 times
Carissa
5 months ago
Okay, let me see... I know cancer registrars are responsible for collecting and maintaining cancer data, so that's my best guess for this question.
upvoted 0 times
