
Google Professional Machine Learning Engineer Exam - Topic 5 Question 75 Discussion

Actual exam question for Google's Professional Machine Learning Engineer exam
Question #: 75
Topic #: 5
[All Professional Machine Learning Engineer Questions]

You have trained a DNN regressor with TensorFlow to predict housing prices using a set of predictive features. Your default precision is tf.float64, and you use a standard TensorFlow estimator:

estimator = tf.estimator.DNNRegressor(
    feature_columns=[YOUR_LIST_OF_FEATURES],
    hidden_units=[1024, 512, 256],
    dropout=None)

Your model performs well, but just before deploying it to production, you discover that your current serving latency is 10ms @ 90th percentile, and you currently serve on CPUs. Your production requirements expect a model latency of 8ms @ 90th percentile. You are willing to accept a small decrease in performance in order to reach the latency requirement. Therefore, your plan is to improve latency while evaluating how much the model's predictive performance decreases. What should you try first to quickly lower the serving latency?
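The trade-off the question describes, lower precision in exchange for a small change in predictions, can be illustrated with a minimal NumPy sketch. This is not the actual Estimator: the input width, random weights, and ReLU activations below are all hypothetical, chosen only to mirror the hidden_units=[1024, 512, 256] architecture and show how one might measure the prediction change from a float64 → float16 cast.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, weights, dtype):
    """Run a simple ReLU MLP with inputs and weights cast to the given dtype."""
    h = x.astype(dtype)
    for i, w in enumerate(weights):
        h = h @ w.astype(dtype)
        if i < len(weights) - 1:
            h = np.maximum(h, 0)  # ReLU on hidden layers only
    return h.astype(np.float64)

# Hypothetical layer sizes: 64 input features -> 1024 -> 512 -> 256 -> 1 output
sizes = [64, 1024, 512, 256, 1]
weights = [rng.normal(0.0, (2.0 / m) ** 0.5, size=(m, n))
           for m, n in zip(sizes, sizes[1:])]
x = rng.normal(size=(100, 64))  # 100 random "houses"

preds64 = forward(x, weights, np.float64)
preds16 = forward(x, weights, np.float16)

# Relative change in predictions caused by the precision reduction
rel_change = np.linalg.norm(preds64 - preds16) / np.linalg.norm(preds64)
print(f"relative prediction change from float16 cast: {rel_change:.4f}")
```

The same comparison, run against a held-out evaluation set instead of random data, is how you would quantify the "small decrease in performance" the question asks you to accept.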

Suggested Answer: D
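Answer D corresponds to post-training float16 quantization. As a hedged sketch, assuming the trained regressor has been exported as a SavedModel (the directory path below is a placeholder), the TensorFlow Lite converter supports this reduction directly:

```python
import tensorflow as tf

# Placeholder: the export directory of your trained DNNRegressor
saved_model_dir = "/path/to/saved_model"

# Post-training float16 quantization via the TFLite converter,
# following the TensorFlow Lite documentation.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

with open("model_f16.tflite", "wb") as f:
    f.write(tflite_model)
```

This roughly halves model size and typically speeds up CPU inference; no retraining is required, which is why it is the quickest option to try first. Before deploying, compare the quantized model's predictions against the original on a held-out set to confirm the accuracy drop is acceptable.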

Contribute your Thoughts:

Hortencia
3 months ago
Definitely not increasing dropout in PREDICT mode!
upvoted 0 times
...
Audria
3 months ago
Wait, can quantization really make that big of a difference?
upvoted 0 times
...
Lashandra
4 months ago
Increasing dropout might hurt accuracy, not sure about that.
upvoted 0 times
...
Jacquelyne
4 months ago
I think quantization is a solid move too.
upvoted 0 times
...
Mattie
4 months ago
Switching to GPU should help with latency!
upvoted 0 times
...
Casie
4 months ago
I’m not confident about retraining with a higher dropout rate, it seems like it might hurt the model's performance too much.
upvoted 0 times
...
Reuben
4 months ago
Quantization sounds familiar; I think reducing precision to tf.float16 could help with latency without retraining.
upvoted 0 times
...
Elden
5 months ago
I think switching to GPU serving could really speed things up, but I wonder if it’s worth the cost.
upvoted 0 times
...
Lettie
5 months ago
I remember we discussed dropout rates in class, but I'm not sure increasing it in PREDICT mode would help with latency.
upvoted 0 times
...
Louvenia
5 months ago
Switching to GPU serving seems like the most straightforward option here. GPUs are generally better suited for fast inference, so that could be a good way to hit the latency target without sacrificing too much model performance.
upvoted 0 times
...
Freeman
5 months ago
Okay, let me think this through. Increasing the dropout rate and retraining the model could potentially reduce the model size and complexity, which might help with latency. But I'm not sure if that would be enough to meet the 8ms requirement.
upvoted 0 times
...
Margart
5 months ago
Hmm, this is a tricky one. I'm not sure if increasing the dropout rate in PREDICT mode would really help with the latency requirement. That seems like it might just hurt the model's performance.
upvoted 0 times
...
Twanna
5 months ago
Ah, I think the key here is the mention of quantization. Reducing the floating point precision to tf.float16 could definitely help with latency, and it sounds like the best option to try first based on the requirements.
upvoted 0 times
...
Theodora
10 months ago
Quantization is the way to go here. Reducing the precision to tf.float16 should give us a nice latency boost without sacrificing too much model performance. And hey, at least we're not trying to run it on a Commodore 64, right?
upvoted 0 times
...
Zita
10 months ago
Haha, if I were the model, I'd be like 'Seriously? You want me to predict house prices in 8ms? What is this, the Flash's house?'
upvoted 0 times
Margarita
9 months ago
B) Increase the dropout rate to 0.8 and retrain your model.
upvoted 0 times
...
Kyoko
9 months ago
C) Switch from CPU to GPU serving
upvoted 0 times
...
Tenesha
9 months ago
A) Increase the dropout rate to 0.8 in _PREDICT mode by adjusting the TensorFlow Serving parameters
upvoted 0 times
...
...
Pamella
10 months ago
Switching to GPU serving is a good option, but it might be overkill for the 8ms latency requirement. I'd try the quantization approach first - reducing the precision to tf.float16 could give us a quicker win.
upvoted 0 times
Emilio
8 months ago
That's a good point. Let's try reducing the precision first and see if that helps with the latency.
upvoted 0 times
...
Rosio
8 months ago
D) Apply quantization to your SavedModel by reducing the floating point precision to tf.float16.
upvoted 0 times
...
Marilynn
9 months ago
C) Switch from CPU to GPU serving
upvoted 0 times
...
...
Catarina
11 months ago
Increasing the dropout rate to 0.8 in the _PREDICT mode seems like an interesting idea, but I'm not sure if that would actually improve the latency. Retraining the model with the higher dropout might work, but that could impact performance.
upvoted 0 times
Aide
9 months ago
B) Increase the dropout rate to 0.8 and retrain your model
upvoted 0 times
...
Junita
9 months ago
C) Switch from CPU to GPU serving
upvoted 0 times
...
Nu
9 months ago
A) Increase the dropout rate to 0.8 in _PREDICT mode by adjusting the TensorFlow Serving parameters
upvoted 0 times
...
...
Ngoc
11 months ago
But wouldn't applying quantization to the SavedModel also help reduce latency?
upvoted 0 times
...
Aliza
11 months ago
I agree with Loreen, that could help improve the latency.
upvoted 0 times
...
Loreen
11 months ago
I think we should try switching from CPU to GPU serving first.
upvoted 0 times
...
