
Google Professional Machine Learning Engineer Exam - Topic 5 Question 75 Discussion

Actual exam question for Google's Professional Machine Learning Engineer exam
Question #: 75
Topic #: 5
[All Professional Machine Learning Engineer Questions]

You have trained a DNN regressor with TensorFlow to predict housing prices using a set of predictive features. Your default precision is tf.float64, and you use a standard TensorFlow estimator:

estimator = tf.estimator.DNNRegressor(
    feature_columns=[YOUR_LIST_OF_FEATURES],
    hidden_units=[1024, 512, 256],
    dropout=None)

Your model performs well, but just before deploying it to production, you discover that your current serving latency is 10ms @ 90th percentile, and you currently serve on CPUs. Your production requirements expect a model latency of 8ms @ 90th percentile. You are willing to accept a small decrease in performance in order to reach the latency requirement. Therefore, your plan is to improve latency while evaluating how much the model's predictive performance decreases. What should you try first to quickly lower the serving latency?
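The trade-off the question describes, lower precision in exchange for a small change in predictions, can be illustrated with a minimal NumPy sketch. This is not the actual Estimator: the input width, random weights, and ReLU activations below are all hypothetical, chosen only to mirror the hidden_units=[1024, 512, 256] architecture and show how one might measure the prediction change from a float64 → float16 cast.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, weights, dtype):
    """Run a simple ReLU MLP with inputs and weights cast to the given dtype."""
    h = x.astype(dtype)
    for i, w in enumerate(weights):
        h = h @ w.astype(dtype)
        if i < len(weights) - 1:
            h = np.maximum(h, 0)  # ReLU on hidden layers only
    return h.astype(np.float64)

# Hypothetical layer sizes: 64 input features -> 1024 -> 512 -> 256 -> 1 output
sizes = [64, 1024, 512, 256, 1]
weights = [rng.normal(0.0, (2.0 / m) ** 0.5, size=(m, n))
           for m, n in zip(sizes, sizes[1:])]
x = rng.normal(size=(100, 64))  # 100 random "houses"

preds64 = forward(x, weights, np.float64)
preds16 = forward(x, weights, np.float16)

# Relative change in predictions caused by the precision reduction
rel_change = np.linalg.norm(preds64 - preds16) / np.linalg.norm(preds64)
print(f"relative prediction change from float16 cast: {rel_change:.4f}")
```

The same comparison, run against a held-out evaluation set instead of random data, is how you would quantify the "small decrease in performance" the question asks you to accept.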

Suggested Answer: D
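Answer D corresponds to post-training float16 quantization. As a hedged sketch, assuming the trained regressor has been exported as a SavedModel (the directory path below is a placeholder), the TensorFlow Lite converter supports this reduction directly:

```python
import tensorflow as tf

# Placeholder: the export directory of your trained DNNRegressor
saved_model_dir = "/path/to/saved_model"

# Post-training float16 quantization via the TFLite converter,
# following the TensorFlow Lite documentation.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

with open("model_f16.tflite", "wb") as f:
    f.write(tflite_model)
```

This roughly halves model size and typically speeds up CPU inference; no retraining is required, which is why it is the quickest option to try first. Before deploying, compare the quantized model's predictions against the original on a held-out set to confirm the accuracy drop is acceptable.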

Contribute your Thoughts:

Hortencia
3 months ago
Definitely not increasing dropout in PREDICT mode!
upvoted 0 times
...
Audria
3 months ago
Wait, can quantization really make that big of a difference?
upvoted 0 times
...
Lashandra
4 months ago
Increasing dropout might hurt accuracy, not sure about that.
upvoted 0 times
...
Jacquelyne
4 months ago
I think quantization is a solid move too.
upvoted 0 times
...
Mattie
4 months ago
Switching to GPU should help with latency!
upvoted 0 times
...
Casie
4 months ago
I’m not confident about retraining with a higher dropout rate, it seems like it might hurt the model's performance too much.
upvoted 0 times
...
Reuben
4 months ago
Quantization sounds familiar; I think reducing precision to tf.float16 could help with latency without retraining.
upvoted 0 times
...
Elden
5 months ago
I think switching to GPU serving could really speed things up, but I wonder if it’s worth the cost.
upvoted 0 times
...
Lettie
5 months ago
I remember we discussed dropout rates in class, but I'm not sure increasing it in PREDICT mode would help with latency.
upvoted 0 times
...
Louvenia
5 months ago
Switching to GPU serving seems like the most straightforward option here. GPUs are generally better suited for fast inference, so that could be a good way to hit the latency target without sacrificing too much model performance.
upvoted 0 times
...
Freeman
5 months ago
Okay, let me think this through. Increasing the dropout rate and retraining the model could potentially reduce the model size and complexity, which might help with latency. But I'm not sure if that would be enough to meet the 8ms requirement.
upvoted 0 times
...
Margart
5 months ago
Hmm, this is a tricky one. I'm not sure if increasing the dropout rate in PREDICT mode would really help with the latency requirement. That seems like it might just hurt the model's performance.
upvoted 0 times
...
Twanna
5 months ago
Ah, I think the key here is the mention of quantization. Reducing the floating point precision to tf.float16 could definitely help with latency, and it sounds like the best option to try first based on the requirements.
upvoted 0 times
...
Theodora
10 months ago
Quantization is the way to go here. Reducing the precision to tf.float16 should give us a nice latency boost without sacrificing too much model performance. And hey, at least we're not trying to run it on a Commodore 64, right?
upvoted 0 times
...
Zita
10 months ago
Haha, if I were the model, I'd be like 'Seriously? You want me to predict house prices in 8ms? What is this, the Flash's house?'
upvoted 0 times
Margarita
9 months ago
B) Increase the dropout rate to 0.8 and retrain your model.
upvoted 0 times
...
Kyoko
9 months ago
C) Switch from CPU to GPU serving
upvoted 0 times
...
Tenesha
9 months ago
A) Increase the dropout rate to 0.8 in _PREDICT mode by adjusting the TensorFlow Serving parameters
upvoted 0 times
...
...
Pamella
10 months ago
Switching to GPU serving is a good option, but it might be overkill for the 8ms latency requirement. I'd try the quantization approach first - reducing the precision to tf.float16 could give us a quicker win.
upvoted 0 times
Emilio
8 months ago
That's a good point. Let's try reducing the precision first and see if that helps with the latency.
upvoted 0 times
...
Rosio
8 months ago
D) Apply quantization to your SavedModel by reducing the floating point precision to tf.float16.
upvoted 0 times
...
Marilynn
9 months ago
C) Switch from CPU to GPU serving
upvoted 0 times
...
...
Catarina
11 months ago
Increasing the dropout rate to 0.8 in the _PREDICT mode seems like an interesting idea, but I'm not sure if that would actually improve the latency. Retraining the model with the higher dropout might work, but that could impact performance.
upvoted 0 times
Aide
9 months ago
B) Increase the dropout rate to 0.8 and retrain your model
upvoted 0 times
...
Junita
9 months ago
C) Switch from CPU to GPU serving
upvoted 0 times
...
Nu
9 months ago
A) Increase the dropout rate to 0.8 in _PREDICT mode by adjusting the TensorFlow Serving parameters
upvoted 0 times
...
...
Ngoc
11 months ago
But wouldn't applying quantization to the SavedModel also help reduce latency?
upvoted 0 times
...
Aliza
11 months ago
I agree with Loreen, that could help improve the latency.
upvoted 0 times
...
Loreen
11 months ago
I think we should try switching from CPU to GPU serving first.
upvoted 0 times
...
