
Google Exam Professional-Machine-Learning-Engineer Topic 6 Question 66 Discussion

Actual exam question for Google's Google Professional Machine Learning Engineer exam
Question #: 66
Topic #: 6

You have trained a DNN regressor with TensorFlow to predict housing prices using a set of predictive features. Your default precision is tf.float64, and you use a standard TensorFlow estimator:

estimator = tf.estimator.DNNRegressor(
    feature_columns=[YOUR_LIST_OF_FEATURES],
    hidden_units=[1024, 512, 256],
    dropout=None)

Your model performs well, but just before deploying it to production you discover that your current serving latency is 10 ms at the 90th percentile, and you currently serve on CPUs. Your production requirements call for a model latency of 8 ms at the 90th percentile. You are willing to accept a small decrease in performance in order to reach the latency requirement. Your plan is therefore to improve latency while evaluating how much the model's predictive performance decreases. What should you try first to quickly lower the serving latency?

Suggested Answer: D

Applying quantization to your SavedModel by reducing the floating-point precision can lower serving latency by decreasing the amount of memory and computation required per prediction. TensorFlow provides tooling for this, such as the post-training quantization options in the TensorFlow Lite converter, which can reduce precision (for example, from float32 to float16) and significantly cut serving latency without a significant drop in model performance.
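To make the trade-off concrete, here is a minimal NumPy sketch of the principle behind option D (not the TensorFlow quantization API itself): casting weights and inputs from float64 down to float16 quarters the memory per value, while the layer's outputs shift only slightly relative to their magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)

# One dense layer's worth of weights, stored at full float64 precision.
weights = rng.normal(size=(1024, 512))
x = rng.normal(size=(1, 1024))

# "Quantize" by casting weights and inputs down to half precision.
w16 = weights.astype(np.float16)
x16 = x.astype(np.float16)

y_full = x @ weights
y_half = (x16 @ w16).astype(np.float64)

# Memory per value drops from 8 bytes to 2 bytes, a 4x reduction.
print(weights.nbytes // w16.nbytes)

# The outputs change only slightly relative to their magnitude.
rel_err = np.max(np.abs(y_full - y_half)) / np.max(np.abs(y_full))
print(rel_err < 0.05)
```

Less memory traffic and cheaper arithmetic per prediction is exactly where the latency win comes from; the small relative error is the "small decrease in performance" the question says you are willing to accept.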


Contribute your Thoughts:

Shaquana
10 days ago
I agree, option D seems like the best choice here. Though I have to say, I'm a bit surprised the question didn't mention anything about using a TensorFlow Lite model for deployment. That's another common technique for improving serving latency, especially on mobile devices.
upvoted 0 times
...
Kent
11 days ago
You're both right. I think option D is the way to go. Reducing the floating-point precision to tf.float16 should significantly improve the serving latency, and it's a common technique used in production environments to meet latency requirements. Plus, the question states we're willing to accept a small decrease in performance, so this could be a good compromise.
upvoted 0 times
...
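Building on Kent's point about reducing floating-point precision, a minimal sketch of float16 post-training quantization with the TensorFlow Lite converter might look like this (assuming TensorFlow is installed; the `export_dir` path is hypothetical and should point at your exported SavedModel):

```python
import tensorflow as tf

export_dir = "/tmp/housing_model/1"  # hypothetical path to your SavedModel

# Ask the converter to store weights in float16 instead of float32.
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_model)
```

After conversion you would benchmark the quantized model's latency and compare its predictions against the original to quantify the accuracy drop, which is the evaluation step the question describes.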
Malcom
12 days ago
I agree, option B doesn't seem like a wise choice. Retraining the model with a high dropout rate could lead to a big drop in performance, which we're trying to avoid. However, option D, applying quantization to the SavedModel, sounds promising. That could help reduce the model size and improve latency without sacrificing too much accuracy.
upvoted 0 times
...
Lachelle
13 days ago
Hmm, this is a tricky question. We need to find a way to reduce the serving latency without significantly impacting the model's performance. I'm not sure increasing the dropout rate to 0.8 is a good idea, as that could severely degrade the model's accuracy.
upvoted 0 times
...
