Google Professional Machine Learning Engineer Exam - Topic 9 Question 23 Discussion
You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?
A) Significantly increase the max_batch_size TensorFlow Serving parameter
B) Switch to the tensorflow-model-server-universal version of TensorFlow Serving
C) Significantly increase the max_enqueued_batches TensorFlow Serving parameter
D) Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes
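For context on the max_batch_size and max_enqueued_batches options: both are fields in the batching parameters file that TensorFlow Serving reads when batching is enabled (passed with --batching_parameters_file alongside --enable_batching). The snippet below is a minimal sketch that renders such a file in text-protobuf form. The helper name render_batching_parameters and all numeric values are illustrative assumptions, not recommendations and not part of the original question.

```python
def render_batching_parameters(params: dict) -> str:
    """Render field names and integer values as text-protobuf lines.

    Each field is written as `name { value: N }`, the wrapped-value
    form used by TensorFlow Serving's batching parameters file.
    """
    return "\n".join(
        f"{name} {{ value: {value} }}" for name, value in params.items()
    )


# Hypothetical values chosen only to show the file layout.
example_params = {
    "max_batch_size": 32,          # upper bound on requests merged into one batch
    "batch_timeout_micros": 1000,  # how long to wait for a batch to fill
    "max_enqueued_batches": 100,   # queue depth before new requests are rejected
    "num_batch_threads": 4,        # threads processing batches in parallel
}

batching_config = render_batching_parameters(example_params)
print(batching_config)
```

A larger batch size or a deeper queue generally trades per-request latency for throughput, since requests can wait longer before being processed, which is worth keeping in mind when weighing those two options against the stated goal.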