You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?
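One lever that fits the constraint of not changing the underlying infrastructure is server-side request batching in TensorFlow Serving, which amortizes per-request overhead on CPU-only pods. A minimal sketch of a batching configuration, with illustrative values (the parameter names are real TensorFlow Serving batching options, but the specific numbers here are assumptions you would tune for your workload):

```proto
# batching_parameters.txt — passed to TensorFlow Serving via
# --enable_batching --batching_parameters_file=batching_parameters.txt
max_batch_size { value: 128 }         # upper bound on requests merged into one batch
batch_timeout_micros { value: 5000 }  # max wait to fill a batch before running it
max_enqueued_batches { value: 100 }   # queue depth before requests are rejected
num_batch_threads { value: 8 }        # parallelism; typically ~ number of CPU cores
```

Raising `max_batch_size` and `max_enqueued_batches` trades a small amount of per-request queueing delay for much higher throughput, which typically lowers tail latency under load of a few thousand QPS without touching the GKE setup.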