
Google Professional Machine Learning Engineer Exam - Topic 1 Question 96 Discussion

Actual exam question for Google's Professional Machine Learning Engineer exam
Question #: 96
Topic #: 1

You work for a gaming company that has millions of customers around the world. All games offer a chat feature that allows players to communicate with each other in real time. Messages can be typed in more than 20 languages and are translated in real time using the Cloud Translation API. You have been asked to build an ML system to moderate the chat in real time while ensuring that performance is uniform across the various languages, without changing the serving infrastructure.

You trained your first model using an in-house word2vec model for embedding the chat messages translated by the Cloud Translation API. However, the model has significant differences in performance across the different languages. How should you improve it?

A. Add a regularization term such as the Min-Diff algorithm to the loss function.
B. Train a classifier using the chat messages in their original language.
C. Replace the in-house word2vec with GPT-3 or T5.
D. Remove moderation for languages with high false positive rates.

Suggested Answer: B

The translation step is the most likely source of the per-language performance gap. Cloud Translation quality varies from language to language, so the translated text the classifier sees is noisier for some languages than for others, and chat-specific nuances (slang, idioms, deliberate misspellings) are often lost before the model ever sees them.

Training the classifier on the chat messages in their original language (option B) removes this unevenly distributed translation noise and lets the model learn each language's patterns natively, which directly targets uniform performance. It also leaves the serving infrastructure unchanged: the model still scores the raw chat message in real time.

Option A (adding a regularization term such as Min-Diff to the loss function) can narrow performance gaps between groups, but it treats the symptom rather than the cause; the training data would still be unevenly degraded translations.

Option C (replacing the in-house word2vec with GPT-3 or T5) changes the text representation, but the inputs are still translations, so the language-dependent translation noise remains.

Option D (removing moderation for languages with high false positive rates) simply abandons the requirement to moderate all languages and is not an acceptable trade-off.
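
As a rough illustration of option B, the sketch below trains a moderation classifier directly on original-language messages. It is a minimal, hypothetical example, not the exam's reference solution: the messages and labels are made up, and character n-gram TF-IDF is used as a simple language-agnostic stand-in for a proper multilingual embedding.

```python
# Minimal sketch of option B: train the moderation classifier on the
# original-language chat messages instead of their translations.
# The toy data below is entirely hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled chat messages in their original languages
# (1 = violates chat policy, 0 = acceptable).
messages = [
    "you are awesome, gg",          # English
    "eres un tramposo asqueroso",   # Spanish
    "bien joué, belle partie",      # French
    "du bist so ein Betrüger",      # German
]
labels = [0, 1, 0, 1]

# Character n-grams sidestep per-language tokenization, so one model
# can be trained on all 20+ languages at once.
moderation_model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
moderation_model.fit(messages, labels)

# At serving time the raw (untranslated) message is scored directly,
# so the existing real-time serving path is unchanged.
print(moderation_model.predict(["qué buena partida"]))
```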
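For contrast, option A's idea of a Min-Diff-style regularizer can be sketched as an extra loss term that penalizes differences in the model's score distributions across languages. The version below is a deliberately simplified stand-in (a mean-gap penalty rather than the kernel-based MMD matching that the real MinDiff technique uses, e.g. in the tensorflow-model-remediation package), and all tensors are hypothetical dummy values.

```python
import tensorflow as tf

def min_diff_penalty(scores_group_a: tf.Tensor, scores_group_b: tf.Tensor) -> tf.Tensor:
    """Penalize the gap between mean predicted scores of two language groups.

    Simplified stand-in for MinDiff, which matches whole score
    distributions with an MMD kernel rather than just their means.
    """
    return tf.abs(tf.reduce_mean(scores_group_a) - tf.reduce_mean(scores_group_b))

def moderation_loss(y_true, y_pred, scores_lang_a, scores_lang_b, weight=1.0):
    # Standard classification loss plus the fairness regularization term.
    base = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    return base + weight * min_diff_penalty(scores_lang_a, scores_lang_b)

# Hypothetical usage with dummy tensors.
y_true = tf.constant([[1.0], [0.0], [1.0]])
y_pred = tf.constant([[0.8], [0.2], [0.6]])
scores_a = tf.constant([0.8, 0.6])   # model scores on language A's messages
scores_b = tf.constant([0.2])        # model scores on language B's messages
print(moderation_loss(y_true, y_pred, scores_a, scores_b, weight=0.5))
```

Note that even with such a term, the model in this question would still be learning from unevenly degraded translations, which is why the suggested answer prefers option B.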


Contribute your Thoughts:

Dorthy
6 days ago
Not sure about that, B could miss context from translations.
upvoted 0 times
...
Ciara
12 days ago
I think option B makes the most sense. Original language is key.
upvoted 0 times
...
Lorriane
17 days ago
I feel like removing moderation for certain languages is not a good idea. It could lead to a lot of issues. Definitely not option D for me!
upvoted 0 times
...
Rosendo
23 days ago
Replacing word2vec with something like GPT-3 sounds tempting, but I wonder if it would really solve the performance issue across all languages. Option C might be risky.
upvoted 0 times
...
Dustin
28 days ago
I'm not entirely sure, but I think adding a regularization term could help with performance consistency across languages. Maybe option A?
upvoted 0 times
...
Fabiola
1 month ago
I remember we discussed the importance of training models on original language data to capture nuances better. So, option B seems like a solid choice.
upvoted 0 times
...
Golda
1 month ago
I'm a bit hesitant about option D - removing moderation for languages with high false positive rates. That doesn't seem like a great long-term solution. I think I'll try one of the other options that focuses on improving the model itself.
upvoted 0 times
...
Alyssa
1 month ago
Ooh, option C looks interesting - replacing the in-house word2vec with a more powerful model like GPT-3 or T5. That could really boost the performance across the board.
upvoted 0 times
...
Stephen
1 month ago
I'm a bit confused by this question. I'm not sure if I fully understand the problem or the different options. Maybe I'll go with option B and try training the classifier on the original language messages instead of the translations.
upvoted 0 times
...
Leota
1 month ago
Hmm, this is a tricky one. I think I'll try option A - adding a regularization term like Min-Diff to the loss function. That should help balance the performance across languages.
upvoted 0 times
...
Alease
6 months ago
This chat moderation task reminds me of that old saying - 'lost in translation' takes on a whole new meaning when millions of players are involved!
upvoted 0 times
...
Nieves
6 months ago
Replace the in-house word2vec with GPT-3 or T5? Sounds like a job for Optimus Prime!
upvoted 0 times
...
Ardella
6 months ago
I wouldn't recommend removing moderation for languages with high false positive rates. That could lead to unchecked toxicity in those communities. Better to keep trying to improve the model.
upvoted 0 times
Levi
5 months ago
I agree, removing moderation for languages with high false positive rates is not a good idea. We should keep working on improving the model.
upvoted 0 times
...
Sabrina
5 months ago
B) Train a classifier using the chat messages in their original language.
upvoted 0 times
...
Helene
5 months ago
A) Add a regularization term such as the Min-Diff algorithm to the loss function.
upvoted 0 times
...
...
Rosalind
6 months ago
Training a classifier directly on the original language messages is an intriguing idea. That way the model can learn the nuances of each language natively without relying on translations.
upvoted 0 times
Vilma
6 months ago
C) Replace the in-house word2vec with GPT-3 or T5.
upvoted 0 times
...
Marleen
6 months ago
B) Train a classifier using the chat messages in their original language.
upvoted 0 times
...
Peggy
6 months ago
A) Add a regularization term such as the Min-Diff algorithm to the loss function.
upvoted 0 times
...
...
Judy
7 months ago
Regularizing the model with the Min-Diff algorithm sounds like a good approach to balance the performance across languages. Interesting that the in-house word2vec is struggling - maybe GPT-3 or T5 could provide better text representations.
upvoted 0 times
Cordell
5 months ago
C) Replace the in-house word2vec with GPT-3 or T5.
upvoted 0 times
...
Avery
5 months ago
B) Train a classifier using the chat messages in their original language.
upvoted 0 times
...
Cheryll
6 months ago
A) Add a regularization term such as the Min-Diff algorithm to the loss function.
upvoted 0 times
...
...
Tanja
7 months ago
But wouldn't replacing the in-house word2vec with GPT-3 or T5 be a better option?
upvoted 0 times
...
Olive
7 months ago
I agree with Kirk. It would help improve the performance across different languages.
upvoted 0 times
...
Kirk
7 months ago
I think we should train a classifier using the chat messages in their original language.
upvoted 0 times
...
