Google Professional Data Engineer Exam - Topic 1 Question 112 Discussion

A data scientist has created a BigQuery ML model and asks you to create an ML pipeline to serve predictions. You have a REST API application with the requirement to serve predictions for an individual user ID with latency under 100 milliseconds. You use the following query to generate predictions: SELECT predicted_label, user_id FROM ML.PREDICT (MODEL 'dataset.model', table user_features). How should you create the ML pipeline?
A) Add a WHERE clause to the query, and grant the BigQuery Data Viewer role to the application service account.
B) Create an Authorized View with the provided query. Share the dataset that contains the view with the application service account.
C) Create a Cloud Dataflow pipeline using BigQueryIO to read results from the query. Grant the Dataflow Worker role to the application service account.
D) Create a Cloud Dataflow pipeline using BigQueryIO to read predictions for all users from the query. Write the results to Cloud Bigtable using BigtableIO. Grant the Bigtable Reader role to the application service account so that the application can read predictions for individual users from Cloud Bigtable.

Actual exam question for Google's Professional Data Engineer exam
Question #: 112
Topic #: 1

Suggested Answer: D
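
Option D separates a batch step (compute predictions for all users and store them keyed by user ID) from a serving step (a single key lookup per request). This matters because BigQuery is generally not designed for point lookups under 100 ms, whereas Cloud Bigtable is built for low-latency single-row reads. The sketch below illustrates only that pattern. It is not the Dataflow or Bigtable API: a plain Python dict stands in for the Bigtable table, and the function names and sample rows are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical stand-in for the rows returned by the ML.PREDICT query:
# (user_id, predicted_label) pairs for every user.
PredictionRow = Tuple[str, str]


def batch_write_predictions(rows: Iterable[PredictionRow]) -> Dict[str, str]:
    """Batch step: store one entry per user, keyed by user_id.

    In option D this role is played by the Dataflow job (BigQueryIO read,
    BigtableIO write). Here a dict is an assumed stand-in for the table.
    """
    store: Dict[str, str] = {}
    for user_id, predicted_label in rows:
        store[user_id] = predicted_label  # user_id acts as the row key
    return store


def serve_prediction(store: Dict[str, str], user_id: str) -> Optional[str]:
    """Serving step: one key lookup, with no query executed per request."""
    return store.get(user_id)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    predictions = [("user_1", "churn"), ("user_2", "retain")]
    store = batch_write_predictions(predictions)
    print(serve_prediction(store, "user_2"))
    print(serve_prediction(store, "user_404"))
```

The design choice is that the expensive prediction query runs once per batch, off the request path, so the REST API only pays for a keyed read. Options A and B, by contrast, still run a BigQuery query on every request.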

Milly
6 months ago
D sounds complicated, not sure if it's necessary for individual predictions.
upvoted 0 times
...
Kate
6 months ago
C is a solid choice, but does it really meet the latency requirement?
upvoted 0 times
...
Malcom
6 months ago
I think B is better for security with the Authorized View.
upvoted 0 times
...
Fabiola
6 months ago
I agree with B, sharing the dataset is a smart move!
upvoted 0 times
...
Tayna
6 months ago
Option A seems straightforward, just filter by user ID.
upvoted 0 times
...
Clarinda
7 months ago
I feel like using Bigtable could be overkill for just one user’s prediction, but I remember it being mentioned as a way to handle larger datasets efficiently.
upvoted 0 times
...
Verlene
7 months ago
I practiced a similar question about using Cloud Dataflow, but I can't recall if it's the best option here since we need predictions for a single user.
upvoted 0 times
...
Katina
7 months ago
I think creating an Authorized View could be a good approach since it simplifies access control, but I’m not entirely sure how it impacts performance.
upvoted 0 times
...
Jesusa
7 months ago
I remember we discussed the importance of latency in serving predictions, but I'm not sure if adding a WHERE clause is enough to meet the 100 ms requirement.
upvoted 0 times
...
Jennifer
7 months ago
This is a tricky one. I'm leaning towards Option C with the Cloud Dataflow pipeline. That way, I can control the entire data processing flow and ensure the latency requirements are met. The Dataflow Worker role should give the application the necessary access.
upvoted 0 times
...
Melinda
8 months ago
Okay, I think I've got a strategy here. Option B with the Authorized View seems like the best approach. That way, I can grant the necessary permissions to the application service account and the query will be optimized for low latency predictions.
upvoted 0 times
...
Gail
8 months ago
Hmm, I'm a bit confused. The question mentions a REST API application, but it's not clear how that fits into the pipeline. I'll need to think through the different options and how they address the latency requirement.
upvoted 0 times
...
Frederic
8 months ago
This seems like a straightforward question, but I want to make sure I understand the requirements correctly. The key is to create a pipeline that can serve predictions for individual users with low latency.
upvoted 0 times
...
Precious
8 months ago
I think option D is the way to go. Using Cloud Dataflow to read predictions for all users and writing them to Cloud Bigtable will allow for efficient access to individual user predictions.
upvoted 0 times
...
Lashaun
9 months ago
I'm leaning towards option C. Creating a Cloud Dataflow pipeline using BigQueryIO seems like a scalable solution for serving predictions with low latency.
upvoted 0 times
...
Denny
9 months ago
I disagree, I believe option B is the best choice. Creating an Authorized View will provide the necessary access to the application service account.
upvoted 0 times
...
Leigha
10 months ago
You know, I was thinking the same thing as Avery. The Dataflow pipeline seems like the best way to ensure low latency and high performance, even with a large dataset. Option C is my pick.
upvoted 0 times
...
Avery
10 months ago
Hmm, I'm not sure about that. What if the dataset is really large? Won't that lead to performance issues? I'd go with Option C and use Dataflow to handle the reading and processing.
upvoted 0 times
Amina
9 months ago
I agree, using Dataflow for reading and processing will help with performance issues.
upvoted 0 times
...
Shawna
9 months ago
Option C seems like the best choice. Dataflow can handle large datasets efficiently.
upvoted 0 times
...
...
Ceola
10 months ago
I think we should go with option A. Adding a WHERE clause to the query seems like the most efficient way to serve predictions for individual user IDs.
upvoted 0 times
...
Marge
10 months ago
Option B is the way to go. Creating an Authorized View and sharing the dataset with the application service account is the most efficient and scalable solution here.
upvoted 0 times
Dalene
9 months ago
B) Create an Authorized View with the provided query. Share the dataset that contains the view with the application service account.
upvoted 0 times
...
...