Google Professional Machine Learning Engineer Exam - Topic 1 Question 77 Discussion

Actual exam question for Google's Professional Machine Learning Engineer exam
Question #: 77
Topic #: 1

You developed a BigQuery ML linear regressor model by using a training dataset stored in a BigQuery table. New data is added to the table every minute. You are using Cloud Scheduler and Vertex AI Pipelines to automate hourly model training, and you use the model for direct inference. The feature preprocessing logic includes quantile bucketization and MinMax scaling on data received in the last hour. You want to minimize storage and computational overhead. What should you do?

Suggested Answer: D

The best option for minimizing storage and computational overhead is to use the TRANSFORM clause in the CREATE MODEL statement to calculate the required statistics. The TRANSFORM clause lets you specify feature preprocessing logic that applies to both training and prediction. The preprocessing is executed in the same query as the model creation, which avoids creating and storing intermediate tables, and the clause supports quantile bucketization and MinMax scaling, the preprocessing steps required in this scenario.

Option A is incorrect because calculating the required statistics in a separate component of the Vertex AI Pipelines DAG increases computational overhead: the component runs separately from the model creation and must pass the statistics to subsequent components, which also adds storage overhead.

Option B is incorrect because preprocessing and staging the data in BigQuery before feeding it to the model likewise increases storage and computational overhead: you must create and maintain additional tables for the preprocessed data, and you must keep the preprocessing logic consistent between training and inference.

Option C is incorrect because calculating and storing the required statistics in separate BigQuery tables also adds storage and computational overhead: you must create and maintain the statistics tables and update them regularly to reflect the new data.

Reference:

BigQuery ML documentation

Using the TRANSFORM clause

Feature preprocessing with BigQuery ML
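
As an illustration of the suggested answer, a CREATE MODEL statement with a TRANSFORM clause might look like the following sketch. The dataset, table, and column names (`mydataset.hourly_regressor`, `mydataset.training_data`, `latency_ms`, `request_bytes`, `ingest_time`, `label`) are hypothetical, not taken from the question:

```sql
-- Sketch: a BigQuery ML linear regression model whose preprocessing
-- (quantile bucketization and MinMax scaling) lives in the TRANSFORM clause,
-- so the same logic is applied automatically at both training and prediction
-- time, with no intermediate tables. All names below are hypothetical.
CREATE OR REPLACE MODEL `mydataset.hourly_regressor`
TRANSFORM(
  -- Quantile bucketization of a numeric feature into 10 buckets
  ML.QUANTILE_BUCKETIZE(latency_ms, 10) OVER () AS latency_bucket,
  -- MinMax scaling of a numeric feature to [0, 1]
  ML.MIN_MAX_SCALER(request_bytes) OVER () AS request_bytes_scaled,
  label
)
OPTIONS(
  model_type = 'linear_reg',
  input_label_cols = ['label']
) AS
SELECT
  latency_ms,
  request_bytes,
  label
FROM `mydataset.training_data`
-- Train only on the data received in the last hour, per the scenario
WHERE ingest_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```

Because the bucketization and scaling statistics are computed inside the TRANSFORM clause, ML.PREDICT on this model applies the same preprocessing to raw input rows automatically, which is what keeps training and inference consistent without extra staging tables.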


Contribute your Thoughts:

Herman
3 months ago
Not sure about A, seems like it could complicate the pipeline.
upvoted 0 times
...
Laurel
3 months ago
Wait, can you really minimize overhead with option D? Sounds risky!
upvoted 0 times
...
Catherin
3 months ago
C seems like a good way to keep things organized too.
upvoted 0 times
...
Lilli
4 months ago
Totally agree with B, staging data is key!
upvoted 0 times
...
Tawanna
4 months ago
I think option B makes the most sense for preprocessing.
upvoted 0 times
...
Stanton
4 months ago
Option D sounds interesting, but I’m unsure if using the TRANSFORM clause directly in the CREATE MODEL statement is the best practice for this scenario.
upvoted 0 times
...
Glennis
4 months ago
I'm leaning towards option C because storing statistics in separate tables could help with organization, but I wonder if it adds too much complexity.
upvoted 0 times
...
Alex
4 months ago
I remember practicing a question similar to this where preprocessing in BigQuery was emphasized. Option B might be the way to go for minimizing overhead.
upvoted 0 times
...
Lili
5 months ago
I think option A makes sense since it allows for dynamic calculation of statistics within the pipeline, but I'm not entirely sure if it's the most efficient approach.
upvoted 0 times
...
Tamekia
5 months ago
I like the idea of using the TRANSFORM clause in Option D. That could help keep the logic centralized and reduce the need for separate data processing steps.
upvoted 0 times
...
Valentin
5 months ago
Option B seems like the most straightforward approach to me. Preprocessing the data in BigQuery first would help minimize the overhead during training and inference.
upvoted 0 times
...
Aaron
5 months ago
Hmm, I'm a bit confused by the question. I'll need to re-read it a few times to make sure I understand the requirements.
upvoted 0 times
...
Dwight
5 months ago
This looks like a tricky one. I'll need to carefully think through the options and consider the trade-offs.
upvoted 0 times
...
Tawny
5 months ago
I remember doing a practice question where we discussed what gets accepted or rejected, and that might align with option C, but I need to double-check.
upvoted 0 times
...
Renea
5 months ago
I'm pretty confident I know the answer to this. Parsing can definitely happen in both HF and UF, so I'll go with A.
upvoted 0 times
...
Glendora
5 months ago
I'm a bit confused about the difference between "divest" and "harvest" strategies. I'll need to review those concepts before deciding.
upvoted 0 times
...
Shawnee
5 months ago
I'm leaning towards the idea that we should rebuild the activity, although I wonder if that's always necessary when the structure changes.
upvoted 0 times
...
