Google Professional Data Engineer Exam - Topic 1 Question 84 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 84
Topic #: 1
[All Professional Data Engineer Questions]

You work for a large real estate firm and are preparing 6 TB of home sales data to be used for machine learning. You will use SQL to transform the data and use BigQuery ML to create a machine learning model. You plan to use the model for predictions against a raw dataset that has not been transformed. How should you set up your workflow in order to prevent skew at prediction time?

Suggested Answer: A

See https://cloud.google.com/bigquery-ml/docs/bigqueryml-transform. Using the TRANSFORM clause, you can specify all preprocessing during model creation. The preprocessing is then automatically applied during the prediction and evaluation phases of machine learning.
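As a rough sketch of the suggested answer (the dataset, table, column, and model names below are made up for illustration), the TRANSFORM clause bakes the preprocessing into the model itself, so predictions on raw, untransformed rows pick up the same transformations automatically:

```sql
-- Hypothetical dataset/column names; illustrates the TRANSFORM clause pattern.
-- Preprocessing declared here is stored with the model.
CREATE OR REPLACE MODEL `mydataset.home_price_model`
  TRANSFORM(
    ML.STANDARD_SCALER(square_feet) OVER () AS square_feet_scaled,
    ML.QUANTILE_BUCKETIZE(year_built, 4) OVER () AS year_built_bucket,
    label
  )
  OPTIONS (model_type = 'linear_reg', input_label_cols = ['label'])
AS
SELECT square_feet, year_built, sale_price AS label
FROM `mydataset.home_sales`;

-- At prediction time, pass the raw columns with no manual preprocessing;
-- BigQuery ML reapplies the stored TRANSFORM, preventing training/serving skew.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.home_price_model`,
  (SELECT square_feet, year_built FROM `mydataset.new_listings`));
```

This is why option A avoids skew: because the transformations live inside the model, ML.PREDICT and ML.EVALUATE need no transformations specified on the raw input.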


Contribute your Thoughts:

Janna
3 months ago
D is solid too, but Dataflow might be overkill for this.
upvoted 0 times
...
Dortha
3 months ago
C seems a bit off, why not just preprocess everything upfront?
upvoted 0 times
...
Sylvie
3 months ago
Wait, can you really skip transformations at prediction time? Sounds risky.
upvoted 0 times
...
Myrtie
4 months ago
Definitely agree with B, saves a lot of hassle later!
upvoted 0 times
...
Carlee
4 months ago
I think option B makes the most sense for consistent preprocessing.
upvoted 0 times
...
Lenna
4 months ago
I feel like option D could be a good choice since Dataflow is powerful for preprocessing, but I’m unsure if it aligns with the requirement to avoid transformations at prediction time.
upvoted 0 times
...
Barb
4 months ago
I vaguely recall that using a saved query for transformations could help maintain consistency, but I'm not entirely confident about how that fits with the ML.EVALUATE clause.
upvoted 0 times
...
Kanisha
4 months ago
I think we practiced a similar question where we had to decide between transforming data before or after model training. I feel like option B makes sense because it emphasizes transforming the raw data before predictions.
upvoted 0 times
...
Tuyet
5 months ago
I remember we discussed the importance of preprocessing in our last class, but I'm not sure if using the TRANSFORM clause is the best approach for predictions.
upvoted 0 times
...
Alesia
5 months ago
Definitely don't want to skip the preprocessing steps at prediction time. That would definitely lead to skew. I'd go with option A or C to make sure the transformations are applied consistently.
upvoted 0 times
...
Marleen
5 months ago
I'm a bit confused about the different options here. I'm not sure if I should use the TRANSFORM clause, a saved query, or a view to handle the preprocessing.
upvoted 0 times
...
Leatha
5 months ago
Okay, let me see if I've got this right. We need to make sure the preprocessing steps used to create the model are the same as what's applied to the raw input data at prediction time, right?
upvoted 0 times
...
Katie
5 months ago
Hmm, this is a tricky one. I'll need to think carefully about how to set up the workflow to avoid skew at prediction time.
upvoted 0 times
...
Bea
5 months ago
I think option B might be the way to go - using the TRANSFORM clause to define the preprocessing steps, and then applying the same transformations to the raw input data before making predictions. That way, we can ensure consistency.
upvoted 0 times
...
Janine
5 months ago
Okay, I've got this. The three methods to filter Microsoft 365 roadmap items are Cloud instance, Region, and Licensing type. I'm feeling good about this one.
upvoted 0 times
...
Willis
5 months ago
This is a tricky one. I'm not sure if I fully understand the differences between the WebLogic Web Services versions or the resource adapter class file support. I'll need to think this through step-by-step.
upvoted 0 times
...
Henriette
5 months ago
Based on my understanding, the Diagnostic and Tuning Packs are advanced tools that require a higher-level Oracle license. So the answer must be C, Oracle Enterprise Edition.
upvoted 0 times
...
Tonette
2 years ago
Whoa, hold up, this question's got me feeling like a real estate mogul! I'm gonna go with Option B and keep my data transformations consistent. Gotta stay on top of that skew, am I right?
upvoted 0 times
...
Teddy
2 years ago
Option A looks like the real estate agent's choice - let BigQuery do all the heavy lifting! But hey, if it works, it works, right?
upvoted 0 times
Ozell
2 years ago
C) Use a BigQuery view to define your preprocessing logic. When creating your model, use the view as your model training data. At prediction time, use BigQuery's ML.EVALUATE clause without specifying any transformations on the raw input data.
upvoted 0 times
...
Michell
2 years ago
B) When creating your model, use BigQuery's TRANSFORM clause to define preprocessing steps. Before requesting predictions, use a saved query to transform your raw input data, and then use ML.EVALUATE.
upvoted 0 times
...
Omega
2 years ago
A) When creating your model, use BigQuery's TRANSFORM clause to define preprocessing steps. At prediction time, use BigQuery's ML.EVALUATE clause without specifying any transformations on the raw input data.
upvoted 0 times
...
...
Tonette
2 years ago
Option D, for sure. Preprocessing everything in Dataflow and then letting BigQuery handle the predictions? That's the kind of workflow that keeps things clean and streamlined.
upvoted 0 times
Bette
2 years ago
I agree, using Dataflow for preprocessing and BigQuery for predictions seems like a solid workflow.
upvoted 0 times
...
Alyssa
2 years ago
Option D sounds like the best approach. Dataflow can handle the preprocessing efficiently.
upvoted 0 times
...
Twila
2 years ago
D) Preprocess all data using Dataflow. At prediction time, use BigQuery's ML.EVALUATE clause without specifying any further transformations on the input data.
upvoted 0 times
...
...
Hannah
2 years ago
I'm all about Option C. Using a view for the model training data and then just evaluating the raw input at prediction time? Now that's what I call efficiency.
upvoted 0 times
Kassandra
2 years ago
Definitely, it simplifies the process and reduces the risk of skew at prediction time.
upvoted 0 times
...
Vanna
2 years ago
I agree, it seems like a more efficient workflow. Just evaluate the raw input at prediction time.
upvoted 0 times
...
Ozell
2 years ago
Option C sounds like the way to go. Using a view for training data is a smart move.
upvoted 0 times
...
...
Janella
2 years ago
Option B is the way to go, my dude. Gotta make sure that the preprocessing steps are the same for both training and prediction to avoid that pesky skew.
upvoted 0 times
Derrick
2 years ago
B) Option B sounds solid. Consistency in preprocessing is key to avoiding skew in predictions.
upvoted 0 times
...
Paz
2 years ago
A) When creating your model, use BigQuery's TRANSFORM clause to define preprocessing steps. Before requesting predictions, use a saved query to transform your raw input data, and then use ML.EVALUATE.
upvoted 0 times
...
Annalee
2 years ago
B) Option B is definitely the best choice. Consistency in preprocessing steps is key to avoiding skew in predictions.
upvoted 0 times
...
Annabelle
2 years ago
Yeah, you're right. Consistency in preprocessing is key to accurate predictions.
upvoted 0 times
...
Glynda
2 years ago
A) When creating your model, use BigQuery's TRANSFORM clause to define preprocessing steps. Before requesting predictions, use a saved query to transform your raw input data, and then use ML.EVALUATE.
upvoted 0 times
...
Denae
2 years ago
Option B is the way to go, my dude. Gotta make sure that the preprocessing steps are the same for both training and prediction to avoid that pesky skew.
upvoted 0 times
...
...
