
Google Professional Data Engineer Exam - Topic 4 Question 79 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 79
Topic #: 4

You want to create a machine learning model using BigQuery ML and create an endpoint for hosting the model using Vertex AI. This will enable the processing of continuous streaming data in near real time from multiple vendors. The data may contain invalid values. What should you do?

Suggested Answer: D

Dataflow provides a scalable and flexible way to process and sanitize the incoming Pub/Sub data in real time before streaming it into BigQuery, so the BigQuery ML model only ever trains and predicts on clean rows.
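As a rough illustration, here is a minimal, pure-Python sketch of the kind of per-message validation a Dataflow pipeline (inside an Apache Beam DoFn) might apply to vendor messages from Pub/Sub before streaming rows into BigQuery. The field names (`vendor_id`, `reading`) and the JSON message format are hypothetical assumptions, not part of the question.

```python
import json

# Hypothetical schema: each vendor message should carry a non-empty
# "vendor_id" and a numeric "reading". Anything else is invalid.
REQUIRED_FIELDS = ("vendor_id", "reading")

def sanitize(message: bytes):
    """Parse one Pub/Sub message; return a clean row dict, or None if invalid.

    In a real Dataflow pipeline this logic would live inside a Beam DoFn,
    with invalid records routed to a dead-letter output (e.g. a separate
    BigQuery table or Pub/Sub topic) rather than silently dropped.
    """
    try:
        record = json.loads(message)
    except (ValueError, TypeError):
        return None  # malformed JSON
    if any(record.get(f) in (None, "") for f in REQUIRED_FIELDS):
        return None  # missing required fields
    try:
        record["reading"] = float(record["reading"])  # coerce to numeric
    except (TypeError, ValueError):
        return None  # non-numeric reading
    return record

# Only the first message survives sanitization; the others are invalid.
msgs = [b'{"vendor_id": "v1", "reading": "42.5"}',
        b'{"vendor_id": "", "reading": "1"}',
        b'not json']
clean = [r for m in msgs if (r := sanitize(m)) is not None]
```

Streaming the surviving rows into BigQuery keeps the ingestion table clean, so the BigQuery ML model (and the Vertex AI endpoint serving it) never sees invalid values.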


Contribute your Thoughts:

Gracia
3 months ago
Wait, can we really trust the data if it's coming from multiple vendors?
upvoted 0 times
...
Gilma
3 months ago
Totally agree with D, Dataflow is great for processing!
upvoted 0 times
...
Leatha
4 months ago
A seems a bit off, why create a new dataset?
upvoted 0 times
...
Louis
4 months ago
I think B is simpler, just stream directly to the dataset.
upvoted 0 times
...
Cletus
4 months ago
Option D sounds solid for sanitizing data before it hits BigQuery.
upvoted 0 times
...
Martina
4 months ago
I recall that using streaming inserts directly into BigQuery was a common approach, but I wonder if it handles invalid values as well as Dataflow does.
upvoted 0 times
...
Lynelle
4 months ago
I'm a bit confused about whether we should create a new dataset or use the existing one for the ML model. I feel like both options could work.
upvoted 0 times
...
Viola
5 months ago
I think option D sounds familiar because it mentions using Dataflow to process the data, which we practiced in a similar question.
upvoted 0 times
...
Bernadine
5 months ago
I remember we discussed using Pub/Sub for handling streaming data, but I'm not sure if it was specifically for sanitizing invalid values.
upvoted 0 times
...
Gilbert
5 months ago
I'm feeling pretty confident about this one. I think Option A is the way to go - create a new BigQuery dataset for the vendor data, and then use that as the training data for the BigQuery ML model. That should give me the flexibility to handle the invalid values and the continuous streaming requirement.
upvoted 0 times
...
Dulce
5 months ago
Okay, I think I've got a handle on this. The key is to use a streaming solution to ingest the data from the vendors, and then process and sanitize it before storing it in BigQuery. Option D looks like the best approach to me, with Pub/Sub and Dataflow.
upvoted 0 times
...
Nana
5 months ago
I'm a bit confused by the part about processing continuous streaming data in near-real time. I'm not sure if the options provided fully address that requirement. Maybe I should consider using a streaming solution like Pub/Sub and Dataflow to process and sanitize the data before storing it in BigQuery.
upvoted 0 times
...
Howard
5 months ago
Hmm, this looks like a tricky one. I think I'd start by creating a new BigQuery dataset to land the data from the multiple vendors. That way, I can configure the BigQuery ML model to use that dataset as the training data.
upvoted 0 times
...
Thurman
5 months ago
This seems straightforward - I think the key is setting the creation indicator for delivery schedule lines on the initial MRP run screen.
upvoted 0 times
...
Ranee
5 months ago
I might be overthinking it, but I wonder if there could be some exceptions regarding project management in MaxCompute.
upvoted 0 times
...
Macy
5 months ago
Hmm, this is a good test of our understanding of data privacy laws. I'm going to carefully consider each option and try to apply the relevant principles.
upvoted 0 times
...
Van
2 years ago
C) But processing data through Cloud Function offers more control and flexibility in data processing, don't you think?
upvoted 0 times
...
Miles
2 years ago
A) True, using an 'ingestion' dataset for training data could help in handling invalid values.
upvoted 0 times
...
Dino
2 years ago
D) Using Dataflow to process and sanitize data before streaming it to BigQuery seems like a reliable option.
upvoted 0 times
...
Steffanie
2 years ago
C) I think creating a Pub/Sub topic and using Cloud Function to process data might be more efficient.
upvoted 0 times
...
Veda
2 years ago
B) But wouldn't it be better to use BigQuery streaming inserts directly into the ML model deployed dataset?
upvoted 0 times
...
Gwenn
2 years ago
A) Create a new BigQuery dataset and use streaming inserts to land the data from multiple vendors. Configure your BigQuery ML model to use the 'ingestion' dataset as the training data.
upvoted 0 times
...
Ricki
2 years ago
What about using Cloud Functions to process the data instead?
upvoted 0 times
...
Truman
2 years ago
I agree with Mari, processing and sanitizing the data before streaming it to BigQuery seems like a better approach.
upvoted 0 times
...
Mari
2 years ago
I disagree, I believe we should create a Pub/Sub topic and use Dataflow to process and sanitize the data.
upvoted 0 times
...
Lenny
2 years ago
I think we should use BigQuery streaming inserts to land the data.
upvoted 0 times
...
Helene
2 years ago
You know, I was initially considering option C, but I think Dataflow might be a better choice here. It's designed for high-throughput, real-time data processing, which sounds like exactly what we need for this use case.
upvoted 0 times
...
Stefania
2 years ago
Hmm, I'm leaning towards option D. Using Pub/Sub to ingest the data and then leveraging Dataflow to process and sanitize it before streaming to BigQuery seems like a robust and scalable solution. Plus, Dataflow can handle the data transformation and cleaning, which is crucial given the potential for invalid values.
upvoted 0 times
Millie
2 years ago
I agree. It's important to ensure the data is clean before it goes into the ML model.
upvoted 0 times
...
Dominque
2 years ago
D) Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.
upvoted 0 times
...
Dyan
2 years ago
Yeah, Dataflow is great for handling data processing tasks.
upvoted 0 times
...
Niesha
2 years ago
D) Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.
upvoted 0 times
...
Ozell
2 years ago
That sounds like a solid plan. Dataflow can handle the data cleaning and transformation efficiently.
upvoted 0 times
...
Andree
2 years ago
D) Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.
upvoted 0 times
...
...
Martha
2 years ago
Haha, for real! Trying to manually process all that vendor data would be a nightmare. Dataflow is definitely the way to go here. Plus, it integrates nicely with Pub/Sub and BigQuery, so the entire pipeline will be neatly tied together.
upvoted 0 times
...
Doyle
2 years ago
This question seems to be testing our understanding of real-time data processing and model deployment on Vertex AI. The key here is to identify the most efficient and scalable solution that can handle continuous streaming data from multiple vendors, while also addressing the issue of invalid values.
upvoted 0 times
...