
Google Professional Machine Learning Engineer Exam - Topic 1 Question 98 Discussion

Actual exam question for Google's Professional Machine Learning Engineer exam
Question #: 98
Topic #: 1

You trained a model on data stored in a Cloud Storage bucket. The model needs to be retrained frequently in Vertex AI Training using the latest data in the bucket. Data preprocessing is required prior to retraining. You want to build a simple and efficient near-real-time ML pipeline in Vertex AI that will preprocess the data when new data arrives in the bucket. What should you do?

A) Create a pipeline using the Vertex AI SDK. Schedule the pipeline with Cloud Scheduler to preprocess the new data in the bucket. Store the processed features in Vertex AI Feature Store.
B) Create a Cloud Run function that is triggered when new data arrives in the bucket. The function initiates a Vertex AI Pipeline to preprocess the new data and store the processed features in Vertex AI Feature Store.
C) Build a Dataflow pipeline to preprocess the new data in the bucket and store the processed features in BigQuery. Configure a cron job to trigger the pipeline.
D) Use the Vertex AI SDK to preprocess the new data in the bucket prior to each model retraining. Store the processed features in BigQuery.

Suggested Answer: B

A Cloud Run function can be triggered the moment a new object is finalized in the bucket (via an Eventarc Cloud Storage trigger), which makes it ideal for near-real-time processing. The function then initiates a Vertex AI Pipeline that preprocesses the new data and stores the features in Vertex AI Feature Store, matching the frequent-retraining workflow. Cloud Scheduler (Option A) runs pipelines on a fixed schedule and cannot react to new data as it arrives. A cron-triggered Dataflow pipeline (Option C) is likewise schedule-driven batch ETL, and it lands features in BigQuery rather than Vertex AI Feature Store. Preprocessing with the Vertex AI SDK before each retraining (Option D) is a manual step, not an event-driven pipeline.
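
For readers who want to see what the Option B glue code might look like, below is a minimal sketch of such a Cloud Run function, assuming the preprocessing pipeline has already been compiled to a spec file. The project ID, region, template path, and the input_uri pipeline parameter are illustrative placeholders, not details from the question.

    import functions_framework
    from google.cloud import aiplatform

    PROJECT = "my-project"    # placeholder project ID
    REGION = "us-central1"    # placeholder region
    TEMPLATE = "gs://my-bucket/pipelines/preprocess.json"  # placeholder compiled pipeline spec

    @functions_framework.cloud_event
    def on_new_data(cloud_event):
        # Eventarc delivers a google.cloud.storage.object.v1.finalized event
        # whose payload carries the bucket and object names.
        data = cloud_event.data
        gcs_uri = f"gs://{data['bucket']}/{data['name']}"

        aiplatform.init(project=PROJECT, location=REGION)
        job = aiplatform.PipelineJob(
            display_name="preprocess-new-data",
            template_path=TEMPLATE,
            parameter_values={"input_uri": gcs_uri},  # placeholder pipeline parameter
        )
        # submit() returns immediately; the pipeline itself preprocesses the
        # new data and writes the resulting features to Vertex AI Feature Store.
        job.submit()

Keeping the function this thin is deliberate: it only hands the event off, so the heavy preprocessing runs inside the Vertex AI Pipeline rather than in the short-lived function instance.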


Contribute your Thoughts:

Raina
6 days ago
I think A is better for scheduling, though.
upvoted 0 times
Lettie
12 days ago
Option B sounds efficient with the Cloud Run trigger!
upvoted 0 times
Rebbeca
17 days ago
I recall that using the Vertex AI SDK directly for preprocessing could be efficient, but I’m not sure if doing it before each retraining is the right approach. Option D seems a bit risky.
upvoted 0 times
Winifred
23 days ago
I’m a bit confused about whether we should use Feature Store or BigQuery for storing processed features. I feel like I leaned towards option A or B in our study sessions.
upvoted 0 times
Eladia
28 days ago
I think we practiced a similar question where we had to decide between using Dataflow and Vertex AI SDK. I feel like option C might be overkill for just preprocessing.
upvoted 0 times
Matthew
1 month ago
I remember we discussed using Cloud Run for event-driven tasks, so option B sounds familiar, but I'm not entirely sure if it's the best choice for preprocessing.
upvoted 0 times
Verona
1 month ago
I'm a bit torn between options B and D. Both of them involve using the Vertex AI SDK, which I'm comfortable with. But I'm not sure if storing the processed features in BigQuery is the best approach, since we're specifically asked to use the Vertex AI Feature Store. Maybe option B is the way to go, since it seems to align more closely with the requirements of the question.
upvoted 0 times
Helga
1 month ago
Okay, I think I've got this. The key here is that we need a near-real-time pipeline, so we want something that will automatically process the new data as soon as it arrives in the bucket. Option B with the Cloud Run function seems like the best way to do that. It's an elegant solution that ties everything together. I'm pretty confident that's the right approach.
upvoted 0 times
Antonette
1 month ago
Hmm, I'm a bit confused here. There are a few different options, and I'm not sure which one is the most efficient. I like the idea of using a Vertex AI pipeline, but I'm not sure if a Cloud Run function is the best way to trigger it. Maybe option A, where we schedule the pipeline with Cloud Scheduler, would be a simpler and more reliable approach?
upvoted 0 times
Keva
1 month ago
This seems like a pretty straightforward question. I think option B is the way to go - using a Cloud Run function to trigger a Vertex AI pipeline to preprocess the new data and store it in the Feature Store. That way, the pipeline will run automatically whenever new data arrives, which is exactly what we need for a near-real-time ML pipeline.
upvoted 0 times
Mike
8 months ago
I see the benefits of option C as well. Building a Dataflow pipeline to preprocess data and store features in BigQuery could be a solid choice.
upvoted 0 times
Doug
8 months ago
Option F: Hire a team of psychic interns to monitor the bucket and trigger the pipeline whenever they sense new data. It's the future of MLOps!
upvoted 0 times
Ernest
8 months ago
Option E: Write a Python script that uses a Ouija board to divine the latest data and automatically retrain the model. It's foolproof!
upvoted 0 times
Wynell
7 months ago
B) Create a Cloud Run function that is triggered when new data arrives in the bucket. The function initiates a Vertex AI Pipeline to preprocess the new data and store the processed features in Vertex AI Feature Store.
upvoted 0 times
Dalene
7 months ago
A) Create a pipeline using the Vertex AI SDK. Schedule the pipeline with Cloud Scheduler to preprocess the new data in the bucket. Store the processed features in Vertex AI Feature Store.
upvoted 0 times
Amie
8 months ago
Option A is a good backup plan, but it requires additional scheduling and coordination. Option B just seems like the most straightforward and elegant solution here.
upvoted 0 times
Joesph
7 months ago
Yeah, Option B seems like the most elegant way to build a near-real-time ML pipeline.
upvoted 0 times
Jolanda
7 months ago
I agree, Option B with the Cloud Run function sounds like the most straightforward solution.
upvoted 0 times
Charlesetta
8 months ago
I think Option B is the way to go. It's simple and efficient.
upvoted 0 times
Cassie
8 months ago
I prefer option B. Using a Cloud Run function to trigger a Vertex AI Pipeline sounds more straightforward to me.
upvoted 0 times
Virgina
8 months ago
Hmm, Option D seems a bit outdated. Preprocessing the data before each retraining seems like a lot of manual work. I'd prefer a more automated solution like Option B.
upvoted 0 times
Cherrie
8 months ago
I'm not sure about Option C. Configuring a cron job to trigger a Dataflow pipeline seems a bit overkill for this use case. Why not just use Vertex AI's built-in capabilities?
upvoted 0 times
Eve
7 months ago
D) Use the Vertex AI SDK to preprocess the new data in the bucket prior to each model retraining. Store the processed features in BigQuery.
upvoted 0 times
Han
8 months ago
B) Create a Cloud Run function that is triggered when new data arrives in the bucket. The function initiates a Vertex AI Pipeline to preprocess the new data and store the processed features in Vertex AI Feature Store.
upvoted 0 times
Tammara
8 months ago
A) Create a pipeline using the Vertex AI SDK. Schedule the pipeline with Cloud Scheduler to preprocess the new data in the bucket. Store the processed features in Vertex AI Feature Store.
upvoted 0 times
Tula
8 months ago
I agree with Jackie. Storing the processed features in Vertex AI Feature Store seems like a good idea for efficiency.
upvoted 0 times
Nieves
8 months ago
Option B looks like the most efficient solution. Triggering a pipeline when new data arrives in the bucket is a great way to keep the model up-to-date in near-real-time.
upvoted 0 times
Javier
8 months ago
A) Create a pipeline using the Vertex AI SDK. Schedule the pipeline with Cloud Scheduler to preprocess the new data in the bucket. Store the processed features in Vertex AI Feature Store.
upvoted 0 times
Dallas
8 months ago
B) Create a Cloud Run function that is triggered when new data arrives in the bucket. The function initiates a Vertex AI Pipeline to preprocess the new data and store the processed features in Vertex AI Feature Store.
upvoted 0 times
Jackie
8 months ago
I think option A is the best choice. It allows us to create a pipeline using Vertex AI SDK and schedule it with Cloud Scheduler.
upvoted 0 times
