
Google Professional Machine Learning Engineer Exam - Topic 4 Question 85 Discussion

Actual exam question for Google's Professional Machine Learning Engineer exam
Question #: 85
Topic #: 4

You are developing a recommendation engine for an online clothing store. The historical customer transaction data is stored in BigQuery and Cloud Storage. You need to perform exploratory data analysis (EDA), preprocessing, and model training. You plan to rerun these EDA, preprocessing, and training steps as you experiment with different types of algorithms. You want to minimize the cost and development effort of running these steps as you experiment. How should you configure the environment?

A. Create a Vertex AI Workbench user-managed notebook using the default VM instance, and use the %%bigquery magic commands in Jupyter to query the tables.
B. Create a Vertex AI Workbench managed notebook to browse and query the tables directly from the JupyterLab interface.
C. Create a Vertex AI Workbench user-managed notebook on a Dataproc Hub, and use the %%bigquery magic commands in Jupyter to query the tables.
D. Create a Vertex AI Workbench managed notebook on a Dataproc cluster, and use the spark-bigquery-connector to access the tables.

Suggested Answer: A

Cost-effectiveness: A user-managed notebook in Vertex AI Workbench runs on a single pre-configured VM (the default instance), which keeps costs lower than the options that involve managed notebooks or a Dataproc cluster.

Development flexibility: User-managed notebooks give you full control over the environment, so you can install whatever additional libraries or dependencies your EDA, preprocessing, and model training steps require, straight from a notebook cell, as shown below. That flexibility is crucial when experimenting with different algorithms.
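For instance, an experiment-specific dependency can be pulled in directly from a cell (the package here is only an illustration):

# Install an extra library from a notebook cell in a user-managed notebook.
# xgboost is just an example; swap in whatever your algorithm needs.
!pip install --user --quiet xgboost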

BigQuery integration: The %%bigquery magic command gives you seamless BigQuery access from inside the Jupyter environment: you write SQL in a notebook cell and get the results back as a pandas DataFrame. This makes querying and exploring the customer transaction data stored in BigQuery straightforward, as sketched below.
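As a rough sketch of that workflow (the project, dataset, and column names are hypothetical), the EDA cells might look like this:

# One-time setup: load the BigQuery cell magic, which ships with the
# google-cloud-bigquery library preinstalled on Workbench images.
%load_ext google.cloud.bigquery

%%bigquery transactions_df
-- Pull a sample of the historical transactions for exploration.
SELECT customer_id, item_id, price, purchase_date
FROM `my-project.store_data.transactions`
LIMIT 10000

The query result lands in the pandas DataFrame transactions_df, so something like transactions_df.describe() can start the EDA immediately.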

Other options and why they are not the best fit:

B) Managed notebook: Managed notebooks offer an easier setup, but they give you less control over the underlying environment, which can get in the way of installing the specific libraries or tools your experiments need.

C) Dataproc Hub: Dataproc Hub provisions notebook environments on Dataproc clusters, which are built for large-scale distributed workloads. That is overkill for exploratory analysis and algorithm experimentation, and it would likely cost more than a user-managed notebook.

D) Dataproc cluster with spark-bigquery-connector: As with option C, standing up a Dataproc cluster and reading the tables through the spark-bigquery-connector is more complex and potentially more expensive than querying BigQuery with %%bigquery magic commands from a user-managed notebook; see the sketch below.
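For comparison, a minimal PySpark sketch of what option D would involve (the table name is hypothetical, and this assumes a Dataproc cluster with the spark-bigquery-connector available):

from pyspark.sql import SparkSession

# On a Dataproc-backed notebook a SparkSession usually already exists;
# getOrCreate() reuses it rather than starting a new one.
spark = SparkSession.builder.appName("transactions-eda").getOrCreate()

# Read the BigQuery table through the spark-bigquery-connector.
df = (
    spark.read.format("bigquery")
    .option("table", "my-project.store_data.transactions")
    .load()
)
df.printSchema()

Even this minimal read needs a running cluster behind it, which is exactly the setup and cost overhead the suggested answer avoids.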


https://cloud.google.com/vertex-ai/docs/workbench/instances/bigquery

https://cloud.google.com/vertex-ai-notebooks

Contribute your Thoughts:

Tamie
6 days ago
I think A is better because it gives more control over the environment.
upvoted 0 times
Doyle
12 days ago
Option B seems the easiest for quick queries.
upvoted 0 times
Abraham
18 days ago
I feel like using Dataproc could be overkill for this task, but the spark-bigquery-connector might be useful if we want to scale later on.
upvoted 0 times
Albina
23 days ago
I practiced a similar question where we had to choose between user-managed and managed notebooks, and I think user-managed might give more control over the environment.
upvoted 0 times
Vesta
28 days ago
I think using the %%bigquery magic commands could be really helpful for querying directly, but I can't recall if that works in managed notebooks.
upvoted 0 times
Salena
1 month ago
I remember we discussed the benefits of using managed notebooks for easier setup, but I'm not sure if that's the best choice here.
upvoted 0 times
Aaron
1 month ago
I'm feeling pretty confident about this one. Option B seems like the most straightforward and cost-effective approach, since I can directly access the data sources from the JupyterLab interface without having to worry about managing the underlying infrastructure.
upvoted 0 times
Noemi
1 month ago
Option D looks interesting - using a Vertex AI Workbench managed notebook on a Dataproc cluster and the spark-bigquery-connector to access the data. That could be a good way to leverage the power of Spark for the data processing and model training steps.
upvoted 0 times
Afton
1 month ago
Hmm, I'm a bit confused about the differences between the options. I'm not sure if using a user-managed notebook or a Dataproc Hub would be more efficient for this use case. Maybe I should review the details of each option more carefully.
upvoted 0 times
Theresia
1 month ago
This seems like a straightforward question about setting up the right environment for exploratory data analysis and model training. I think I'll go with option B - creating a Vertex AI Workbench managed notebook to directly access the tables in BigQuery and Cloud Storage.
upvoted 0 times
Evelynn
2 months ago
I've seen this issue before. I'd recommend going with option A and splitting the trigger logic into two separate triggers.
upvoted 0 times
Lasandra
1 year ago
I dunno, man, all these options sound like a lot of work. Can't we just have a button that says 'Make me a recommendation engine' and it just does it all for us? Where's the AI in all this?
upvoted 0 times
Micah
1 year ago
Option D, hands down. Anything that involves Dataproc is bound to be a pain in the neck. I'll take the managed notebook and spark-bigquery-connector any day!
upvoted 0 times
Malcom
1 year ago
I agree with option D. Using the spark-bigquery-connector on a Dataproc cluster seems like a solid choice for this scenario.
upvoted 0 times
Denae
1 year ago
I think option B is the way to go. It's convenient to browse and query the tables directly from the JupyterLab interface.
upvoted 0 times
Ashlee
1 year ago
I prefer option A. It's simpler to just use the default VM instance and the %%bigquery magic commands in Jupyter.
upvoted 0 times
Deeann
1 year ago
Hmm, I'm not sure any of these options are truly optimal. If I had to choose, I'd probably go with B, but I can't help but feel like there's a more elegant solution out there that would really streamline the whole process.
upvoted 0 times
Bernadine
1 year ago
Wow, these options are all over the place! I'm torn between A and C, but I think I'd lean towards C to get the benefits of Dataproc without the added complexity of managing a separate Dataproc cluster.
upvoted 0 times
Charlene
1 year ago
I agree, C seems like a good balance between functionality and simplicity.
upvoted 0 times
Candra
1 year ago
C sounds like a good option to leverage Dataproc without the extra hassle of managing a separate cluster.
upvoted 0 times
Holley
1 year ago
I think A could work well for quick querying with the %%bigquery magic commands.
upvoted 0 times
Delisa
1 year ago
I think option C is the way to go, as it provides a user-managed notebook on a Dataproc Hub for querying the tables.
upvoted 0 times
Virgie
1 year ago
I prefer option B because it allows us to browse and query the tables directly from the JupyterLab interface.
upvoted 0 times
Ressie
1 year ago
I disagree, I believe option D is more efficient as it utilizes the spark-bigquery-connector to access the tables.
upvoted 0 times
Emerson
1 year ago
I'd go with Option D. Using the spark-bigquery-connector on a Dataproc cluster seems like the most efficient way to handle the large datasets and complex analysis required.
upvoted 0 times
Dorcas
1 year ago
I agree, using the spark-bigquery-connector on a Dataproc cluster seems like the way to go.
upvoted 0 times
Christoper
1 year ago
Option D sounds like a good choice. It's efficient for handling large datasets.
upvoted 0 times
Graciela
1 year ago
Option B makes the most sense, as it allows me to directly access the tables from the JupyterLab interface, which should minimize the setup and configuration overhead.
upvoted 0 times
Noel
1 year ago
That sounds like a good choice for minimizing the cost and development effort while experimenting with different algorithms.
upvoted 0 times
Rashad
1 year ago
I agree, using a Vertex AI Workbench managed notebook for browsing and querying tables in JupyterLab is a convenient option.
upvoted 0 times
Bettyann
1 year ago
I think option A is the best choice because it allows us to query the tables using %%bigquery magic commands in Jupyter.
upvoted 0 times
