Google Professional Data Engineer Exam - Topic 2 Question 104 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 104
Topic #: 2

You want to migrate an Apache Spark 3 batch job from on-premises to Google Cloud. You need to change the job as little as possible so that it reads from Cloud Storage and writes the result to BigQuery. Your job is optimized for Spark, where each executor has 8 vCPUs and 16 GB of memory, and you want to be able to choose similar settings. You want to minimize installation and management effort to run your job. What should you do?

Suggested Answer: A

Jerry
6 days ago
I think B is better for less management hassle.
upvoted 0 times
...
Mi
12 days ago
A Dataproc cluster is a solid choice for Spark jobs.
upvoted 0 times
...
Raylene
17 days ago
I’m leaning towards option D with Compute Engine, but I recall that it might involve more installation and management than using Dataproc.
upvoted 0 times
...
Laura
23 days ago
I practiced a similar question where Kubernetes was mentioned, but I feel like it might require more setup than I want for this migration.
upvoted 0 times
...
Ula
28 days ago
I think I read somewhere that Dataproc Serverless can help minimize management effort, which makes option B appealing, but I need to double-check the performance aspects.
upvoted 0 times
...
Erasmo
1 month ago
I remember that Dataproc is designed for running Spark jobs, so option A seems like a good fit, but I'm not sure if it's the most efficient choice.
upvoted 0 times
...
Nieves
1 month ago
I'm a little confused by the options here. Executing the job in a new Dataproc cluster (A) seems like it could work, but then we'd have to manage the cluster ourselves. And running it on a Compute Engine VM (D) doesn't seem to take advantage of the managed services that Google Cloud offers. I think I'm leaning towards the Dataproc Serverless option (B), but I'll need to double-check the details.
upvoted 0 times
...
Armando
1 month ago
Okay, I got this. The question is asking us to migrate the Spark job to Google Cloud, and it wants us to minimize installation and management effort. Based on that, I'd say the Dataproc Serverless option (B) is the way to go. It should let us run the Spark job with similar resource settings without having to manage the cluster ourselves.
upvoted 0 times
...
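To make the "similar resource settings" point above concrete, here is a minimal Python sketch that assembles a Dataproc Serverless batch submission with executor sizing matching the question (8 cores, 16 GB). It only builds and prints the command; it does not run it. The bucket path, class name, and region are hypothetical placeholders, and the assumption that `spark.executor.cores` and `spark.executor.memory` are accepted with these values should be checked against the current Dataproc Serverless documentation.

```python
import shlex


def executor_properties(cores: int, memory_gb: int) -> dict:
    """Map executor sizing to standard Spark property names."""
    return {
        "spark.executor.cores": str(cores),
        "spark.executor.memory": f"{memory_gb}g",
    }


def build_batch_command(jar_uri: str, main_class: str, region: str, props: dict) -> list:
    """Build (but do not execute) a `gcloud dataproc batches submit spark` command.

    All argument values are supplied by the caller; nothing here is taken
    from the original question beyond the executor sizing.
    """
    prop_arg = ",".join(f"{key}={value}" for key, value in props.items())
    return [
        "gcloud", "dataproc", "batches", "submit", "spark",
        f"--region={region}",
        f"--jars={jar_uri}",
        f"--class={main_class}",
        f"--properties={prop_arg}",
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    command = build_batch_command(
        jar_uri="gs://example-bucket/jobs/batch-job.jar",
        main_class="com.example.BatchJob",
        region="us-central1",
        props=executor_properties(cores=8, memory_gb=16),
    )
    print(shlex.join(command))
```

The same two Spark properties can also be passed to a job on a regular Dataproc cluster, so this sketch illustrates the sizing argument rather than settling which option is correct.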
Natalie
1 month ago
Hmm, I'm a bit unsure about this one. There are a few options presented, and I'm not sure which one would be the best fit. I'll need to think through the tradeoffs of each approach.
upvoted 0 times
...
Jamika
1 month ago
This looks like a straightforward Spark migration question. I think the key is to find the Google Cloud service that can most closely match the on-premises Spark setup with minimal changes.
upvoted 0 times
...