
Google Professional Data Engineer Exam - Topic 2 Question 62 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 62
Topic #: 2
[All Professional Data Engineer Questions]

You've migrated a Hadoop job from an on-premises cluster to Dataproc and Cloud Storage. Your Spark job is a complex analytical workload that consists of many shuffling operations, and the initial data are parquet files (on average 200-400 MB in size each). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize for it. Your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptibles (with 2 non-preemptible workers only) for this workload. What should you do?

Suggested Answer: A
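The answer choices aren't reproduced on this page, but since the question centers on a shuffle-heavy Spark job, one common tuning step is sizing the number of shuffle partitions to the input volume. A minimal sketch of that heuristic (pure Python, no Spark needed; the 128 MB-per-partition target is an assumed rule of thumb, not part of the question):

```python
import math

def shuffle_partitions(total_input_bytes, target_partition_bytes=128 * 1024 * 1024):
    """Rough heuristic: one shuffle partition per ~128 MB of data.

    The 128 MB target is an assumption for illustration; tune it to
    your workload before setting spark.sql.shuffle.partitions.
    """
    return max(1, math.ceil(total_input_bytes / target_partition_bytes))

# e.g. 500 input files averaging 300 MB each
total = 500 * 300 * 1024 * 1024
print(shuffle_partitions(total))  # -> 1172
```

The resulting number would then be passed to Spark as `spark.sql.shuffle.partitions`; the default of 200 is often far too low for jobs at this scale.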

Contribute your Thoughts:

Junita
4 months ago
Preemptible VMs are great for saving costs, but are they reliable enough?
upvoted 0 times
...
Fausto
4 months ago
Not sure if switching formats to TFRecords is worth it...
upvoted 0 times
...
Edgar
4 months ago
Increasing parquet file size to 1 GB could help with performance.
upvoted 0 times
...
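Following up on the comment above about growing parquet files toward 1 GB: the usual way to do that is to coalesce the dataset down to fewer output partitions before writing. A small helper to estimate that partition count (the 1 GB target comes from the comment; the helper itself is a hypothetical illustration):

```python
import math

def coalesce_target(num_files, avg_file_mb, target_file_mb=1024):
    """Estimate how many partitions to coalesce to so that output
    files land near target_file_mb each. 1024 MB (~1 GB) is the
    target suggested in the discussion, not a fixed rule.
    """
    total_mb = num_files * avg_file_mb
    return max(1, math.ceil(total_mb / target_file_mb))

# 1000 input files averaging 300 MB -> roughly 293 ~1 GB output files
print(coalesce_target(1000, 300))  # -> 293
```

In Spark this estimate would feed something like `df.coalesce(n).write.parquet(...)`, trading many small reads against fewer, larger, more sequential ones.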
Yuonne
4 months ago
Definitely agree, switching to SSDs is a smart move.
upvoted 0 times
...
Mohammad
4 months ago
I heard SSDs can really speed things up!
upvoted 0 times
...
Josephine
5 months ago
I feel like copying data to HDFS might add complexity, but it could also speed things up. Not sure if it's worth it with the cost constraints.
upvoted 0 times
...
Vincenza
5 months ago
I practiced a similar question where file format made a difference, but I can't recall if TFRecords would actually be better than parquet in this case.
upvoted 0 times
...
Kris
5 months ago
I think switching to SSDs could improve performance, but I’m not clear on how that interacts with preemptible VMs.
upvoted 0 times
...
Markus
5 months ago
I remember reading that increasing the size of parquet files can help with performance, but I'm not sure if 1 GB is the right target.
upvoted 0 times
...
