Google Professional Data Engineer Exam - Topic 2 Question 62 Discussion

You've migrated a Hadoop job from an on-premises cluster to Dataproc and Cloud Storage. Your Spark job is a complex analytical workload that consists of many shuffling operations, and the initial data are Parquet files (on average 200-400 MB each). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize for it. Your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptibles (with only 2 non-preemptible workers) for this workload. What should you do?
A) Switch from HDDs to SSDs; override the preemptible VMs configuration to increase the boot disk size
B) Increase the size of your Parquet files to ensure they are 1 GB minimum
C) Switch to TFRecords format (approx. 200 MB per file) instead of Parquet files
D) Switch from HDDs to SSDs, copy the initial data from Cloud Storage to Hadoop Distributed File System (HDFS), run the Spark job, and copy the results back to Cloud Storage

Actual exam question for Google's Professional Data Engineer exam
Question #: 62
Topic #: 2

Suggested Answer: A
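
For anyone wondering what option A might look like in practice, here is a rough sketch that builds a `gcloud dataproc clusters create` command with SSD boot disks and a larger boot disk on the preemptible (secondary) workers. The idea is that shuffle data is written to the VM's local disk, and persistent disk throughput grows with disk size. The cluster name, region, worker count, and the 500 GB size are made-up illustration values, not anything stated in the question, and the helper `build_cluster_command` is hypothetical. Check the flags against the current gcloud reference before relying on them.

```python
import shlex


def build_cluster_command(cluster_name, region, num_secondary_workers,
                          secondary_boot_disk_gb=500):
    """Return the argv list for a `gcloud dataproc clusters create` call.

    All values passed in are illustrative assumptions, not recommendations.
    """
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        # Two non-preemptible primary workers, as in the question.
        "--num-workers=2",
        "--worker-boot-disk-type=pd-ssd",
        # Preemptible secondary workers with SSD boot disks.
        f"--num-secondary-workers={num_secondary_workers}",
        "--secondary-worker-type=preemptible",
        "--secondary-worker-boot-disk-type=pd-ssd",
        # Larger boot disk: more room and throughput for shuffle data.
        f"--secondary-worker-boot-disk-size={secondary_boot_disk_gb}GB",
    ]


command = build_cluster_command("analytics-cluster", "us-central1", 8)
# Print a copy-pasteable shell command instead of running it.
print(shlex.join(command))
```

The script only prints the command, so it can be reviewed (and the sizes adjusted for cost) before anything is actually created.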

Contribute your Thoughts:

Junita
7 months ago
Preemptible VMs are great for saving costs, but are they reliable enough?
upvoted 0 times
...
Fausto
7 months ago
Not sure if switching formats to TFRecords is worth it...
upvoted 0 times
...
Edgar
7 months ago
Increasing parquet file size to 1 GB could help with performance.
upvoted 0 times
...
Yuonne
8 months ago
Definitely agree, switching to SSDs is a smart move.
upvoted 0 times
...
Mohammad
8 months ago
I heard SSDs can really speed things up!
upvoted 0 times
...
Josephine
8 months ago
I feel like copying data to HDFS might add complexity, but it could also speed things up. Not sure if it's worth it with the cost constraints.
upvoted 0 times
...
Vincenza
8 months ago
I practiced a similar question where file format made a difference, but I can't recall if TFRecords would actually be better than parquet in this case.
upvoted 0 times
...
Kris
8 months ago
I think switching to SSDs could improve performance, but I’m not clear on how that interacts with preemptible VMs.
upvoted 0 times
...
Markus
8 months ago
I remember reading that increasing the size of parquet files can help with performance, but I'm not sure if 1 GB is the right target.
upvoted 0 times
...
Latia
8 months ago
Wait, I'm a bit confused. How do I use the information about the total number of students and the number of French students to find the probability that a French student is female? I need to review the conditional probability formula.
upvoted 0 times
...
Eileen
8 months ago
Okay, let me break this down. The heat map is used to visualize and prioritize risks, so I think the answer is B - control monitoring. That would allow the organization to focus on the highest-risk areas.
upvoted 0 times
...
Ling
8 months ago
I feel pretty good about the filtering and inquiry capabilities, but the other details are a bit fuzzy. I'll make sure to double-check those in the exam.
upvoted 0 times
...
Janessa
8 months ago
This looks like a tricky one. I'll need to carefully read through the options and think about the best way to capture the PII data securely while preventing it from leaking to Stackdriver.
upvoted 0 times
...