Google Professional Data Engineer Exam - Topic 2 Question 62 Discussion

Actual exam question for Google's Professional Data Engineer exam

Question #: 62
Topic #: 2

You've migrated a Hadoop job from an on-premises cluster to Dataproc and Google Cloud Storage. Your Spark job is a complex analytical workload that consists of many shuffling operations, and the initial data are Parquet files (on average 200-400 MB each). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize for it. Your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptibles (with only 2 non-preemptible workers) for this workload. What should you do?

A. Switch from HDDs to SSDs; override the preemptible VMs' configuration to increase the boot disk size.

B. Increase the size of your Parquet files to ensure they are at least 1 GB.

C. Switch to TFRecords format (approx. 200 MB per file) instead of Parquet files.

D. Switch from HDDs to SSDs, copy the initial data from Cloud Storage to Hadoop Distributed File System (HDFS), run the Spark job, and copy the results back to Cloud Storage.

Suggested Answer: A

by Sylvia at Dec 12, 2022, 04:31 AM
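
One way to picture option A: secondary (preemptible) Dataproc workers do not host HDFS, so in a shuffle-heavy Spark job their boot disk is, as far as I understand it, where intermediate shuffle data lands unless local SSDs are attached. The sketch below builds a cluster configuration as a plain Python dictionary using field names from the Dataproc `ClusterConfig` API (`worker_config`, `secondary_worker_config`, `disk_config`). The helper name, machine type, worker count, and disk sizes are illustrative assumptions, not values from the question.

```python
# Hypothetical helper: builds a Dataproc-style cluster config as a plain dict.
# Field names follow the Dataproc ClusterConfig API; every value is an assumption.

def build_cluster_config(num_preemptible: int, boot_disk_gb: int) -> dict:
    # SSD-backed persistent boot disk, sized up from the default.
    ssd_disk = {
        "boot_disk_type": "pd-ssd",
        "boot_disk_size_gb": boot_disk_gb,
    }
    return {
        # The question keeps exactly 2 non-preemptible (primary) workers.
        "worker_config": {
            "num_instances": 2,
            "machine_type_uri": "n1-standard-8",  # assumed machine type
            "disk_config": dict(ssd_disk),
        },
        # Preemptible secondary workers with the boot disk override.
        "secondary_worker_config": {
            "num_instances": num_preemptible,
            "preemptibility": "PREEMPTIBLE",
            "disk_config": dict(ssd_disk),
        },
    }


config = build_cluster_config(num_preemptible=8, boot_disk_gb=500)
print(config["secondary_worker_config"]["disk_config"])
```

A dictionary of this shape could be passed as the cluster `config` in a creation request made with the `google-cloud-dataproc` client; that call is left out here so the sketch runs without credentials.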

Contribute your Thoughts:

Junita

5 months ago

Preemptible VMs are great for saving costs, but are they reliable enough?

upvoted 0 times

Fausto

5 months ago

Not sure if switching formats to TFRecords is worth it...

upvoted 0 times

Edgar

6 months ago

Increasing parquet file size to 1 GB could help with performance.

upvoted 0 times

Yuonne

6 months ago

Definitely agree, switching to SSDs is a smart move.

upvoted 0 times

Mohammad

6 months ago

I heard SSDs can really speed things up!

upvoted 0 times

Josephine

6 months ago

I feel like copying data to HDFS might add complexity, but it could also speed things up. Not sure if it's worth it with the cost constraints.

upvoted 0 times

Vincenza

6 months ago

I practiced a similar question where file format made a difference, but I can't recall if TFRecords would actually be better than parquet in this case.

upvoted 0 times

Kris

6 months ago

I think switching to SSDs could improve performance, but I’m not clear on how that interacts with preemptible VMs.

upvoted 0 times

Markus

6 months ago

I remember reading that increasing the size of parquet files can help with performance, but I'm not sure if 1 GB is the right target.

upvoted 0 times
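
To put rough numbers on the file-size question: Spark caps the size of a read partition for splittable input such as Parquet with `spark.sql.files.maxPartitionBytes`, which defaults to 128 MB. The sketch below is a simplified estimate only. It ignores Spark's per-file open cost, its core-count adjustment, and Parquet row-group boundaries, and the helper name and file sizes are assumptions for illustration.

```python
import math

# Spark's default for spark.sql.files.maxPartitionBytes (128 MB).
MAX_PARTITION_BYTES = 128 * 1024 * 1024


def estimate_read_partitions(file_size_mb: float,
                             max_partition_bytes: int = MAX_PARTITION_BYTES) -> int:
    """Rough count of read partitions for one file (simplified model)."""
    file_bytes = int(file_size_mb * 1024 * 1024)
    return max(1, math.ceil(file_bytes / max_partition_bytes))


# Compare the question's 200-400 MB files with the 1 GB target in option B.
for size_mb in (200, 400, 1024):
    print(f"{size_mb} MB file: about {estimate_read_partitions(size_mb)} read partitions")
```

Under this estimate, files of 200-400 MB already yield several read partitions each, so input file size alone may not explain a slowdown in a shuffle-heavy job.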

Latia

6 months ago

Wait, I'm a bit confused. How do I use the information about the total number of students and the number of French students to find the probability that a French student is female? I need to review the conditional probability formula.

upvoted 0 times

Eileen

6 months ago

Okay, let me break this down. The heat map is used to visualize and prioritize risks, so I think the answer is B - control monitoring. That would allow the organization to focus on the highest-risk areas.

upvoted 0 times

Ling

6 months ago

I feel pretty good about the filtering and inquiry capabilities, but the other details are a bit fuzzy. I'll make sure to double-check those in the exam.

upvoted 0 times

Janessa

7 months ago

This looks like a tricky one. I'll need to carefully read through the options and think about the best way to capture the PII data securely while preventing it from leaking to Stackdriver.

upvoted 0 times
