Google Professional Data Engineer Exam - Topic 4 Question 44 Discussion

You are designing a cloud-native historical data processing system to meet the following conditions:

- The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools, including Cloud Dataproc, BigQuery, and Compute Engine.
- A streaming data pipeline stores new data daily.
- Performance is not a factor in the solution.
- The solution design should maximize availability.

How should you design data storage for this solution?

A) Create a Cloud Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.
B) Store the data in BigQuery. Access the data using the BigQuery Connector or Cloud Dataproc and Compute Engine.
C) Store the data in a regional Cloud Storage bucket. Access the bucket directly using Cloud Dataproc, BigQuery, and Compute Engine.
D) Store the data in a multi-regional Cloud Storage bucket. Access the data directly using Cloud Dataproc, BigQuery, and Compute Engine.

Actual exam question for Google's Professional Data Engineer exam
Question #: 44
Topic #: 4

Suggested Answer: D
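
One way to see why D fits is to check each option against the stated conditions. The sketch below is only an illustration of that elimination: the `Option` class, its field names, and the numeric redundancy ranks are assumptions made for this example, not values taken from Google documentation or from the exam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    label: str
    holds_all_formats: bool      # can store CSV, Avro, and PDF files as-is
    readable_by_all_tools: bool  # Dataproc, BigQuery, and Compute Engine can all read it
    redundancy_rank: int         # assumed ordering: 1 = one cluster, 2 = one region, 3 = several regions


OPTIONS = [
    # HDFS lives on the Dataproc cluster's own disks; BigQuery cannot read it directly.
    Option("A", holds_all_formats=True, readable_by_all_tools=False, redundancy_rank=1),
    # BigQuery tables hold structured rows, so PDF files do not fit (rank here is an assumption).
    Option("B", holds_all_formats=False, readable_by_all_tools=True, redundancy_rank=2),
    # A regional bucket is redundant across zones within a single region.
    Option("C", holds_all_formats=True, readable_by_all_tools=True, redundancy_rank=2),
    # A multi-regional bucket is redundant across regions.
    Option("D", holds_all_formats=True, readable_by_all_tools=True, redundancy_rank=3),
]


def best_option(options):
    """Drop options that fail a hard requirement, then pick the most redundant one."""
    viable = [o for o in options if o.holds_all_formats and o.readable_by_all_tools]
    return max(viable, key=lambda o: o.redundancy_rank)


print(best_option(OPTIONS).label)
```

Because performance is not a factor, nothing in this comparison penalizes the multi-regional bucket for latency, so availability alone decides between C and D.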

Johnetta
7 months ago
Multi-regional storage? That’s a bit overkill, isn’t it?
upvoted 0 times
...
Una
7 months ago
Wait, why not just use HDFS? Seems outdated.
upvoted 0 times
...
Dion
8 months ago
Storing in BigQuery (Option B) could be more efficient!
upvoted 0 times
...
Tashia
8 months ago
I think D is better for redundancy.
upvoted 0 times
...
Nguyet
8 months ago
Option C seems solid for availability.
upvoted 0 times
...
Dominga
8 months ago
I feel like a multi-regional Cloud Storage bucket could be overkill for this scenario, but it does sound like it would ensure high availability.
upvoted 0 times
...
Kandis
8 months ago
I practiced a similar question where we had to maximize availability, and I think using a regional Cloud Storage bucket might be a good balance, but I’m not completely confident.
upvoted 0 times
...
Carmelina
8 months ago
I think storing the data in BigQuery could simplify access for analysis tools, but I’m a bit uncertain about how it handles CSV and PDF formats directly.
upvoted 0 times
...
Reuben
8 months ago
I remember we discussed the importance of using Cloud Storage for different data formats, but I'm not sure if multi-regional is necessary since performance isn't a factor.
upvoted 0 times
...
Carlton
8 months ago
I'm pretty sure the answer is D. Discussing alternative plans and gauging reactions doesn't seem like it would be part of the needs analysis stage.
upvoted 0 times
...
Janine
8 months ago
I think VRRP advertisements are sent only from the master router, but I can't remember if the standby routers send them too.
upvoted 0 times
...
Evan
8 months ago
Okay, let me see. The question is asking about determining the breakeven point, so I think the key is finding the technique that looks at the costs and projected income over time. Cost-benefit analysis seems relevant, but discounted cash flow feels like the more precise answer here.
upvoted 0 times
...
Raul
8 months ago
I think asynchronous collaboration means working at different times, so online meetings probably aren't it. I'm leaning towards wikis and shared workspaces.
upvoted 0 times
...