
Google Professional Data Engineer Exam - Topic 3 Question 45 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 45
Topic #: 3

You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will build machine learning models based on this data. You want to maximize performance and usability for your data science team. Which two strategies should you adopt? (Choose 2 answers.)
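The answer choices themselves are not reproduced on this page, but the commenters below converge on two recurring BigQuery patterns for this scenario: appending status changes as new rows rather than updating in place, and denormalizing for read performance. A minimal sketch of the append-only pattern, simulated in plain Python (the `transaction_id`, `ts`, and `status` columns are hypothetical, not from the question):

```python
# Append-only status log: every status change is inserted as a new row,
# never applied as an in-place UPDATE. Rows are (transaction_id, ts, status).
rows = [
    ("tx1", 1, "PENDING"),
    ("tx2", 1, "PENDING"),
    ("tx1", 2, "SETTLED"),
    ("tx2", 3, "FAILED"),
    ("tx1", 3, "REFUNDED"),
]

def latest_status(rows):
    """Return the most recent status per transaction -- the same result a
    BigQuery window query would give, e.g.
      SELECT * FROM log
      QUALIFY ROW_NUMBER() OVER (
        PARTITION BY transaction_id ORDER BY ts DESC) = 1
    """
    best = {}
    for tx, ts, status in rows:
        if tx not in best or ts > best[tx][0]:
            best[tx] = (ts, status)
    return {tx: status for tx, (ts, status) in best.items()}

print(latest_status(rows))  # {'tx1': 'REFUNDED', 'tx2': 'FAILED'}
```

Appending keeps the full status history available for model training while still letting analysts recover the current state with a cheap window function.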

Suggested Answer: A, E

Contribute your Thoughts:

Chauncey
4 months ago
Daily snapshots to Cloud Storage? Seems a bit excessive.
Ty
4 months ago
Appending updates instead of overwriting is smart!
Justine
4 months ago
Wait, can BigQuery handle that much data efficiently?
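Justine's worry is reasonable, and the usual answer is that BigQuery stays fast at this scale largely through time partitioning: a query that filters on the partitioning column scans only the partitions it touches. A rough illustration of partition pruning (the dates and the 3 TB/day figure follow the question; everything else is invented):

```python
import datetime

# Invented layout: bytes stored in each daily partition of a
# time-partitioned table, at roughly 3 TB/day as stated in the question.
partitions = {
    datetime.date(2024, 1, d): 3 * 10**12
    for d in range(1, 31)
}

def bytes_scanned(partitions, start, end):
    """With a filter on the partitioning column, only partitions whose
    date falls in [start, end] are read; the rest are pruned."""
    return sum(size for day, size in partitions.items()
               if start <= day <= end)

week = bytes_scanned(partitions,
                     datetime.date(2024, 1, 1),
                     datetime.date(2024, 1, 7))
print(week / 10**12, "TB scanned")  # 21.0 TB instead of the full ~90 TB
```

The same idea is why a 1.5 PB table is workable: a dashboard query over last week's transactions never pays for the other years of history.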
Stanford
4 months ago
I think preserving structure is key for analysis.
Francesco
5 months ago
Denormalizing sounds like a good idea for performance!
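Francesco's instinct matches the common BigQuery guidance: since storage is cheap and large-scale joins are expensive, related tables are often pre-joined into one wide (optionally nested) table that analysts can query directly. A toy sketch of that flattening step, using hypothetical `customers` and `transactions` tables not taken from the question:

```python
# Hypothetical normalized source data.
customers = {"c1": {"name": "Ada", "country": "UK"}}
transactions = [
    {"tx_id": "tx1", "customer_id": "c1", "amount": 9.99},
    {"tx_id": "tx2", "customer_id": "c1", "amount": 5.00},
]

def denormalize(transactions, customers):
    """Copy customer attributes onto each transaction row so that
    analysts query one wide table instead of joining at read time."""
    return [{**tx, **customers[tx["customer_id"]]} for tx in transactions]

wide = denormalize(transactions, customers)
print(wide[0]["name"], wide[0]["amount"])  # Ada 9.99
```

The trade-off is the usual one: repeated customer fields cost extra storage, in exchange for simpler and faster analytical queries.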
Phuong
5 months ago
I feel like appending status updates instead of updating might make querying easier, but I can't quite remember the trade-offs.
Linwood
5 months ago
I think preserving the structure is important for analysis, but I also recall a practice question where denormalization helped with performance.
Dyan
5 months ago
I remember we discussed denormalization in class, but I'm not sure if it's the best approach for time-series data.
Nydia
5 months ago
The idea of using external data sources like Avro files sounds familiar, but I'm not clear on how that impacts performance in BigQuery.
