Welcome to Pass4Success

Google Professional Data Engineer Exam - Topic 2 Question 22 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 22
Topic #: 2

You are building a new application from which you need to collect data in a scalable way. Data arrives continuously from the application throughout the day, and you expect to generate approximately 150 GB of JSON data per day by the end of the year. Your requirements are:

- Decoupling producer from consumer
- Space- and cost-efficient storage of the raw ingested data, which is to be stored indefinitely
- Near real-time SQL queries
- Maintain at least 2 years of historical data, which will be queried with SQL
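A quick back-of-the-envelope check of the volumes involved helps when comparing pipelines. This sketch assumes the full 150 GB/day rate applies every day (an upper bound, since the question says that rate is only reached by the end of the year), 365-day years, and decimal units (1 GB = 10^9 bytes). It ignores compression.

```python
# Back-of-the-envelope sizing, assuming a steady 150 GB/day of raw JSON
# (an upper bound: the question says this rate is reached only by year end).
GB = 10**9  # decimal gigabyte, in bytes
SECONDS_PER_DAY = 24 * 60 * 60

DAILY_GB = 150
RETENTION_YEARS = 2  # minimum history that must stay queryable with SQL


def retained_gb(daily_gb: float, years: int, days_per_year: int = 365) -> float:
    """Raw (uncompressed) JSON volume accumulated over the window, in GB."""
    return daily_gb * days_per_year * years


def mean_ingest_mb_per_s(daily_gb: float) -> float:
    """Average ingest rate in MB/s if the daily volume arrives evenly."""
    return daily_gb * GB / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    total = retained_gb(DAILY_GB, RETENTION_YEARS)
    print(f"~{total:,.0f} GB (~{total / 1000:.1f} TB) of raw JSON over {RETENTION_YEARS} years")
    print(f"~{mean_ingest_mb_per_s(DAILY_GB):.2f} MB/s average ingest")
```

Roughly 110 TB of uncompressed JSON for the two queryable years alone, with the raw copy kept indefinitely on top of that, is well beyond what a single relational database instance is usually sized for, which is why storage efficiency and a scalable query layer both matter here.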

Which pipeline should you use to meet these requirements?

Suggested Answer: A

Cruz (4 months ago): A is too manual for such a large volume of data.
Shayne (4 months ago): Totally agree with D. It checks all the boxes!
Filiberto (4 months ago): Wait, why would you use HDFS? Isn't that outdated?
Delisa (4 months ago): I think B could work too, but it's not as scalable.
Lavonna (5 months ago): Option D seems like the best fit for real-time processing!
Germaine (5 months ago): I feel like I might be mixing up the roles of Cloud Pub/Sub and Dataflow. I hope I remember the details correctly during the exam!
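To keep the two roles Germaine mentions apart: Pub/Sub is the messaging layer between producer and consumer, while Dataflow is the processing layer that reads those messages, transforms them, and writes them to storage. The stdlib-only sketch below mimics just the processing step, under loudly stated assumptions: a list of JSON strings stands in for messages pulled from a subscription, a gzip-compressed newline-delimited JSON blob stands in for the raw copy kept in object storage, and a list of dicts stands in for rows bound for a SQL-queryable table. `process_batch` is a hypothetical name, not a Google Cloud API.

```python
import gzip
import json
from collections.abc import Iterable


def process_batch(messages: Iterable[str]) -> tuple[bytes, list[dict]]:
    """Hypothetical processing step (stand-in for a streaming job).

    `messages` stands in for JSON payloads pulled from a Pub/Sub subscription.
    Returns a gzip-compressed newline-delimited JSON archive (stand-in for the
    raw copy in object storage) and parsed rows (stand-in for table rows).
    """
    archive_lines: list[str] = []
    rows: list[dict] = []
    for msg in messages:
        event = json.loads(msg)  # parse; malformed JSON raises json.JSONDecodeError
        # Re-encode without extra whitespace so each event fits on one line.
        archive_lines.append(json.dumps(event, separators=(",", ":")))
        rows.append(event)
    ndjson = "".join(line + "\n" for line in archive_lines)
    archive = gzip.compress(ndjson.encode("utf-8"))
    return archive, rows


if __name__ == "__main__":
    # Repeated field names and values, typical of event JSON, compress well.
    sample = [json.dumps({"user": f"u{i % 10}", "action": "click", "ts": 1700000000 + i})
              for i in range(1000)]
    archive, rows = process_batch(sample)
    raw_bytes = sum(len(m.encode("utf-8")) for m in sample)
    print(f"{len(rows)} rows; {raw_bytes} bytes of JSON -> {len(archive)} bytes gzipped")
```

Gzip is used only because it ships with Python; the point of the sketch is that the compact raw archive and the query-ready rows are two separate outputs of the same processing step, while the messaging layer only delivers the messages.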
Casie (5 months ago): I practiced a similar question where we had to choose between Cloud Storage and BigQuery for storage. I think option D could be the right choice here.
Lenna (5 months ago): I'm not entirely sure, but I think Cloud SQL might not scale well with that much data.
Gail (5 months ago): I remember we discussed the importance of decoupling producers and consumers, which makes me lean towards options that use Pub/Sub.
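The decoupling Gail describes can be illustrated without any cloud service. In this stdlib-only sketch, a `queue.Queue` is an in-process stand-in for a Pub/Sub topic and subscription: the producer only publishes to the buffer and never calls the consumer, so the consumer can run at its own (here deliberately slower) pace. This is an analogy, not the Pub/Sub client API; `producer`, `consumer`, `run`, and `STOP` are hypothetical names.

```python
import json
import queue
import threading
import time

STOP = object()  # sentinel telling the consumer no more messages will arrive


def producer(topic: queue.Queue, n_events: int) -> None:
    """Publish events without knowing who consumes them or how fast."""
    for i in range(n_events):
        topic.put(json.dumps({"event_id": i, "kind": "page_view"}))
    topic.put(STOP)


def consumer(subscription: queue.Queue, sink: list) -> None:
    """Pull and process messages at the consumer's own pace."""
    while True:
        msg = subscription.get()
        if msg is STOP:
            break
        time.sleep(0.001)  # simulate a consumer slower than the producer
        sink.append(json.loads(msg))


def run(n_events: int = 100) -> list:
    buffer: queue.Queue = queue.Queue()  # unbounded in-memory buffer between the two sides
    processed: list = []
    worker = threading.Thread(target=consumer, args=(buffer, processed))
    worker.start()
    producer(buffer, n_events)  # returns as soon as everything is enqueued
    worker.join()
    return processed


if __name__ == "__main__":
    events = run(100)
    print(len(events), "events processed")  # prints "100 events processed"
```

Unlike a managed messaging service, this in-memory queue loses everything if the process dies and cannot be shared across machines; it only shows the shape of the producer/consumer split.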
