
Google Professional Data Engineer Exam - Topic 4 Question 73 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 73
Topic #: 4

An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiently import the data into BigQuery while consuming as few resources as possible. What should you do?

Suggested Answer: D) Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format
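
For readers who want to see roughly what option D looks like in practice, here is a minimal sketch of a streaming Dataflow pipeline. It is an illustration only: it assumes the raw records arrive on a Pub/Sub topic, and it invents a 20-byte fixed-width layout (tail number, altitude, timestamp) as a stand-in for the proprietary flight-data format. The project, topic, table, and schema names are not from the question.

```python
# Hedged sketch of option D: a custom DoFn decodes the proprietary format and a
# streaming Dataflow pipeline writes the rows straight into BigQuery.
# The Pub/Sub topic, table, schema, and 20-byte record layout are assumptions.
import struct

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class DecodeFlightRecord(beam.DoFn):
    """Stand-in for the proprietary decoder: 8-byte tail number,
    float32 altitude, int64 epoch-millis timestamp."""

    def process(self, raw: bytes):
        tail, altitude, ts_millis = struct.unpack(">8sfq", raw[:20])
        yield {
            "tail_number": tail.decode("ascii").strip(),
            "altitude_ft": altitude,
            "recorded_at": ts_millis,
        }


def run(argv=None):
    opts = PipelineOptions(argv, streaming=True)
    with beam.Pipeline(options=opts) as p:
        (
            p
            | "ReadRaw" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/flight-data")
            | "Decode" >> beam.ParDo(DecodeFlightRecord())
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-project:avionics.flight_telemetry",
                schema="tail_number:STRING,altitude_ft:FLOAT,recorded_at:INTEGER",
                # Storage Write API requires a recent Beam SDK; STREAMING_INSERTS
                # is the older streaming path.
                method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API)
        )


if __name__ == "__main__":
    run()
```

Because the decode and the write happen in one pipeline, nothing is staged in raw form first, which is the resource argument for D over the store-then-transform options.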

Contribute your Thoughts:

Huey
5 months ago
Shell scripts for ETL? That feels outdated, don't you think?
upvoted 0 times
Nichelle
5 months ago
I think A could work too, but it might not be as efficient.
upvoted 0 times
Johnson
5 months ago
Wait, can you really use Apache Beam for this? Seems complicated.
upvoted 0 times
Ceola
5 months ago
I agree, Avro format is great for this kind of task!
upvoted 0 times
Valda
5 months ago
Option D sounds like the best choice for streaming data efficiently.
upvoted 0 times
Vallie
6 months ago
I vaguely remember something about using Avro for efficient data storage in BigQuery, so option D sounds promising, but I need to double-check the details.
upvoted 0 times
Adrianna
6 months ago
I feel like the shell script option might be too slow for streaming, but I can't recall the specifics of why that might be a problem.
upvoted 0 times
Brynn
6 months ago
I think using Apache Beam could be a good option, especially if we need to handle custom data formats efficiently.
upvoted 0 times
Daron
6 months ago
I remember we discussed using Dataflow for streaming data, but I'm not sure if a standard pipeline is the best choice for proprietary formats.
upvoted 0 times
Galen
6 months ago
Ah, I see. The key here is to find the most efficient way to get the data into BigQuery. Option A with raw data storage and later transformation seems a bit inefficient. And I'm not too familiar with Hive, so I'll probably rule out option C. I think I'll go with option D and the Apache Beam custom connector - that sounds like the most resource-efficient solution.
upvoted 0 times
Marla
6 months ago
Hmm, I'm a bit unsure about this one. The question mentions wanting to use as few resources as possible, so I'm not sure if a full Dataflow pipeline is the best approach. Maybe option B with a Cloud Function would be more efficient? I'll have to think this through a bit more.
upvoted 0 times
Viva
6 months ago
This seems like a straightforward data ingestion problem, but the proprietary data format is a bit tricky. I think I'll go with option D and use an Apache Beam custom connector to stream the data into BigQuery in Avro format - that should be the most efficient approach.
upvoted 0 times
Reena
2 years ago
You know, I'm actually kind of curious about the Hive option (option C). Dataproc could give us a bit more flexibility in how we process the data, and using CSV format might be easier to work with than the proprietary format. But I agree, the Beam connector in option D seems like the most straightforward and efficient solution.
upvoted 0 times
Carry
2 years ago
Haha, I can just imagine the IT team trying to figure out how to connect that proprietary data format to BigQuery. It's like trying to fit a square peg in a round hole! I think option D is the way to go - the Avro format should be more compatible than CSV, and Dataflow can handle the streaming without too much overhead.
upvoted 0 times
Karon
2 years ago
Yeah, Avro format should definitely help with compatibility and Dataflow can handle the streaming seamlessly.
upvoted 0 times
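
Karon's point about Avro maps to a concrete setting in the Beam Python SDK: when WriteToBigQuery stages files for load jobs, it can serialize them as binary Avro instead of newline-delimited JSON, which preserves types and avoids CSV-style escaping problems. A hedged fragment, with an assumed table, schema, and flush interval:

```python
# Hedged fragment: stage BigQuery load files as binary Avro rather than JSON.
# Table, schema, and the 60-second flush interval are illustrative assumptions.
import apache_beam as beam

write_avro = beam.io.WriteToBigQuery(
    "my-project:avionics.flight_telemetry",
    schema="tail_number:STRING,altitude_ft:FLOAT,recorded_at:INTEGER",
    method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
    triggering_frequency=60,  # in streaming mode, run a load job ~every minute
    temp_file_format="AVRO",  # Avro staging files instead of newline JSON
)
```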
Beatriz
2 years ago
D) Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format
upvoted 0 times
Fabiola
2 years ago
Option D sounds like the most efficient way to handle the data streaming while minimizing resource consumption.
upvoted 0 times
Benedict
2 years ago
I'm not sure about that. Option B with the Cloud Function batch job sounds promising too. It might be a bit more manual, but if the data format is really complex, it could give us more control over the transformation process. Plus, running it as a batch job could be more efficient than a continuous stream.
upvoted 0 times
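
For comparison, Benedict's option-B reading would look roughly like this: a Cloud Storage-triggered Cloud Function that decodes a dropped file and runs a batch load job into BigQuery. Everything here (function name, record layout, bucket event shape, table) is an assumption for illustration; note that a load job is batch by definition, so this path trades the question's streaming requirement for simplicity.

```python
# Hedged sketch of option B as Benedict describes it: a Cloud Function batch job.
# The decoder, bucket event shape, and table name are illustrative assumptions.
import io
import json
import struct

from google.cloud import bigquery, storage


def decode_records(raw: bytes):
    # Stand-in for the proprietary parser; assumes 20-byte fixed-width records.
    for i in range(0, len(raw) - 19, 20):
        tail, altitude, ts_millis = struct.unpack(">8sfq", raw[i:i + 20])
        yield {
            "tail_number": tail.decode("ascii").strip(),
            "altitude_ft": altitude,
            "recorded_at": ts_millis,
        }


def load_flight_batch(event, context):
    """Triggered by a new object in Cloud Storage; loads it into BigQuery."""
    blob = storage.Client().bucket(event["bucket"]).blob(event["name"])
    rows = "\n".join(json.dumps(r) for r in decode_records(blob.download_as_bytes()))

    job = bigquery.Client().load_table_from_file(
        io.BytesIO(rows.encode("utf-8")),
        "my-project.avionics.flight_telemetry",
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
            autodetect=True,
        ),
    )
    job.result()  # wait for the batch load job to finish
```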
Beula
2 years ago
Hmm, this is a tricky one. We need to find the most efficient way to get that proprietary data into BigQuery without wasting resources. I'm leaning towards option D - using an Apache Beam custom connector to set up a Dataflow pipeline that streams the data directly into BigQuery in Avro format. That way, we can bypass the raw data storage and transformation steps, which could be resource-intensive.
upvoted 0 times
