
Google Professional Data Engineer Exam - Topic 4 Question 73 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 73
Topic #: 4

An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiently import the data into BigQuery while consuming as few resources as possible. What should you do?

Suggested Answer: D) Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format
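
For readers who want to see roughly what option D looks like in practice, here is a minimal sketch of a streaming Dataflow pipeline. It is an illustration only: it assumes the raw records arrive on a Pub/Sub topic, and it invents a 20-byte fixed-width layout (tail number, altitude, timestamp) as a stand-in for the proprietary flight-data format. The project, topic, table, and schema names are not from the question.

```python
# Hedged sketch of option D: a custom DoFn decodes the proprietary format and a
# streaming Dataflow pipeline writes the rows straight into BigQuery.
# The Pub/Sub topic, table, schema, and 20-byte record layout are assumptions.
import struct

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class DecodeFlightRecord(beam.DoFn):
    """Stand-in for the proprietary decoder: 8-byte tail number,
    float32 altitude, int64 epoch-millis timestamp."""

    def process(self, raw: bytes):
        tail, altitude, ts_millis = struct.unpack(">8sfq", raw[:20])
        yield {
            "tail_number": tail.decode("ascii").strip(),
            "altitude_ft": altitude,
            "recorded_at": ts_millis,
        }


def run(argv=None):
    opts = PipelineOptions(argv, streaming=True)
    with beam.Pipeline(options=opts) as p:
        (
            p
            | "ReadRaw" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/flight-data")
            | "Decode" >> beam.ParDo(DecodeFlightRecord())
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-project:avionics.flight_telemetry",
                schema="tail_number:STRING,altitude_ft:FLOAT,recorded_at:INTEGER",
                # Storage Write API requires a recent Beam SDK; STREAMING_INSERTS
                # is the older streaming path.
                method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API)
        )


if __name__ == "__main__":
    run()
```

Because the decode and the write happen in one pipeline, nothing is staged in raw form first, which is the resource argument for D over the store-then-transform options.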

Contribute your Thoughts:

Huey
5 months ago
Shell scripts for ETL? That feels outdated, don't you think?
upvoted 0 times
Nichelle
5 months ago
I think A could work too, but it might not be as efficient.
upvoted 0 times
Johnson
5 months ago
Wait, can you really use Apache Beam for this? Seems complicated.
upvoted 0 times
Ceola
5 months ago
I agree, Avro format is great for this kind of task!
upvoted 0 times
Valda
5 months ago
Option D sounds like the best choice for streaming data efficiently.
upvoted 0 times
Vallie
6 months ago
I vaguely remember something about using Avro for efficient data storage in BigQuery, so option D sounds promising, but I need to double-check the details.
upvoted 0 times
Adrianna
6 months ago
I feel like the shell script option might be too slow for streaming, but I can't recall the specifics of why that might be a problem.
upvoted 0 times
Brynn
6 months ago
I think using Apache Beam could be a good option, especially if we need to handle custom data formats efficiently.
upvoted 0 times
Daron
6 months ago
I remember we discussed using Dataflow for streaming data, but I'm not sure if a standard pipeline is the best choice for proprietary formats.
upvoted 0 times
Galen
6 months ago
Ah, I see. The key here is to find the most efficient way to get the data into BigQuery. Option A with raw data storage and later transformation seems a bit inefficient. And I'm not too familiar with Hive, so I'll probably rule out option C. I think I'll go with option D and the Apache Beam custom connector - that sounds like the most resource-efficient solution.
upvoted 0 times
Marla
6 months ago
Hmm, I'm a bit unsure about this one. The question mentions wanting to use as few resources as possible, so I'm not sure if a full Dataflow pipeline is the best approach. Maybe option B with a Cloud Function would be more efficient? I'll have to think this through a bit more.
upvoted 0 times
Viva
6 months ago
This seems like a straightforward data ingestion problem, but the proprietary data format is a bit tricky. I think I'll go with option D and use an Apache Beam custom connector to stream the data into BigQuery in Avro format - that should be the most efficient approach.
upvoted 0 times
Reena
2 years ago
You know, I'm actually kind of curious about the Hive option (option C). Dataproc could give us a bit more flexibility in how we process the data, and using CSV format might be easier to work with than the proprietary format. But I agree, the Beam connector in option D seems like the most straightforward and efficient solution.
upvoted 0 times
Carry
2 years ago
Haha, I can just imagine the IT team trying to figure out how to connect that proprietary data format to BigQuery. It's like trying to fit a square peg in a round hole! I think option D is the way to go - the Avro format should be more compatible than CSV, and Dataflow can handle the streaming without too much overhead.
upvoted 0 times
Karon
2 years ago
Yeah, Avro format should definitely help with compatibility and Dataflow can handle the streaming seamlessly.
upvoted 0 times
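
Karon's point about Avro maps to a concrete setting in the Beam Python SDK: when WriteToBigQuery stages files for load jobs, it can serialize them as binary Avro instead of newline-delimited JSON, which preserves types and avoids CSV-style escaping problems. A hedged fragment, with an assumed table, schema, and flush interval:

```python
# Hedged fragment: stage BigQuery load files as binary Avro rather than JSON.
# Table, schema, and the 60-second flush interval are illustrative assumptions.
import apache_beam as beam

write_avro = beam.io.WriteToBigQuery(
    "my-project:avionics.flight_telemetry",
    schema="tail_number:STRING,altitude_ft:FLOAT,recorded_at:INTEGER",
    method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
    triggering_frequency=60,  # in streaming mode, run a load job ~every minute
    temp_file_format="AVRO",  # Avro staging files instead of newline JSON
)
```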
Beatriz
2 years ago
D) Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format
upvoted 0 times
Fabiola
2 years ago
Option D sounds like the most efficient way to handle the data streaming while minimizing resource consumption.
upvoted 0 times
Benedict
2 years ago
I'm not sure about that. Option B with the Cloud Function batch job sounds promising too. It might be a bit more manual, but if the data format is really complex, it could give us more control over the transformation process. Plus, running it as a batch job could be more efficient than a continuous stream.
upvoted 0 times
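
For comparison, Benedict's option-B reading would look roughly like this: a Cloud Storage-triggered Cloud Function that decodes a dropped file and runs a batch load job into BigQuery. Everything here (function name, record layout, bucket event shape, table) is an assumption for illustration; note that a load job is batch by definition, so this path trades the question's streaming requirement for simplicity.

```python
# Hedged sketch of option B as Benedict describes it: a Cloud Function batch job.
# The decoder, bucket event shape, and table name are illustrative assumptions.
import io
import json
import struct

from google.cloud import bigquery, storage


def decode_records(raw: bytes):
    # Stand-in for the proprietary parser; assumes 20-byte fixed-width records.
    for i in range(0, len(raw) - 19, 20):
        tail, altitude, ts_millis = struct.unpack(">8sfq", raw[i:i + 20])
        yield {
            "tail_number": tail.decode("ascii").strip(),
            "altitude_ft": altitude,
            "recorded_at": ts_millis,
        }


def load_flight_batch(event, context):
    """Triggered by a new object in Cloud Storage; loads it into BigQuery."""
    blob = storage.Client().bucket(event["bucket"]).blob(event["name"])
    rows = "\n".join(json.dumps(r) for r in decode_records(blob.download_as_bytes()))

    job = bigquery.Client().load_table_from_file(
        io.BytesIO(rows.encode("utf-8")),
        "my-project.avionics.flight_telemetry",
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
            autodetect=True,
        ),
    )
    job.result()  # wait for the batch load job to finish
```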
Beula
2 years ago
Hmm, this is a tricky one. We need to find the most efficient way to get that proprietary data into BigQuery without wasting resources. I'm leaning towards option D - using an Apache Beam custom connector to set up a Dataflow pipeline that streams the data directly into BigQuery in Avro format. That way, we can bypass the raw data storage and transformation steps, which could be resource-intensive.
upvoted 0 times
