Google Professional Data Engineer Exam - Topic 2 Question 14 Discussion

Question

You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for implementing these transformations include:

- Executing the transformations on a schedule
- Enabling non-developer analysts to modify transformations
- Providing a graphical tool for designing transformations

What should you do?

A) Use Cloud Dataprep to build and maintain the transformation recipes, and execute them on a scheduled basis

B) Load each month's CSV data into BigQuery, and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query

C) Help the analysts write a Cloud Dataflow pipeline in Python to perform the transformation. The Python code should be stored in a revision control system and modified as the incoming data's schema changes

D) Use Apache Spark on Cloud Dataproc to infer the schema of the CSV file before creating a Dataframe. Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading into BigQuery
