Google Professional Data Engineer Exam - Topic 2 Question 14 Discussion
You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for implementing these transformations include:
- Executing the transformations on a schedule
- Enabling non-developer analysts to modify transformations
- Providing a graphical tool for designing transformations
What should you do?
A) Use Cloud Dataprep to build and maintain the transformation recipes, and execute them on a scheduled basis.
B) Load each month's CSV data into BigQuery, and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query.
C) Help the analysts write a Cloud Dataflow pipeline in Python to perform the transformation. The Python code should be stored in a revision control system and modified as the incoming data's schema changes.
D) Use Apache Spark on Cloud Dataproc to infer the schema of the CSV file before creating a DataFrame. Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading into BigQuery.
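To make the schema-change problem in the question concrete, here is a minimal sketch of what a code-maintained transformation (the style options C and D describe) might involve. It uses only the Python standard library rather than Dataflow or Spark, and the column names, the `COLUMN_ALIASES` map, and the `normalize_csv` helper are all hypothetical illustrations, not part of the question.

```python
import csv
import io

# Hypothetical mapping from header names seen in different file versions
# to one standard schema. These names are illustrative assumptions.
COLUMN_ALIASES = {
    "customer_id": "customer_id",
    "cust_id": "customer_id",
    "amount": "amount",
    "total_amount": "amount",
}
STANDARD_COLUMNS = ["customer_id", "amount"]


def normalize_csv(text):
    """Read CSV text and return rows keyed by the standard column names."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for raw in reader:
        row = {}
        for header, value in raw.items():
            # Skip overflow fields (header is None) and short rows (value is None).
            if header is None or value is None:
                continue
            standard = COLUMN_ALIASES.get(header.strip().lower())
            if standard is not None:
                row[standard] = value.strip()
        missing = [c for c in STANDARD_COLUMNS if c not in row]
        if missing:
            # A header we have never seen means the alias map needs a code change.
            raise ValueError(f"unmapped schema, missing columns: {missing}")
        rows.append(row)
    return rows


# Two hypothetical file versions: the second renames columns and adds one.
january = "customer_id,amount\n1, 10.50\n"
april = "cust_id,total_amount,region\n2,7.25,EU\n"

print(normalize_csv(january))
print(normalize_csv(april))
```

In a sketch like this, each new header variant means someone has to edit the alias map and redeploy, which is worth weighing against the requirements that non-developer analysts can modify the transformations and that a graphical design tool is available.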