Google Professional Data Engineer Exam - Topic 2 Question 14 Discussion

You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for implementing these transformations include:

Executing the transformations on a schedule

Enabling non-developer analysts to modify transformations

Providing a graphical tool for designing transformations

What should you do?
A) Use Cloud Dataprep to build and maintain the transformation recipes, and execute them on a scheduled basis
B) Load each month's CSV data into BigQuery, and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query
C) Help the analysts write a Cloud Dataflow pipeline in Python to perform the transformation. The Python code should be stored in a revision control system and modified as the incoming data's schema changes
D) Use Apache Spark on Cloud Dataproc to infer the schema of the CSV file before creating a Dataframe. Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading into BigQuery

Actual exam question for Google's Professional Data Engineer exam
Question #: 14
Topic #: 2

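Suggested Answer: A. Cloud Dataprep is the only option that provides a graphical tool for designing transformations, lets non-developer analysts modify them, and supports scheduled execution.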
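
For context on the schema-change requirement: in the code-based options (B, C, and D), someone has to notice each new file layout and update code to match. Below is a minimal sketch of that detection step in plain Python, using only the standard library. The function names, column names, and sample files are invented for illustration and do not come from the question.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the column names from the first row of a CSV document."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader)


def diff_schema(expected: list[str], actual: list[str]) -> dict[str, list[str]]:
    """Report columns added or removed relative to the expected schema."""
    return {
        "added": [col for col in actual if col not in expected],
        "removed": [col for col in expected if col not in actual],
    }


# Hypothetical monthly files; the column names are assumptions for this sketch.
january = "customer_id,amount,currency\n1,9.99,USD\n"
april = "customer_id,total_amount,currency,region\n1,9.99,USD,EU\n"

drift = diff_schema(read_header(january), read_header(april))
print(drift)
```

A check like this only flags the drift. Deciding that `amount` now maps to `total_amount` is still a manual step each time the layout changes.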

Contribute your Thoughts:

Val
7 months ago
Wow, I didn't know schemas could change that often!
Alayna
7 months ago
B seems more flexible for complex queries, though.
Claribel
8 months ago
Not sure about that, isn't it limited in some ways?
Amalia
8 months ago
Totally agree, Cloud Dataprep is super user-friendly!
Roselle
8 months ago
I think option A is the best choice for non-developers.
Stephaine
8 months ago
Apache Spark sounds powerful, but I’m not sure it offers graphical tools that are as accessible to analysts as Cloud Dataprep’s.
Sharita
8 months ago
I practiced writing Cloud Dataflow pipelines in Python, but I feel like that might be too technical for the analysts to modify easily.
Shawnta
8 months ago
I remember we discussed how Cloud Dataprep is user-friendly for non-developers, but I'm not sure if it can handle schema changes every third month effectively.
Jarvis
8 months ago
I think using BigQuery with SQL queries could be a solid approach, but it might get complicated with the schema changes.
Phuong
8 months ago
This question seems straightforward, I'll carefully read through the options and select the two correct answers.
Ivette
8 months ago
Okay, let me think this through step-by-step. I need to create a new file system with the same properties as the one shown, so I'll need to pay attention to the mountpoint and compression settings.
