Google Exam Professional-Data-Engineer Topic 3 Question 80 Discussion

Actual exam question for Google's Google Cloud Certified Professional Data Engineer exam

Question #: 80
Topic #: 3


You are loading CSV files from Cloud Storage to BigQuery. The files have known data quality issues, including mismatched data types, such as STRINGS and INT64s in the same column, and inconsistent formatting of values such as phone numbers or addresses. You need to create the data pipeline to maintain data quality and perform the required cleansing and transformation. What should you do?

A. Use Data Fusion to transform the data before loading it into BigQuery.

B. Load the CSV files into a staging table with the desired schema, perform the transformations with SQL, and then write the results to the final destination table.

C. Create a table with the desired schema, load the CSV files into the table, and perform the transformations in place using SQL.

D. Use Data Fusion to convert the CSV files to a self-describing data format, such as Avro, before loading the data to BigQuery.


Suggested Answer: A

Data Fusion's advantages:

Visual interface: Offers a user-friendly interface for designing data pipelines without extensive coding, making it accessible to a wider range of users.

Built-in transformations: Includes a wide range of pre-built transformations to handle common data quality issues, such as:

Data type conversions

Data cleansing (e.g., removing invalid characters, correcting formatting)

Data validation (e.g., checking for missing values, enforcing constraints)

Data enrichment (e.g., adding derived fields, joining with other datasets)

Custom transformations: Allows for custom transformations using SQL or Java code for more complex cleaning tasks.

Scalability: Can handle large datasets efficiently, so the cleansing steps keep up as the volume of CSV files being loaded grows.

Integration with BigQuery: Integrates seamlessly with BigQuery, allowing for direct loading of transformed data.
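To make the two issues named in the question concrete, here is a minimal Python sketch of the kind of per-row logic such a cleansing step applies: a type conversion that tolerates bad values, and a formatting fix for phone numbers. It is not Data Fusion code. The helper names (to_int, normalize_phone), the sample rows, and the assumption of US-style 10-digit phone numbers are all hypothetical and only for illustration.

```python
import re
from typing import Optional


def to_int(value: str) -> Optional[int]:
    """Return the value as an int, or None when int() rejects it."""
    try:
        return int(value.strip())
    except ValueError:
        return None


def normalize_phone(value: str) -> Optional[str]:
    """Reduce a US-style phone number to 10 digits, or None if it does not fit.

    Assumption for this sketch: numbers are 10 digits, optionally
    preceded by the country code 1.
    """
    digits = re.sub(r"\D", "", value)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    return digits if len(digits) == 10 else None


# Hypothetical sample rows with mixed types and inconsistent formatting.
rows = [
    {"quantity": "42", "phone": "(555) 123-4567"},
    {"quantity": "forty-two", "phone": "+1 555.123.4567"},
    {"quantity": " 7 ", "phone": "12345"},
]

cleaned = [
    {"quantity": to_int(r["quantity"]), "phone": normalize_phone(r["phone"])}
    for r in rows
]

for row in cleaned:
    print(row)
```

Values that cannot be converted become None instead of failing the whole load, which is the behavior a pipeline needs when a column mixes strings and integers.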

by Mollie at Apr 08, 2024, 02:05 PM


Comments


Emogene

15 hours ago

Option B sounds like the way to go. Staging the data first and then transforming it with SQL gives you more control and flexibility. Plus, you can easily track the changes and audit the process.

upvoted 0 times
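For readers weighing the staging approach, here is a rough sketch of the pattern using Python's built-in sqlite3 module as a stand-in for BigQuery. Note one deliberate difference from the wording of option B: the staging table here is all text, so rows with mismatched types are not rejected at load time. The table names, columns, and sample values are hypothetical, and in BigQuery the cast would more likely be written with SAFE_CAST instead of the digit filter used below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging table: every column is text, so no row fails on load.
conn.execute("CREATE TABLE staging (customer_id TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [
        ("101", "555-123-4567"),
        ("abc", "555 987 6543"),  # non-numeric id, filtered out below
        ("102", "555 000 1111"),
    ],
)

# Final table with the desired types.
conn.execute("CREATE TABLE customers (customer_id INTEGER, phone TEXT)")

# Transform step: keep rows whose id is all digits, cast the id,
# and strip dashes and spaces from the phone number.
conn.execute(
    """
    INSERT INTO customers (customer_id, phone)
    SELECT CAST(customer_id AS INTEGER),
           REPLACE(REPLACE(phone, '-', ''), ' ', '')
    FROM staging
    WHERE customer_id <> '' AND customer_id NOT GLOB '*[^0-9]*'
    """
)

result = conn.execute(
    "SELECT customer_id, phone FROM customers ORDER BY customer_id"
).fetchall()
print(result)
```

Because the raw rows stay in the staging table, rejected records such as the "abc" row remain available for auditing, which is the control the comment above refers to.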


Cherry

2 days ago

Ugh, this question is a real doozy! I've dealt with data quality issues before, and it's definitely not a walk in the park. I'm leaning towards option B - it seems like the most comprehensive approach to handling the data cleansing and transformation.

upvoted 0 times
