Welcome to Pass4Success

Google Exam Professional Data Engineer Topic 5 Question 103 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 103
Topic #: 5

You store and analyze your relational data in BigQuery on Google Cloud, with all data residing in US regions. You also have a variety of object stores across Microsoft Azure and Amazon Web Services (AWS), also in US regions. You want to query all your data in BigQuery daily with as little movement of data as possible. What should you do?

Suggested Answer: C

To query object-store data that lives in Azure and AWS without copying it into Google Cloud, using BigQuery Omni with BigLake tables is the most effective approach. Here's why option C is the best choice:

BigQuery Omni:

BigQuery Omni runs BigQuery's query engine in AWS and Azure regions, so queries execute where the data resides rather than requiring the data to be loaded into BigQuery first.

BigLake Tables:

BigLake tables expose object-store data (Amazon S3, Azure Blob Storage) as BigQuery tables with consistent access control, so the same SQL interface and governance apply across all three clouds.

Minimal Data Movement:

Because queries run against the data in place, only query results cross cloud boundaries. This directly satisfies the requirement to query all your data daily with as little movement of data as possible.

Steps to Implement:

Create Connections:

Create BigQuery connections to the AWS and Azure regions that contain your object stores.

Define BigLake Tables:

Create BigLake external tables over the Amazon S3 and Azure Blob Storage data using those connections.

Query in BigQuery:

Run your daily queries from BigQuery against your native US-region tables and the BigLake tables.


BigQuery Omni Documentation

BigLake Tables Documentation
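As a minimal sketch of how option C could be set up (the project, dataset, connection, and bucket names below are hypothetical), a BigLake external table over Amazon S3 data can be defined like this:

```sql
-- Assumes a BigQuery connection named `my-aws-connection` has already
-- been created in the aws-us-east-1 region with access to the bucket.
CREATE EXTERNAL TABLE `my-project.my_dataset.aws_orders`
WITH CONNECTION `aws-us-east-1.my-aws-connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://my-bucket/orders/*.parquet']
);

-- The table can then be queried from BigQuery without moving the files:
SELECT COUNT(*) AS order_count
FROM `my-project.my_dataset.aws_orders`;
```

An analogous CREATE EXTERNAL TABLE statement using an Azure connection and azure:// URIs would cover the Azure Blob Storage data.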

Contribute your Thoughts:

Melynda
5 days ago
I'm not sure, but option D seems like a good option too. Using BigQuery Data Transfer Service can load files from Azure and AWS into BigQuery efficiently.
upvoted 0 times
...
Gilma
6 days ago
I disagree, I believe option C is the way to go. BigQuery Omni can query files in Azure and AWS without moving the data.
upvoted 0 times
...
Elroy
8 days ago
I think option B is the best choice because Dataflow can ingest files from Azure and AWS directly into BigQuery.
upvoted 0 times
...
Luis
9 days ago
Hmm, I'm not sure about using Cloud Shell and rsync to load files. Seems a bit manual and error-prone. I'd prefer an automated solution like Dataflow or Data Transfer Service.
upvoted 0 times
...
Nana
10 days ago
Option C seems like the way to go. Integrating BigQuery Omni and BigLake tables allows me to query all my data from different cloud providers without having to move it around. Efficiency FTW!
upvoted 0 times
...