Welcome to Pass4Success

Google Exam Professional Data Engineer Topic 5 Question 103 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 103
Topic #: 5

You store and analyze your relational data in BigQuery on Google Cloud, with all data residing in US regions. You also have a variety of object stores across Microsoft Azure and Amazon Web Services (AWS), also in US regions. You want to query all your data in BigQuery daily with as little movement of data as possible. What should you do?

Suggested Answer: C

To query object-store data that lives in Azure and AWS without copying it into Google Cloud, using BigQuery Omni with BigLake tables is the most effective approach. Here's why option C is the best choice:

BigQuery Omni:

BigQuery Omni runs BigQuery's query engine in AWS and Azure regions, so queries execute where the data resides rather than requiring the data to be loaded into BigQuery first.

BigLake Tables:

BigLake tables expose object-store data (Amazon S3, Azure Blob Storage) as BigQuery tables with consistent access control, so the same SQL interface and governance apply across all three clouds.

Minimal Data Movement:

Because queries run against the data in place, only query results cross cloud boundaries. This directly satisfies the requirement to query all your data daily with as little movement of data as possible.

Steps to Implement:

Create Connections:

Create BigQuery connections to the AWS and Azure regions that contain your object stores.

Define BigLake Tables:

Create BigLake external tables over the Amazon S3 and Azure Blob Storage data using those connections.

Query in BigQuery:

Run your daily queries from BigQuery against your native US-region tables and the BigLake tables.


BigQuery Omni Documentation

BigLake Tables Documentation
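As a minimal sketch of how option C could be set up (the project, dataset, connection, and bucket names below are hypothetical), a BigLake external table over Amazon S3 data can be defined like this:

```sql
-- Assumes a BigQuery connection named `my-aws-connection` has already
-- been created in the aws-us-east-1 region with access to the bucket.
CREATE EXTERNAL TABLE `my-project.my_dataset.aws_orders`
WITH CONNECTION `aws-us-east-1.my-aws-connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://my-bucket/orders/*.parquet']
);

-- The table can then be queried from BigQuery without moving the files:
SELECT COUNT(*) AS order_count
FROM `my-project.my_dataset.aws_orders`;
```

An analogous CREATE EXTERNAL TABLE statement using an Azure connection and azure:// URIs would cover the Azure Blob Storage data.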

Contribute your Thoughts:

Melynda
5 days ago
I'm not sure, but option D seems like a good option too. Using BigQuery Data Transfer Service can load files from Azure and AWS into BigQuery efficiently.
upvoted 0 times
...
Gilma
6 days ago
I disagree, I believe option C is the way to go. BigQuery Omni can query files in Azure and AWS without moving the data.
upvoted 0 times
...
Elroy
8 days ago
I think option B is the best choice because Dataflow can ingest files from Azure and AWS directly into BigQuery.
upvoted 0 times
...
Luis
9 days ago
Hmm, I'm not sure about using Cloud Shell and rsync to load files. Seems a bit manual and error-prone. I'd prefer an automated solution like Dataflow or Data Transfer Service.
upvoted 0 times
...
Nana
10 days ago
Option C seems like the way to go. Integrating BigQuery Omni and BigLake tables allows me to query all my data from different cloud providers without having to move it around. Efficiency FTW!
upvoted 0 times
...