Deal of The Day! Hurry Up, Grab the Special Discount - Save 25% - Ends In 00:00:00 Coupon code: SAVE25
Welcome to Pass4Success

- Free Preparation Discussions

Google Professional Data Engineer Exam - Topic 3 Question 91 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 91
Topic #: 3
[All Professional Data Engineer Questions]

Your company's customer_order table in BigOuery stores the order history for 10 million customers, with a table size of 10 PB. You need to create a dashboard for the support team to view the order history. The dashboard has two filters, countryname and username. Both are string data types in the BigQuery table. When a filter is applied, the dashboard fetches the order history from the table and displays the query results. However, the dashboard is slow to show the results when applying the filters to the following query:

How should you redesign the BigQuery table to support faster access?

Show Suggested Answer Hide Answer
Suggested Answer: C

To improve the performance of querying a large BigQuery table with filters on countryname and username, clustering the table by these fields is the most effective approach. Here's why option C is the best choice:

Clustering in BigQuery:

Clustering organizes data based on the values in specified columns. This can significantly improve query performance by reducing the amount of data scanned during query execution.

Clustering by countryname and username means that data is physically sorted and stored together based on these fields, allowing BigQuery to quickly locate and read only the relevant data for queries using these filters.

Filter Efficiency:

With the table clustered by countryname and username, queries that filter on these columns can benefit from efficient data retrieval, reducing the amount of data processed and speeding up query execution.

This directly addresses the performance issue of the dashboard queries that apply filters on these fields.

Steps to Implement:

Redesign the Table:

Create a new table with clustering on countryname and username:

CREATE TABLE project.dataset.new_table

CLUSTER BY countryname, username AS

SELECT * FROM project.dataset.customer_order;

Migrate Data:

Transfer the existing data from the original table to the new clustered table.

Update Queries:

Modify the dashboard queries to reference the new clustered table.


BigQuery Clustering Documentation

Optimizing Query Performance

Contribute your Thoughts:

0/2000 characters
Graham
6 days ago
I think partitioning by both fields makes more sense.
upvoted 0 times
...
Rima
12 days ago
Clustering by country and username could speed things up!
upvoted 0 times
...
Aron
18 days ago
I feel like partitioning by _PARTITIONTIME is more about time-based queries, so it might not be the best choice here.
upvoted 0 times
...
Darrel
23 days ago
I practiced a similar question where clustering was emphasized for improving filter performance, so I’m leaning towards that option.
upvoted 0 times
...
Ahmad
28 days ago
I'm not entirely sure, but I think partitioning by both country and username might be overkill.
upvoted 0 times
...
Lauran
1 month ago
I remember discussing how clustering can help with query performance, especially with string fields like country and username.
upvoted 0 times
...
Lemuel
1 month ago
Clustering the table by country and username seems like it could work well. That way the data will be physically organized in a way that supports the dashboard's filtering needs.
upvoted 0 times
...
Ciara
1 month ago
I'm a bit confused on the difference between partitioning and clustering. I'll need to review those concepts before deciding on the best solution.
upvoted 0 times
...
Gail
1 month ago
Hmm, partitioning by country and username sounds like a good approach to me. That should help speed up the filtering process.
upvoted 0 times
...
Moon
1 month ago
This is a tricky one. I'll need to think carefully about the best way to optimize the table structure for faster filtering.
upvoted 0 times
...
King
1 month ago
Partitioning by _PARTITIONTIME might be an interesting approach, but I'm not sure if that will help with the specific filters the dashboard is using. I'll need to think this through carefully.
upvoted 0 times
...
Gracia
1 month ago
Hmm, I'm a bit unsure about this one. Do I need to focus solely on sales and marketing, or should I also consider how to resolve the original problem? I'm leaning towards option B, but I'll have to think it through carefully.
upvoted 0 times
...
William
2 months ago
I'm a bit confused on the difference between GASB and FASB. I know they both set accounting standards, but I'm not sure if GASB is responsible for developing their own standards or just making sure FASB standards are used by state and local governments. I'll have to guess on this one.
upvoted 0 times
...
Major
1 year ago
Partitioning by country and username is the obvious choice here. Anything else would just be a waste of time. 'Obvious' as a feather on a fish, am I right?
upvoted 0 times
Edelmira
1 year ago
C) Cluster the table by country and username fields
upvoted 0 times
...
Goldie
1 year ago
Partitioning by country and username is the best approach for faster access.
upvoted 0 times
...
Tuyet
1 year ago
A) Cluster the table by country field, and partition by username field.
upvoted 0 times
...
...
Johnna
1 year ago
I think partitioning by _PARTITIONTIME would also be a good option to consider for faster access.
upvoted 0 times
...
Beckie
1 year ago
But clustering by country field would reduce the data scanned when filtering by country, making it faster.
upvoted 0 times
...
Kaitlyn
1 year ago
I disagree, I believe partitioning the table by country and username fields would be more efficient.
upvoted 0 times
...
Beckie
1 year ago
I think we should cluster the table by country field and partition by username field.
upvoted 0 times
...
Alex
1 year ago
Clustering the table by country and username sounds like a good idea, but partitioning is probably more efficient. I'd go with option B.
upvoted 0 times
Camellia
1 year ago
Yeah, that way the dashboard can fetch the data faster when applying the filters.
upvoted 0 times
...
Simona
1 year ago
I agree, partitioning the table by country and username fields seems like the best option.
upvoted 0 times
...
...
Ernestine
1 year ago
Partitioning by _PARTITIONTIME might work, but it doesn't address the specific filters we need. Gotta go with option B.
upvoted 0 times
...
Ashanti
1 year ago
Hmm, I think partitioning by both country and username fields is the way to go. That should give us faster lookups on those filters.
upvoted 0 times
Berry
1 year ago
Hmm, I think partitioning by both country and username fields is the way to go. That should give us faster lookups on those filters.
upvoted 0 times
...
Nichelle
1 year ago
B) Partition the table by country and username fields.
upvoted 0 times
...
Christiane
1 year ago
A) Cluster the table by country field, and partition by username field.
upvoted 0 times
...
...

Save Cancel