Welcome to Pass4Success


Google Associate Data Practitioner Exam - Topic 3 Question 9 Discussion

Actual exam question for Google's Associate Data Practitioner exam
Question #: 9
Topic #: 3

Your organization has a petabyte of application logs stored as Parquet files in Cloud Storage. You need to quickly perform a one-time SQL-based analysis of the files and join them to data that already resides in BigQuery. What should you do?
A) Create a Dataproc cluster, and write a PySpark job to join the data from BigQuery to the files in Cloud Storage.
B) Launch a Cloud Data Fusion environment, use plugins to connect to BigQuery and Cloud Storage, and use the SQL join operation to analyze the data.
C) Create external tables over the files in Cloud Storage, and perform SQL joins to tables in BigQuery to analyze the data.
D) Use the bq load command to load the Parquet files into BigQuery, and perform SQL joins to analyze the data.

Suggested Answer: C

Creating external tables over the Parquet files in Cloud Storage lets you run SQL queries against the logs and join them to tables that already reside in BigQuery, without loading the files first. Because BigQuery reads the Parquet files directly from Cloud Storage at query time, this avoids the time and cost of ingesting a petabyte of data, which makes it well suited to a one-time analysis.
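As a rough sketch, the suggested answer corresponds to BigQuery SQL along these lines. The bucket path, dataset, table, and column names here are hypothetical examples, not values from the question:

```sql
-- Define an external table over the Parquet logs in Cloud Storage
-- (bucket path, dataset, and column names are hypothetical)
CREATE EXTERNAL TABLE mydataset.app_logs
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://example-bucket/logs/*.parquet']
);

-- Join the external table to a native BigQuery table with ordinary SQL
SELECT u.user_id, COUNT(*) AS error_count
FROM mydataset.app_logs AS l
JOIN mydataset.users AS u
  ON l.user_id = u.user_id
WHERE l.severity = 'ERROR'
GROUP BY u.user_id;
```

Nothing is copied into BigQuery storage; the query engine scans the Parquet files in place, which is why this path is faster to set up than a bulk load for a one-off job.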


Contribute your Thoughts:

Tiara
4 months ago
Wait, can you really do SQL joins with external tables? Sounds too good to be true!
upvoted 0 times
Nathan
4 months ago
I think B could work too, but it might be overkill for a one-time analysis.
upvoted 0 times
Terrilyn
4 months ago
A seems like too much setup for a quick job.
upvoted 0 times
Gerald
4 months ago
D is not ideal, loading a petabyte into BigQuery sounds like a nightmare.
upvoted 0 times
Shawnda
5 months ago
C is the easiest way to do this! Just create external tables.
upvoted 0 times
Fletcher
5 months ago
I recall that Cloud Data Fusion is useful for ETL processes, but for a quick SQL analysis, I wonder if option C is still the best approach.
upvoted 0 times
Alfred
5 months ago
I practiced a question similar to this, and I feel like using the bq load command in option D might be overkill for a one-time analysis.
upvoted 0 times
Cherelle
5 months ago
I'm not entirely sure, but I remember something about using Dataproc for big data processing. Maybe option A is the right choice?
upvoted 0 times
Janella
6 months ago
I think option C sounds familiar; creating external tables over the Parquet files seems like a straightforward way to analyze them without moving data.
upvoted 0 times
Garry
6 months ago
For a petabyte-scale dataset, I'd be hesitant to use the bq load command (option D). That could take a really long time to ingest all the data. I'm inclined to go with option C and create the external tables in BigQuery. Seems like the most efficient way to get the job done.
upvoted 0 times
Matilda
6 months ago
Option B with Cloud Data Fusion looks interesting, but I'm not as familiar with that service. I think I'll stick with the more traditional BigQuery approach and go with option C. The external table integration should be pretty straightforward.
upvoted 0 times
Glenn
6 months ago
I'm a bit unsure about this one. The question mentions a "one-time" analysis, so I'm not sure if creating a Dataproc cluster for a PySpark job is the best use of resources. I'm leaning towards option C or D, but I'll need to think it through a bit more.
upvoted 0 times
Adell
6 months ago
This seems like a straightforward data integration task, so I'll likely go with option C. Creating external tables in BigQuery to join the Parquet files from Cloud Storage seems like the most efficient approach.
upvoted 0 times
Rory
12 months ago
Option B seems like the way to go. Cloud Data Fusion makes it easy to connect multiple data sources and perform complex analyses.
upvoted 0 times
Bernardo
10 months ago
Let's go with option B then. It seems like the most efficient solution for this scenario.
upvoted 0 times
Shannan
10 months ago
It definitely sounds like the most straightforward way to quickly analyze the data.
upvoted 0 times
Melissia
11 months ago
I agree, using plugins to connect to the data sources and performing SQL join operations seems efficient.
upvoted 0 times
Laurel
11 months ago
I think option B is the best choice. Cloud Data Fusion can easily connect to both BigQuery and Cloud Storage.
upvoted 0 times
Monroe
12 months ago
Ha! Looks like we've got some 'Big Data' on our hands. I'd go with option D - keep it simple, stupid!
upvoted 0 times
Theresia
12 months ago
PySpark is overkill for a one-time analysis. Option C looks like the most straightforward approach here.
upvoted 0 times
Suzan
11 months ago
Yeah, using external tables to perform SQL joins seems like the simplest solution.
upvoted 0 times
Maybelle
11 months ago
I agree, Option C seems like the most efficient choice for this scenario.
upvoted 0 times
Ardella
12 months ago
I'm not a fan of external tables - they can be a bit of a pain to manage. I'd go with option D and just load the Parquet files directly into BigQuery.
upvoted 0 times
Irma
1 year ago
I think option D could work too, loading the files into BigQuery.
upvoted 0 times
Filiberto
1 year ago
I prefer option C, creating external tables over the files in Cloud Storage.
upvoted 0 times
Fairy
1 year ago
I agree, using Dataproc cluster with PySpark seems efficient.
upvoted 0 times
Daniel
1 year ago
Cloud Data Fusion seems like the easiest way to get this done. No need to write any code!
upvoted 0 times
Joni
11 months ago
C) Create external tables over the files in Cloud Storage, and perform SQL joins to tables in BigQuery to analyze the data.
upvoted 0 times
Peggy
11 months ago
B) Launch a Cloud Data Fusion environment, use plugins to connect to BigQuery and Cloud Storage, and use the SQL join operation to analyze the data.
upvoted 0 times
Sarah
12 months ago
A) Create a Dataproc cluster, and write a PySpark job to join the data from BigQuery to the files in Cloud Storage.
upvoted 0 times
Sommer
1 year ago
I think option A sounds like a good idea.
upvoted 0 times
