Welcome to Pass4Success


Google Associate Data Practitioner Exam - Topic 3 Question 9 Discussion

Actual exam question for Google's Associate Data Practitioner exam
Question #: 9
Topic #: 3

Your organization has a petabyte of application logs stored as Parquet files in Cloud Storage. You need to quickly perform a one-time SQL-based analysis of the files and join them to data that already resides in BigQuery. What should you do?
A) Create a Dataproc cluster, and write a PySpark job to join the data from BigQuery to the files in Cloud Storage.
B) Launch a Cloud Data Fusion environment, use plugins to connect to BigQuery and Cloud Storage, and use the SQL join operation to analyze the data.
C) Create external tables over the files in Cloud Storage, and perform SQL joins to tables in BigQuery to analyze the data.
D) Use the bq load command to load the Parquet files into BigQuery, and perform SQL joins to analyze the data.

Suggested Answer: C

Creating external tables over the Parquet files in Cloud Storage lets you run SQL queries against the logs and join them to tables that already reside in BigQuery, without loading the files first. Because BigQuery reads the Parquet files directly from Cloud Storage at query time, this avoids the time and cost of ingesting a petabyte of data, which makes it well suited to a one-time analysis.
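As a rough sketch, the suggested answer corresponds to BigQuery SQL along these lines. The bucket path, dataset, table, and column names here are hypothetical examples, not values from the question:

```sql
-- Define an external table over the Parquet logs in Cloud Storage
-- (bucket path, dataset, and column names are hypothetical)
CREATE EXTERNAL TABLE mydataset.app_logs
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://example-bucket/logs/*.parquet']
);

-- Join the external table to a native BigQuery table with ordinary SQL
SELECT u.user_id, COUNT(*) AS error_count
FROM mydataset.app_logs AS l
JOIN mydataset.users AS u
  ON l.user_id = u.user_id
WHERE l.severity = 'ERROR'
GROUP BY u.user_id;
```

Nothing is copied into BigQuery storage; the query engine scans the Parquet files in place, which is why this path is faster to set up than a bulk load for a one-off job.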


Contribute your Thoughts:

Tiara
4 months ago
Wait, can you really do SQL joins with external tables? Sounds too good to be true!
upvoted 0 times
Nathan
4 months ago
I think B could work too, but it might be overkill for a one-time analysis.
upvoted 0 times
Terrilyn
4 months ago
A seems like too much setup for a quick job.
upvoted 0 times
Gerald
4 months ago
D is not ideal, loading a petabyte into BigQuery sounds like a nightmare.
upvoted 0 times
Shawnda
5 months ago
C is the easiest way to do this! Just create external tables.
upvoted 0 times
Fletcher
5 months ago
I recall that Cloud Data Fusion is useful for ETL processes, but for a quick SQL analysis, I wonder if option C is still the best approach.
upvoted 0 times
Alfred
5 months ago
I practiced a question similar to this, and I feel like using the bq load command in option D might be overkill for a one-time analysis.
upvoted 0 times
Cherelle
5 months ago
I'm not entirely sure, but I remember something about using Dataproc for big data processing. Maybe option A is the right choice?
upvoted 0 times
Janella
6 months ago
I think option C sounds familiar; creating external tables over the Parquet files seems like a straightforward way to analyze them without moving data.
upvoted 0 times
Garry
6 months ago
For a petabyte-scale dataset, I'd be hesitant to use the bq load command (option D). That could take a really long time to ingest all the data. I'm inclined to go with option C and create the external tables in BigQuery. Seems like the most efficient way to get the job done.
upvoted 0 times
Matilda
6 months ago
Option B with Cloud Data Fusion looks interesting, but I'm not as familiar with that service. I think I'll stick with the more traditional BigQuery approach and go with option C. The external table integration should be pretty straightforward.
upvoted 0 times
Glenn
6 months ago
I'm a bit unsure about this one. The question mentions a "one-time" analysis, so I'm not sure if creating a Dataproc cluster for a PySpark job is the best use of resources. I'm leaning towards option C or D, but I'll need to think it through a bit more.
upvoted 0 times
Adell
6 months ago
This seems like a straightforward data integration task, so I'll likely go with option C. Creating external tables in BigQuery to join the Parquet files from Cloud Storage seems like the most efficient approach.
upvoted 0 times
Rory
12 months ago
Option B seems like the way to go. Cloud Data Fusion makes it easy to connect multiple data sources and perform complex analyses.
upvoted 0 times
Bernardo
10 months ago
Let's go with option B then. It seems like the most efficient solution for this scenario.
upvoted 0 times
Shannan
10 months ago
It definitely sounds like the most straightforward way to quickly analyze the data.
upvoted 0 times
Melissia
11 months ago
I agree, using plugins to connect to the data sources and performing SQL join operations seems efficient.
upvoted 0 times
Laurel
11 months ago
I think option B is the best choice. Cloud Data Fusion can easily connect to both BigQuery and Cloud Storage.
upvoted 0 times
Monroe
12 months ago
Ha! Looks like we've got some 'Big Data' on our hands. I'd go with option D - keep it simple, stupid!
upvoted 0 times
Theresia
12 months ago
PySpark is overkill for a one-time analysis. Option C looks like the most straightforward approach here.
upvoted 0 times
Suzan
11 months ago
Yeah, using external tables to perform SQL joins seems like the simplest solution.
upvoted 0 times
Maybelle
11 months ago
I agree, Option C seems like the most efficient choice for this scenario.
upvoted 0 times
Ardella
12 months ago
I'm not a fan of external tables - they can be a bit of a pain to manage. I'd go with option D and just load the Parquet files directly into BigQuery.
upvoted 0 times
Irma
1 year ago
I think option D could work too, loading the files into BigQuery.
upvoted 0 times
Filiberto
1 year ago
I prefer option C, creating external tables over the files in Cloud Storage.
upvoted 0 times
Fairy
1 year ago
I agree, using Dataproc cluster with PySpark seems efficient.
upvoted 0 times
Daniel
1 year ago
Cloud Data Fusion seems like the easiest way to get this done. No need to write any code!
upvoted 0 times
Joni
11 months ago
C) Create external tables over the files in Cloud Storage, and perform SQL joins to tables in BigQuery to analyze the data.
upvoted 0 times
Peggy
11 months ago
B) Launch a Cloud Data Fusion environment, use plugins to connect to BigQuery and Cloud Storage, and use the SQL join operation to analyze the data.
upvoted 0 times
Sarah
12 months ago
A) Create a Dataproc cluster, and write a PySpark job to join the data from BigQuery to the files in Cloud Storage.
upvoted 0 times
Sommer
1 year ago
I think option A sounds like a good idea.
upvoted 0 times
