Welcome to Pass4Success

Google Professional Data Engineer Exam - Topic 1 Question 111 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 111
Topic #: 1

Your company produces 20,000 files every hour. Each data file is a comma-separated values (CSV) file smaller than 4 KB. All files must be ingested on Google Cloud Platform before they can be processed. Your company site has 200 ms of latency to Google Cloud, and your Internet connection bandwidth is limited to 50 Mbps. You currently run a secure FTP (SFTP) server on a virtual machine in Google Compute Engine as the data ingestion point, and a local SFTP client on a dedicated machine transmits the CSV files as-is. The goal is to make reports with data from the previous day available to the executives by 10:00 a.m. each day. This design barely keeps up with the current volume, even though bandwidth utilization is rather low.

You are told that due to seasonality, your company expects the number of files to double for the next three months. Which two actions should you take? (Choose two.)
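To see why the discussion below converges on parallelism and batching rather than a bigger pipe, a quick back-of-envelope check helps. The figures come from the question itself; the number of round trips per file is an assumption (real SFTP typically needs more), so treat this as a sketch of the bottleneck, not a precise model:

```python
# Back-of-envelope check: why latency, not bandwidth, is the bottleneck.
# Figures are from the question; rtts_per_file is an assumption.
FILES_PER_HOUR = 20_000
FILE_SIZE_BYTES = 4 * 1024           # each CSV is < 4 KB
RTT_SECONDS = 0.200                  # 200 ms site-to-cloud latency
BANDWIDTH_BPS = 50_000_000           # 50 Mbps link

# Serial transfer: even at only 2 round trips per file, latency alone
# exceeds the one-hour window in which those files were produced.
rtts_per_file = 2                    # assumption; SFTP usually needs more
serial_latency_s = FILES_PER_HOUR * rtts_per_file * RTT_SECONDS
print(f"serial latency cost: {serial_latency_s:.0f} s per hour of files")  # 8000 s

# Meanwhile the raw data volume is tiny relative to the link.
total_bits = FILES_PER_HOUR * FILE_SIZE_BYTES * 8
transfer_time_s = total_bits / BANDWIDTH_BPS
print(f"pure transfer time at line rate: {transfer_time_s:.1f} s")         # ~13 s
print(f"bandwidth utilization if spread over an hour: "
      f"{100 * transfer_time_s / 3600:.2f}%")
```

With roughly 8,000 seconds of round-trip cost against about 13 seconds of actual transfer time, adding bandwidth cannot fix the design; sending files concurrently (or batching many files per transfer) can.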

Suggested Answer: C, E

Contribute your Thoughts:

Gene
2 months ago
TAR files could work, but is it worth the extra steps?
Rory
2 months ago
Wait, can we really compress those files enough to make a difference?
Hannah
2 months ago
I think increasing bandwidth is a must.
Marylyn
3 months ago
Doubling files means we need a better solution fast!
Glennis
3 months ago
gsutil for parallel uploads sounds like a solid plan!
Denny
3 months ago
I’m a bit hesitant about the TAR file approach. It could save bandwidth, but what if the disassembly process in the cloud introduces delays?
Fatima
4 months ago
I think using gsutil for parallel uploads could really help with the ingestion process. We practiced a similar question about optimizing data transfers last week.
Gail
4 months ago
Increasing the bandwidth sounds like a straightforward solution, but I wonder if it would actually solve the latency issue we talked about.
Rosalind
4 months ago
I remember we discussed data compression in class, and it seems like a good way to speed up transfers. But I'm not sure if it would be enough on its own.
Freeman
4 months ago
This is an interesting challenge! I like the idea of using the Google Cloud Storage Transfer Service to move the data from on-premises to the cloud. That could be a really efficient way to handle the increased volume without having to mess with the local infrastructure.
German
4 months ago
I'm a bit confused by all the options here. I think I'd need to do some more research on the different tools and techniques before deciding on the best approach. Maybe I'll start by looking into the pros and cons of each solution.
Nadine
5 months ago
Okay, let's think this through step-by-step. First, I'd look at the compression option to reduce the file size. Then, I'd explore the parallel processing idea using the gsutil tool. That should help us meet the tight deadline.
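The "compress, then ship" idea in the comment above can be sketched as follows: pack many tiny CSVs into one gzip-compressed tar so a single transfer amortizes the 200 ms round trip. The helper name and file layout here are illustrative, not from the question, and as Denny notes the archive still has to be unpacked by a cloud-side job after upload:

```python
# Illustrative sketch: bundle many small CSVs into one archive so a single
# upload (e.g. with gsutil) replaces thousands of latency-bound transfers.
import tarfile
import tempfile
from pathlib import Path

def bundle_csvs(src_dir: Path, archive_path: Path) -> int:
    """Pack every .csv in src_dir into one gzip-compressed tar; return the count."""
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for csv_file in sorted(src_dir.glob("*.csv")):
            tar.add(csv_file, arcname=csv_file.name)
            count += 1
    return count

# Demo on throwaway files; in practice the archive would be uploaded once
# and a cloud-side job would unpack it before processing.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    for i in range(100):
        (src / f"data_{i:03d}.csv").write_text("a,b,c\n1,2,3\n")
    n = bundle_csvs(src, src / "batch.tar.gz")
    print(f"bundled {n} files into batch.tar.gz")
```

Batching trades one extra processing step for a large cut in per-file round trips, which is exactly the trade-off the commenters are weighing against a plain parallel upload.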
Jeannine
5 months ago
Hmm, this is a tricky one. I'm not sure if increasing the bandwidth alone will be enough, since the latency is also a factor. Maybe I should look into some cloud-based solutions to speed up the data transfer.
Alida
5 months ago
This looks like a classic data ingestion problem. I'd start by looking at the bandwidth and latency constraints, and see if I can improve the transfer speed through compression or parallel processing.
Mitsue
5 months ago
D is just weird. Taping files? What is this, the 90s? C and E all the way.
Avery
6 months ago
A and B might work in the short term, but they won't be sustainable long-term. Gotta rethink the whole process.
Eladia
2 months ago
Let's not forget about the TAR option too!
Nieves
2 months ago
gsutil could really speed things up.
Malinda
3 months ago
A and B might work for now, but we need a long-term solution.
Kristofer
3 months ago
Agreed! Redesigning the ingestion process is key.
Yan
6 months ago
That's true, but I think prioritizing parallel transfer and using a dedicated storage endpoint might be more efficient in the long run.
Dorethea
6 months ago
But what about option A? Introducing data compression could also help with faster file transfer.
Arthur
7 months ago
I agree, using gsutil tool and Google Cloud Storage Transfer Service can help with the increased volume of files.
Yan
7 months ago
I think we should consider option C and E.
Carissa
7 months ago
C and E seem like the best options here. Parallelizing the data transfer and using a storage service could really help handle the increased load.
Stefany
5 months ago
E) Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer Service to transfer on-premises data to the designated storage bucket.
Tomas
5 months ago
E) Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer Service to transfer on-premises data to the designated storage bucket.
Paris
6 months ago
C) Redesign the data ingestion process to use gsutil tool to send the CSV files to a storage bucket in parallel.
Brianne
6 months ago
C) Redesign the data ingestion process to use gsutil tool to send the CSV files to a storage bucket in parallel.
