Welcome to Pass4Success


Google Professional Data Engineer Exam - Topic 3 Question 119 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 119
Topic #: 3

You have thousands of Apache Spark jobs running in your on-premises Apache Hadoop cluster. You want to migrate the jobs to Google Cloud. You want to use managed services to run your jobs instead of maintaining a long-lived Hadoop cluster yourself. You have a tight timeline and want to keep code changes to a minimum. What should you do?

Suggested Answer: B

Dataproc's Compatibility with Apache Spark: Dataproc is a managed service for running Hadoop and Spark workloads on Google Cloud, and it runs Apache Spark jobs natively. Your existing Spark jobs should run on Dataproc with little to no modification.
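As an illustration of how little changes, an existing Spark job jar can typically be submitted to a Dataproc cluster unchanged. A minimal sketch, assuming a cluster and bucket already exist (the cluster name, region, jar path, and main class below are placeholders, not values from the question):

```shell
# Sketch: submit an existing Spark jar to Dataproc as-is.
# my-cluster, us-central1, the jar path, and the class are placeholders.
gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --region=us-central1 \
  --class=com.example.WordCount \
  --jars=gs://my-bucket/jars/wordcount.jar \
  -- gs://my-bucket/input/ gs://my-bucket/output/
```

Arguments after `--` are passed straight through to the job's main class, so the job's own interface stays the same as on-premises.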

Cloud Storage as a Scalable Data Lake: Cloud Storage provides a highly scalable and durable storage solution for your data. It's designed to handle the large volumes of data that Spark jobs typically process.
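Because Dataproc ships with the Cloud Storage connector, `gs://` paths work wherever `hdfs://` paths did, so the main code change is often just swapping path schemes. A minimal sketch of that rewrite as a hypothetical helper (the function name, bucket, and example paths are illustrative, not from the question):

```python
from urllib.parse import urlparse

def to_gcs_path(hdfs_path: str, bucket: str) -> str:
    """Rewrite an hdfs:// URI to the equivalent gs:// URI.

    Hypothetical helper: shows the typical one-line change a Spark job
    needs after its data moves from HDFS to a Cloud Storage bucket.
    """
    parsed = urlparse(hdfs_path)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got: {hdfs_path}")
    # Drop the HDFS namenode authority and prepend the GCS bucket.
    return f"gs://{bucket}{parsed.path}"

# e.g. spark.read.parquet(to_gcs_path("hdfs://namenode/data/events", "my-bucket"))
```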

Minimizing Operational Overhead: By using Dataproc, you eliminate the need to manage and maintain a Hadoop cluster yourself. Google Cloud handles the infrastructure, allowing you to focus on your data processing tasks.
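One way to avoid keeping a long-lived cluster is Dataproc's scheduled-deletion feature. A sketch (cluster name and region are placeholders) of creating a cluster that deletes itself after sitting idle:

```shell
# Sketch: create a Dataproc cluster that auto-deletes after 30 minutes
# of inactivity, so no long-lived Hadoop cluster has to be maintained.
# my-cluster and us-central1 are placeholder values.
gcloud dataproc clusters create my-cluster \
  --region=us-central1 \
  --max-idle=30m
```

With data in Cloud Storage rather than HDFS, the cluster holds no state, so clusters can be created per batch of jobs and discarded afterward.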

Tight Timeline and Minimal Code Changes: This option directly addresses the requirements of the question. It offers a quick way to migrate your Spark jobs to Google Cloud with minimal disruption to your existing codebase.

Why other options are not suitable:

A. Copy your data to Compute Engine disks. Manage and run your jobs directly on those instances: This option requires you to manage the underlying infrastructure yourself, which contradicts the requirement of using managed services.

C. Move your data to BigQuery. Convert your Spark scripts to a SQL-based processing approach: While BigQuery is a powerful data warehouse, converting Spark scripts to SQL would require substantial code changes and might not be feasible within a tight timeline.

D. Rewrite your jobs in Apache Beam. Run your jobs in Dataflow: Rewriting jobs in Apache Beam would be a significant undertaking and not suitable for a quick migration with minimal code changes.


Contribute your Thoughts:

Carmela
7 hours ago
I remember we discussed how Dataproc is designed for running Spark jobs, so option B seems like a good fit. But I'm not entirely sure about the data transfer process.
