You are designing a data processing pipeline. The pipeline must be able to scale automatically as load increases. Messages must be processed at least once, and must be ordered within windows of 1 hour. How should you design the solution?
You work for a large real estate firm and are preparing 6 TB of home sales data to be used for machine learning. You will use SQL to transform the data and use BigQuery ML to create a machine learning model. You plan to use the model for predictions against a raw dataset that has not been transformed. How should you set up your workflow in order to prevent skew at prediction time?
https://cloud.google.com/bigquery-ml/docs/bigqueryml-transform Using the TRANSFORM clause, you can specify all preprocessing during model creation. The same preprocessing is then applied automatically during the prediction and evaluation phases, so raw input is transformed exactly as the training data was and skew is avoided.
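As a minimal sketch of the TRANSFORM approach described above, the Python helpers below only build the SQL text; they do not call BigQuery. The model name, table names, and column names (square_feet, year_built, neighborhood, sale_price) are hypothetical and chosen purely for illustration, as is the choice of a linear regression model.

```python
def build_create_model_sql(model: str, source_table: str) -> str:
    """Build a CREATE MODEL statement that keeps preprocessing in TRANSFORM.

    All column names here are hypothetical placeholders.
    """
    return f"""
CREATE OR REPLACE MODEL `{model}`
TRANSFORM(
  ML.STANDARD_SCALER(square_feet) OVER () AS square_feet_scaled,
  ML.QUANTILE_BUCKETIZE(year_built, 10) OVER () AS year_built_bucket,
  neighborhood,
  sale_price
)
OPTIONS(model_type = 'linear_reg', input_label_cols = ['sale_price'])
AS
SELECT square_feet, year_built, neighborhood, sale_price
FROM `{source_table}`
""".strip()


def build_predict_sql(model: str, raw_table: str) -> str:
    """Build a prediction query over raw, untransformed rows.

    No preprocessing is repeated here: the model applies its stored
    TRANSFORM clause to the input.
    """
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, TABLE `{raw_table}`)"


if __name__ == "__main__":
    print(build_create_model_sql("demo.home_price_model", "demo.home_sales"))
    print(build_predict_sql("demo.home_price_model", "demo.raw_listings"))
```

The point of the design is that the prediction query contains no scaling or bucketizing logic of its own, so there is no second copy of the preprocessing that could drift from the one used in training.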
Your company needs to upload their historic data to Cloud Storage. The security rules don't allow access from external IPs to their on-premises resources. After an initial upload, they will add new data from existing on-premises applications every day. What should they do?
You need to copy millions of sensitive patient records from a relational database to BigQuery. The total size of the database is 10 TB. You need to design a solution that is secure and time-efficient. What should you do?
You have thousands of Apache Spark jobs running in your on-premises Apache Hadoop cluster. You want to migrate the jobs to Google Cloud. You want to use managed services to run your jobs instead of maintaining a long-lived Hadoop cluster yourself. You have a tight timeline and want to keep code changes to a minimum. What should you do?
Dataproc's Compatibility with Apache Spark: Dataproc is a managed service for running Hadoop and Spark clusters on Google Cloud. This means it is designed to run Apache Spark jobs with minimal code changes. Your existing Spark jobs should run on Dataproc with little to no modification.
Cloud Storage as a Scalable Data Lake: Cloud Storage provides a highly scalable and durable storage solution for your data. It's designed to handle large volumes of data that Spark jobs typically process.
Minimizing Operational Overhead: By using Dataproc, you eliminate the need to manage and maintain a Hadoop cluster yourself. Google Cloud handles the infrastructure, allowing you to focus on your data processing tasks.
Tight Timeline and Minimal Code Changes: This option directly addresses the requirements of the question. It offers a quick and easy way to migrate your Spark jobs to Google Cloud with minimal disruption to your existing codebase.
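To illustrate what "minimal code changes" can look like in practice, here is a small sketch assuming the main edit to each Spark job is pointing its input and output paths at Cloud Storage instead of HDFS. The helper name to_gcs_uri and the bucket name are hypothetical, and real jobs may need other adjustments as well.

```python
from urllib.parse import urlparse


def to_gcs_uri(hdfs_uri: str, bucket: str) -> str:
    """Map an hdfs:// URI to a gs:// URI in the given bucket.

    The NameNode host and port are dropped and the path is kept as-is.
    URIs with any other scheme are returned unchanged.
    """
    parsed = urlparse(hdfs_uri)
    if parsed.scheme != "hdfs":
        return hdfs_uri
    return f"gs://{bucket}{parsed.path}"


if __name__ == "__main__":
    # Hypothetical bucket name used only for this example.
    print(to_gcs_uri("hdfs://namenode:8020/data/sales/2023", "my-migrated-data"))
```

With the data copied to a bucket under the same path layout, the job logic itself stays untouched and only the URIs passed to it change.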
Why other options are not suitable:
A. Copy your data to Compute Engine disks. Manage and run your jobs directly on those instances: This option requires you to manage the underlying infrastructure yourself, which contradicts the requirement of using managed services.
C. Move your data to BigQuery. Convert your Spark scripts to a SQL-based processing approach: While BigQuery is a powerful data warehouse, converting Spark scripts to SQL would require substantial code changes and might not be feasible within a tight timeline.
D. Rewrite your jobs in Apache Beam. Run your jobs in Dataflow: Rewriting jobs in Apache Beam would be a significant undertaking and not suitable for a quick migration with minimal code changes.