You need to copy millions of sensitive patient records from a relational database to BigQuery. The total size of the database is 10 TB. You need to design a solution that is secure and time-efficient. What should you do?
You have thousands of Apache Spark jobs running in your on-premises Apache Hadoop cluster. You want to migrate the jobs to Google Cloud. You want to use managed services to run your jobs instead of maintaining a long-lived Hadoop cluster yourself. You have a tight timeline and want to keep code changes to a minimum. What should you do?
Dataproc's Compatibility with Apache Spark: Dataproc is a managed service for running Hadoop and Spark clusters on Google Cloud, which means it is designed to run Apache Spark jobs seamlessly. Your existing Spark jobs should run on Dataproc with little to no modification.
Cloud Storage as a Scalable Data Lake: Cloud Storage provides highly scalable and durable storage, designed to handle the large volumes of data that Spark jobs typically process.
Minimizing Operational Overhead: By using Dataproc, you eliminate the need to manage and maintain a Hadoop cluster yourself. Google Cloud handles the infrastructure, letting you focus on your data processing tasks.
Tight Timeline and Minimal Code Changes: This option directly addresses the requirements of the question: it offers a quick way to migrate your Spark jobs to Google Cloud with minimal disruption to your existing codebase.
Why other options are not suitable:
A. Copy your data to Compute Engine disks. Manage and run your jobs directly on those instances: This option requires you to manage the underlying infrastructure yourself, which contradicts the requirement to use managed services.
C. Move your data to BigQuery. Convert your Spark scripts to a SQL-based processing approach: While BigQuery is a powerful data warehouse, converting Spark scripts to SQL would require substantial code changes and might not be feasible within a tight timeline.
D. Rewrite your jobs in Apache Beam. Run your jobs in Dataflow: Rewriting jobs in Apache Beam would be a significant undertaking and is not suitable for a quick migration with minimal code changes.
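The migration described above can be sketched with the gcloud CLI. This is a minimal, illustrative fragment only: the cluster name, region, bucket, and job class below are hypothetical, and it assumes your Spark job is already packaged as a JAR in a Cloud Storage bucket.

```
# Create a Dataproc cluster (hypothetical name and region).
gcloud dataproc clusters create spark-migration-cluster \
    --region=us-central1 \
    --num-workers=2

# Submit an existing Spark job unchanged; typically the only code-level
# change is pointing input/output paths at gs:// URIs instead of HDFS.
gcloud dataproc jobs submit spark \
    --cluster=spark-migration-cluster \
    --region=us-central1 \
    --class=com.example.MySparkJob \
    --jars=gs://my-bucket/jobs/my-spark-job.jar \
    -- gs://my-bucket/input/ gs://my-bucket/output/
```

Because Dataproc clusters are quick to create, a common pattern is to treat them as ephemeral: spin one up per job or pipeline run and delete it afterward, rather than keeping a long-lived cluster.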
What is the HBase Shell for Cloud Bigtable?
The HBase shell is a command-line tool that performs administrative tasks, such as creating and deleting tables. The Cloud Bigtable HBase client for Java makes it possible to use the HBase shell to connect to Cloud Bigtable.
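A short sketch of what an HBase shell session against Cloud Bigtable can look like, assuming the Cloud Bigtable HBase client for Java is installed and configured for your project; the table name, column family, row key, and launch script shown are illustrative, not guaranteed names.

```
# Launch the shell with the Bigtable adapter on the classpath
# (e.g. via a provided quickstart script, if your setup includes one).
$ ./quickstart.sh

hbase> create 'patients', 'cf1'      # create a table with one column family
hbase> list                          # list tables in the instance
hbase> put 'patients', 'row1', 'cf1:name', 'Ada'
hbase> get 'patients', 'row1'
hbase> disable 'patients'            # HBase requires disable before drop
hbase> drop 'patients'
```

The point of the adapter is that these are standard HBase shell commands; the client translates them into Cloud Bigtable API calls, so existing HBase tooling and habits carry over.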
How can you get a neural network to learn about relationships between categories in a categorical feature?
There are two problems with one-hot encoding. First, it has high dimensionality: instead of having just one value, like a continuous feature, it has many values, or dimensions. This makes computation more time-consuming, especially if a feature has a very large number of categories. The second problem is that it doesn't encode any relationships between the categories. They are completely independent of each other, so the network has no way of knowing which ones are similar.
Both of these problems can be solved by representing a categorical feature with an embedding column. The idea is that each category has a smaller vector with, say, 5 values in it. But unlike a one-hot vector, the values are not usually 0. The values are weights, similar to the weights that are used for basic features in a neural network. The difference is that each category has its own set of weights (5 of them in this case).
You can think of each value in the embedding vector as a feature of the category. So, if two categories are very similar to each other, then their embedding vectors should be very similar too.
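The idea above can be sketched in a few lines of NumPy. Everything here is illustrative: the vocabulary is made up, and the 5-value embedding weights are hand-picked rather than learned, purely to show how an embedding lookup works and how similar categories end up with similar vectors.

```python
import numpy as np

# Hypothetical vocabulary of 4 categories.
vocab = ["cat", "dog", "car", "truck"]

# One-hot encoding: one dimension per category, no notion of similarity.
one_hot = np.eye(len(vocab))

# Embedding table: each category gets a dense 5-value weight vector.
# In a real network these weights are learned during training; here they
# are hand-picked so that similar categories get similar vectors.
embedding = np.array([
    [0.9, 0.8, 0.1, 0.0, 0.2],   # cat
    [0.8, 0.9, 0.2, 0.1, 0.1],   # dog
    [0.1, 0.0, 0.9, 0.8, 0.7],   # car
    [0.0, 0.1, 0.8, 0.9, 0.8],   # truck
])

def embed(category):
    """Look up a category's embedding vector. The lookup is equivalent to
    multiplying the one-hot vector by the embedding table."""
    return one_hot[vocab.index(category)] @ embedding

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similar categories have similar embedding vectors (cosine near 1),
# while dissimilar ones do not.
print(cosine(embed("cat"), embed("dog")))
print(cosine(embed("cat"), embed("truck")))
```

Note that the lookup is just a matrix multiply of the one-hot vector with the embedding table, which is why embedding columns slot naturally into a neural network: the table is simply another weight matrix trained by backpropagation.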
Your organization has been collecting and analyzing data in Google BigQuery for 6 months. The majority of the data analyzed is placed in a time-partitioned table named events_partitioned. To reduce the cost of queries, your organization created a view called events, which queries only the last 14 days of data. The view is described in legacy SQL. Next month, existing applications will be connecting to BigQuery to read the events data via an ODBC connection. You need to ensure the applications can connect. Which two actions should you take? (Choose two.)