

Google Professional Machine Learning Engineer Exam - Topic 1 Question 105 Discussion

Actual exam question for Google's Professional Machine Learning Engineer exam
Question #: 105
Topic #: 1
[All Professional Machine Learning Engineer Questions]

You have built a model that is trained on data stored in Parquet files. You access the data through a Hive table hosted on Google Cloud. You preprocessed the data with PySpark and exported it as a CSV file to Cloud Storage. After preprocessing, you perform additional steps to train and evaluate your model. You want to parametrize this model training in Kubeflow Pipelines. What should you do?

Suggested Answer: C

The best option for parametrizing the model training in Kubeflow Pipelines is to add a ContainerOp to the pipeline that spins up a Dataproc cluster, runs the transformation, and then saves the transformed data in Cloud Storage. This option has the following advantages:

It performs the data transformation as part of the Kubeflow Pipeline, which keeps the data processing and the model training consistent and reproducible. By adding a ContainerOp to the pipeline, you define the parameters and the logic of the transformation step and integrate it with the other steps of the pipeline, such as model training and evaluation.

It leverages the scalability and performance of Dataproc, a fully managed service that runs Apache Spark and Apache Hadoop clusters on Google Cloud. By spinning up a Dataproc cluster, you can run the PySpark transformation on the Parquet files behind the Hive table and take advantage of Spark's parallelism and speed. Dataproc also supports features and integrations, such as autoscaling, preemptible VMs, and connectors to other Google Cloud services, that can optimize the data processing and reduce cost.

It simplifies the data storage and access, as the transformed data is saved in Cloud Storage, which is a scalable, durable, and secure object storage service. By saving the transformed data in Cloud Storage, you can avoid the overhead and complexity of managing the data in the Hive table or the Parquet files. Moreover, you can easily access the transformed data from Cloud Storage, using various tools and frameworks, such as TensorFlow, BigQuery, or Vertex AI.
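To make the parametrization concrete, the library-free sketch below assembles the `gcloud dataproc` commands that a ContainerOp's container could run: create a cluster, submit the PySpark transformation, and delete the cluster. The project, region, cluster, bucket, and script names are hypothetical placeholders; in a real pipeline these would arrive as Kubeflow Pipelines parameters.

```python
def build_transform_commands(project: str, region: str, cluster: str,
                             pyspark_script: str, input_table: str,
                             output_path: str) -> list[list[str]]:
    """Return the shell commands a pipeline step could run to spin up a
    Dataproc cluster, run a PySpark transformation, and tear the cluster
    down. All names are illustrative; the real step would receive them
    as pipeline parameters."""
    base = ["gcloud", "dataproc"]
    return [
        # 1. Spin up an ephemeral Dataproc cluster.
        base + ["clusters", "create", cluster,
                "--project", project, "--region", region],
        # 2. Submit the PySpark job; args after "--" go to the script.
        base + ["jobs", "submit", "pyspark", pyspark_script,
                "--cluster", cluster, "--region", region,
                "--", "--input", input_table, "--output", output_path],
        # 3. Tear the cluster down so you only pay while the job runs.
        base + ["clusters", "delete", cluster,
                "--region", region, "--quiet"],
    ]

# Example: values that Kubeflow Pipelines would pass into the ContainerOp.
cmds = build_transform_commands(
    project="my-project",                           # hypothetical project
    region="us-central1",
    cluster="transform-cluster",                    # hypothetical cluster
    pyspark_script="gs://my-bucket/transform.py",   # hypothetical script
    input_table="hive_db.features",                 # hypothetical table
    output_path="gs://my-bucket/preprocessed.csv",  # hypothetical output
)
for cmd in cmds:
    print(" ".join(cmd))
```

Because every value is a pipeline parameter rather than a hard-coded string, the same step can be re-run against different tables or output locations without rebuilding anything, which is exactly what parametrizing the training in Kubeflow Pipelines buys you.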

The other options are less optimal for the following reasons:

Option A: Removing the data transformation step from the pipeline defeats the parametrization of the model training, because it decouples data processing from the pipeline. The PySpark transformation would have to run separately from the Kubeflow Pipeline, which can make the data processing and the model training inconsistent and hard to reproduce. Moreover, this option leaves you managing the data in the Hive table or the Parquet files outside the pipeline, which can be cumbersome and inefficient.

Option B: Containerizing the PySpark transformation step and adding it to the pipeline introduces additional complexity and overhead. This option requires building and maintaining a Docker image that can run the PySpark transformation, which can be challenging and time-consuming. Moreover, the transformation would run in a single container, which can be slow and inefficient because it does not leverage Spark's distributed parallelism.

Option D: Deploying Apache Spark on a separate node pool in a Google Kubernetes Engine cluster, and adding a ContainerOp to the pipeline that invokes a transformation job on that Spark instance, introduces additional complexity and cost. This option requires creating and managing a separate node pool in GKE, then deploying and operating Apache Spark on it yourself, which can be tedious and costly: you must configure and maintain the Spark cluster, and you pay for the node pool whether or not jobs are running.


Contribute your Thoughts:

Lettie
2 months ago
A is definitely not the way to go.
upvoted 0 times
...
Keshia
2 months ago
Wait, can you really run PySpark in a ContainerOp?
upvoted 0 times
...
Tenesha
2 months ago
I think C is better for scalability.
upvoted 0 times
...
Elliot
3 months ago
D sounds complicated, not sure it's worth it.
upvoted 0 times
...
Nenita
3 months ago
B seems like the most straightforward option.
upvoted 0 times
...
Theodora
3 months ago
Deploying Spark in a separate node pool sounds complicated. I wonder if that’s really needed for this scenario or if there’s a simpler way.
upvoted 0 times
...
Rossana
4 months ago
I feel like option C sounds familiar. We practiced something similar where we had to set up a Dataproc cluster for transformations.
upvoted 0 times
...
Catalina
4 months ago
I’m a bit unsure about containerizing the PySpark step. It seems like a lot of extra work, but maybe it’s necessary for consistency?
upvoted 0 times
...
Irma
4 months ago
I remember we discussed how important it is to keep the data transformation step in the pipeline, so I think removing it might not be the best choice.
upvoted 0 times
...
Felicitas
4 months ago
I'm feeling pretty confident about this one. The solution seems to be to add a ContainerOp to the pipeline that spins up a Dataproc cluster, runs the transformation, and saves the data in Cloud Storage. That way, we can keep the data transformation step separate from the model training.
upvoted 0 times
...
Inocencia
4 months ago
Okay, let's see. I think the key here is to containerize the PySpark transformation step and add it to the Kubeflow pipeline. That way, we can parameterize the model training and make it more reusable.
upvoted 0 times
...
Carey
5 months ago
Hmm, I'm a bit confused by the different cloud services and technologies mentioned. I'll need to think through the steps carefully to determine the best approach.
upvoted 0 times
...
Inocencia
5 months ago
This looks like a tricky question. I'm not sure if I fully understand the requirements, but I think the key is to figure out how to integrate the PySpark transformation step into the Kubeflow pipeline.
upvoted 0 times
...
Alaine
5 months ago
C) Add a ContainerOp to your pipeline that spins a Dataproc cluster, runs a transformation, and then saves the transformed data in Cloud Storage. I like the idea of using a managed service like Dataproc for the heavy lifting.
upvoted 0 times
...
Bronwyn
5 months ago
D) Deploy Apache Spark at a separate node pool in a Google Kubernetes Engine cluster. This way, we can leverage the scalability and flexibility of GKE for our Spark workloads. Plus, it keeps our pipeline clean and modular.
upvoted 0 times
Bettina
2 months ago
Agreed! Modular pipelines are the way to go.
upvoted 0 times
...
Tarra
2 months ago
Plus, running Spark separately is more efficient.
upvoted 0 times
...
William
2 months ago
Exactly! It keeps everything organized.
upvoted 0 times
...
Ashleigh
3 months ago
I like option D too! GKE is great for scaling.
upvoted 0 times
...
...
Stefania
5 months ago
But wouldn't it be better to spin a Dataproc cluster and run the transformation there?
upvoted 0 times
...
Nan
6 months ago
I agree with Jolene. Containerizing the transformation step would make it easier to parametrize the model training.
upvoted 0 times
...
Jolene
6 months ago
I think we should containerize the PySpark transformation step and add it to the pipeline.
upvoted 0 times
...
Jin
6 months ago
B) Containerize the PySpark transformation step, and add it to your pipeline. Seems like the most straightforward approach to me. I don't want to introduce unnecessary complexity by spinning up a Dataproc cluster or managing a separate Spark instance.
upvoted 0 times
...
