Google Exam Associate Data Practitioner Topic 2 Question 15 Discussion

Actual exam question for Google's Associate Data Practitioner exam

Question #: 15
Topic #: 2

[All Associate Data Practitioner Questions]

You are designing a pipeline to process data files that arrive in Cloud Storage by 3:00 am each day. Data processing is performed in stages, where the output of one stage becomes the input of the next. Each stage takes a long time to run. Occasionally a stage fails, and you have to address

the problem. You need to ensure that the final output is generated as quickly as possible. What should you do?

ADesign a Spark program that runs under Dataproc. Code the program to wait for user input when an error is detected. Rerun the last action after correcting any stage output data errors.

BDesign the pipeline as a set of PTransforms in Dataflow. Restart the pipeline after correcting any stage output data errors.

CDesign the workflow as a Cloud Workflow instance. Code the workflow to jump to a given stage based on an input parameter. Rerun the workflow after correcting any stage output data errors.

DDesign the processing as a directed acyclic graph (DAG) in Cloud Composer. Clear the state of the failed task after correcting any stage output data errors.

Show Suggested Answer

Suggested Answer: D

Using Cloud Composer to design the processing pipeline as a Directed Acyclic Graph (DAG) is the most suitable approach because:

Fault tolerance: Cloud Composer (based on Apache Airflow) allows for handling failures at specific stages. You can clear the state of a failed task and rerun it without reprocessing the entire pipeline.

Stage-based processing: DAGs are ideal for workflows with interdependent stages where the output of one stage serves as input to the next.

Efficiency: This approach minimizes downtime and ensures that only failed stages are rerun, leading to faster final output generation.

by Gianna at Aug 01, 2025, 08:05 PM

Limited Time Offer

25%

Off

Get Premium Associate Data Practitioner Questions as Interactive Web-Based Practice Test or PDF

Contribute your Thoughts:

Submit Cancel

Kristofer

6 days ago

Option B sounds like the way to go. Dataflow's ability to restart the pipeline after fixing errors seems like the most efficient approach.

upvoted 0 times

Lisandra

3 days ago

Option B sounds like the way to go. Dataflow's ability to restart the pipeline after fixing errors seems like the most efficient approach.

upvoted 0 times

...