You are designing a pipeline to process data files that arrive in Cloud Storage by 3:00 am each day. Data processing is performed in stages, where the output of one stage becomes the input of the next. Each stage takes a long time to run. Occasionally a stage fails, and you have to address the problem. You need to ensure that the final output is generated as quickly as possible. What should you do?
Design the processing pipeline as a Directed Acyclic Graph (DAG) in Cloud Composer. This is the most suitable approach because:
Fault tolerance: Cloud Composer (managed Apache Airflow) tracks the state of each task separately. When a stage fails, you can fix the problem, clear the state of the failed task, and rerun it without reprocessing the entire pipeline.
Stage-based processing: DAGs are ideal for workflows with interdependent stages where the output of one stage serves as input to the next.
Efficiency: Because each stage takes a long time to run, restarting from the beginning after a failure is the main cost to avoid. Rerunning only the failed stage and the stages that depend on it produces the final output sooner.
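The resume-from-failure behavior described above can be sketched in plain Python. This is a minimal model of the idea, not Airflow or Cloud Composer code: the stage names (`extract`, `transform`, `load`), the `run_pipeline` helper, and the in-memory `state` dictionary are all hypothetical stand-ins for the per-task state that the scheduler would keep.

```python
from typing import Callable, Dict, List, Optional, Tuple

Stage = Tuple[str, Callable[[str], str]]


def run_pipeline(
    stages: List[Stage], state: Dict[str, str], source: str
) -> Tuple[List[str], Optional[str]]:
    """Run stages in order, skipping any stage whose output is already in `state`.

    Returns the names of the stages executed in this call and the name of the
    stage that failed (or None if the pipeline finished).
    """
    executed: List[str] = []
    data = source
    for name, fn in stages:
        if name in state:
            # Stage succeeded on an earlier attempt: reuse its stored output.
            data = state[name]
            continue
        try:
            data = fn(data)
        except Exception:
            # Stop here; outputs of earlier stages remain in `state`.
            return executed, name
        state[name] = data
        executed.append(name)
    return executed, None


# Hypothetical stages. `transform` fails on its first attempt only,
# standing in for a problem that is fixed before the rerun.
attempts = {"transform": 0}


def extract(d: str) -> str:
    return d + ">extract"


def transform(d: str) -> str:
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("simulated stage failure")
    return d + ">transform"


def load(d: str) -> str:
    return d + ">load"


STAGES: List[Stage] = [("extract", extract), ("transform", transform), ("load", load)]

state: Dict[str, str] = {}
first_run = run_pipeline(STAGES, state, "raw")   # stops at the failing stage
second_run = run_pipeline(STAGES, state, "raw")  # resumes without redoing extract
```

In this sketch the second call skips `extract` and runs only `transform` and `load`, which mirrors clearing a failed task in a DAG and letting it and its downstream tasks run again.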