A data engineer is building an automated extract, transform, and load (ETL) ingestion pipeline by using AWS Glue. The pipeline ingests compressed files that are in an Amazon S3 bucket. The ingestion pipeline must support incremental data processing.
Which AWS Glue feature should the data engineer use to meet this requirement?
Problem Analysis:
The pipeline processes compressed files in S3 and must support incremental data processing.
The chosen AWS Glue feature must track processing progress so that the same data is not reprocessed on later runs.
Key Considerations:
Incremental data processing requires tracking which files or partitions have already been processed.
The solution must be automated and efficient for large-scale ETL jobs.
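The tracking requirement above can be illustrated with a small, self-contained sketch. This is a conceptual model only, not AWS Glue's internal mechanism or API: `BookmarkState`, `select_new_files`, and `run_incremental` are hypothetical names introduced for illustration, and the state is held in memory rather than persisted.

```python
from dataclasses import dataclass, field


@dataclass
class BookmarkState:
    """Hypothetical stand-in for persisted progress state (not a Glue API)."""
    processed_keys: set = field(default_factory=set)


def select_new_files(listing, state):
    """Return only the object keys that have not been processed yet."""
    return [key for key in listing if key not in state.processed_keys]


def run_incremental(listing, state):
    """Process unseen files, then record them so the next run skips them."""
    new_files = select_new_files(listing, state)
    # Transform-and-load work for new_files would happen here.
    state.processed_keys.update(new_files)
    return new_files


state = BookmarkState()
first_run = run_incremental(["2024/01/a.gz", "2024/01/b.gz"], state)
second_run = run_incremental(["2024/01/a.gz", "2024/01/b.gz", "2024/02/c.gz"], state)
```

In this sketch the second run picks up only the file that arrived after the first run, which is the behavior the question calls incremental processing.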
Solution Analysis:
Option A: Workflows
Workflows organize and orchestrate multiple Glue jobs but do not track progress for incremental data processing.
Option B: Triggers
Triggers initiate Glue jobs based on a schedule or events but do not track which data has been processed.
Option C: Job Bookmarks
Job bookmarks track the state of the data that has been processed, enabling incremental processing.
On each run, a job with bookmarks enabled automatically skips files or partitions that earlier runs already processed.
Option D: Classifiers
Classifiers determine the schema of incoming data but do not handle incremental processing.
Final Recommendation:
Option C. Job bookmarks are specifically designed to enable incremental data processing in AWS Glue ETL pipelines.
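As a rough sketch of how the recommendation might be applied, job bookmarks are turned on through the `--job-bookmark-option` job parameter with the value `job-bookmark-enable`. The helper below only builds a job definition dictionary and calls no AWS service; the function name, job name, role ARN, and script path are placeholders assumed for illustration.

```python
def build_glue_job_definition(name, role_arn, script_location):
    """Build a Glue job definition with job bookmarks enabled.

    The helper itself is hypothetical; the keys follow the shape that
    boto3's glue.create_job accepts.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Enables bookmark tracking between runs of this job.
            "--job-bookmark-option": "job-bookmark-enable",
        },
    }


definition = build_glue_job_definition(
    name="example-incremental-ingest",                        # placeholder
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # placeholder
    script_location="s3://example-bucket/scripts/ingest.py",    # placeholder
)
# With credentials configured, this could be submitted as:
#   boto3.client("glue").create_job(**definition)
```

The job script also has to cooperate: it should call `job.init(...)` at the start and `job.commit()` at the end, and give each source a `transformation_ctx`, otherwise bookmark state is not saved between runs.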