Amazon-DEA-C01 Exam - Topic 1 Question 28 Discussion

A company processes 500 GB of audience and advertising data daily, storing CSV files in Amazon S3 with schemas registered in AWS Glue Data Catalog. They need to convert these files to Apache Parquet format and store them in an S3 bucket. The solution requires a long-running workflow with 15 GiB memory capacity to process the data concurrently, followed by a correlation process that begins only after the first two processes complete.
A) Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the workflow by using AWS Glue. Configure AWS Glue to begin the third process after the first two processes have finished.
B) Use Amazon EMR to run each process in the workflow. Create an Amazon Simple Queue Service (Amazon SQS) queue to handle messages that indicate the completion of the first two processes. Configure an AWS Lambda function to process the SQS queue by running the third process.
C) Use AWS Glue workflows to run the first two processes in parallel. Ensure that the third process starts after the first two processes have finished.
D) Use AWS Step Functions to orchestrate a workflow that uses multiple AWS Lambda functions. Ensure that the third process starts after the first two processes have finished.

Actual exam question for Amazon's Amazon-DEA-C01 exam
Question #: 28
Topic #: 1
Suggested Answer: C

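AWS Glue workflows coordinate multiple ETL jobs through triggers. One trigger can start several jobs in parallel, and a conditional trigger can hold back a later job until its predecessors have finished, which matches two concurrent processing jobs followed by a correlation step. Glue is serverless and the schemas are already registered in the Glue Data Catalog, so this approach adds little operational overhead. Option D does not fit the stated requirements: a Lambda function is limited to 10,240 MB of memory and a 15-minute timeout, which falls short of the long-running, 15 GiB processes described.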

"Use AWS Glue Workflows to orchestrate multiple ETL jobs in sequence or in parallel, supporting conditional triggers and dependency management."

-- Ace the AWS Certified Data Engineer - Associate Certification - version 2 - apple.pdf
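
To make the parallel-then-join pattern in option C concrete, here is a minimal Python sketch of the two triggers such a workflow might use. It is an illustration under assumptions, not the exam's reference solution: the workflow and job names are hypothetical, the jobs are assumed to exist already, and `glue_client` is assumed to be a boto3 Glue client (for example `boto3.client("glue")`). The dictionaries follow the keyword arguments of the Glue `create_trigger` call.

```python
from typing import Any, Dict, List


def build_triggers(workflow: str, first_job: str, second_job: str,
                   third_job: str) -> List[Dict[str, Any]]:
    """Build keyword arguments for two Glue create_trigger calls.

    All names passed in are hypothetical placeholders for this sketch.
    """
    start_trigger = {
        "Name": f"{workflow}-start",
        "WorkflowName": workflow,
        "Type": "ON_DEMAND",
        # Two actions on one trigger: both jobs start together (in parallel).
        "Actions": [{"JobName": first_job}, {"JobName": second_job}],
    }
    join_trigger = {
        "Name": f"{workflow}-join",
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {
            # AND: every condition must hold before the third job starts.
            "Logical": "AND",
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": job, "State": "SUCCEEDED"}
                for job in (first_job, second_job)
            ],
        },
        "Actions": [{"JobName": third_job}],
    }
    return [start_trigger, join_trigger]


def deploy(glue_client: Any, workflow: str, first_job: str,
           second_job: str, third_job: str) -> None:
    """Create the workflow and attach both triggers using the given client."""
    glue_client.create_workflow(Name=workflow)
    for trigger in build_triggers(workflow, first_job, second_job, third_job):
        glue_client.create_trigger(**trigger)
```

Calling `deploy(boto3.client("glue"), "ads-workflow", "convert-audience", "convert-ads", "correlate")` would register the workflow, assuming those three (hypothetical) jobs exist and the caller has the needed Glue permissions. Keeping `build_triggers` free of AWS calls lets the dependency logic be inspected without an account.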


Contribute your Thoughts:

Moon
1 month ago
I practiced a similar question where we had to manage dependencies between processes. I think option B with SQS and Lambda could work, but it seems a bit complex for this scenario.
upvoted 0 times
Naomi
1 month ago
I'm not entirely sure about using Amazon MWAA for this. I feel like it might be overkill for just orchestrating a few processes, but I could be wrong.
upvoted 0 times
Marti
1 month ago
I remember studying AWS Glue workflows, and I think option C sounds familiar since it directly mentions running processes in parallel.
upvoted 0 times