
Amazon-DEA-C01 Exam - Topic 2 Question 7 Discussion

Actual exam question for Amazon's Amazon-DEA-C01 exam
Question #: 7
Topic #: 2

A company uses AWS Glue Data Catalog to index data that is uploaded to an Amazon S3 bucket every day. The company runs a daily batch process in an extract, transform, and load (ETL) pipeline to upload data from external sources into the S3 bucket.

The company runs a daily report on the S3 data. Some days, the company runs the report before all the daily data has been uploaded to the S3 bucket. A data engineer must be able to send a message that identifies any incomplete data to an existing Amazon Simple Notification Service (Amazon SNS) topic.

Which solution will meet this requirement with the LEAST operational overhead?

Suggested Answer: C

AWS Glue workflows are designed to orchestrate the ETL pipeline, and they can include data quality checks that verify the uploaded datasets are complete before the report runs. If a check detects a problem with the data, the workflow can emit an Amazon EventBridge event that sends a message to the existing SNS topic.

AWS Glue Workflows:

AWS Glue workflows allow users to automate and monitor complex ETL processes, and they can include data quality actions that check for null values, incorrect data types, and other consistency issues.

If the data is incomplete, the workflow can generate an EventBridge event that delivers a notification through SNS.
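The completeness check described above can be sketched as a small function that runs as the final step of the Glue workflow. This is a minimal illustration under assumed names, not the exam's reference implementation: the feed names, dates, and topic ARN are all hypothetical.

```python
import json

# Hypothetical list of external feeds the daily report depends on.
EXPECTED_SOURCES = {"source_a", "source_b", "source_c"}

def build_incomplete_data_message(uploaded_sources, report_date):
    """Return a JSON message naming the missing feeds, or None if all arrived."""
    missing = sorted(EXPECTED_SOURCES - set(uploaded_sources))
    if not missing:
        return None
    return json.dumps({
        "report_date": report_date,
        "status": "INCOMPLETE",
        "missing_sources": missing,
    })

# Publishing to the existing SNS topic would then be a single boto3 call, e.g.:
#   import boto3
#   message = build_incomplete_data_message({"source_a"}, "2026-01-01")
#   if message:
#       boto3.client("sns").publish(TopicArn="arn:aws:sns:...", Message=message)
```

Keeping the check inside the Glue workflow (rather than in a separate Airflow or EMR cluster) is what gives option C its low operational overhead: there is no extra infrastructure to manage.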


Alternatives Considered:

A (Airflow cluster): Managed Airflow introduces more operational overhead and complexity compared to Glue workflows.

B (EMR cluster): Setting up an EMR cluster is also more complex compared to the Glue-centric solution.

D (Lambda functions): While Lambda functions can work, Glue workflows offer a more integrated solution with lower operational overhead.

AWS Glue Workflow Documentation

Contribute your Thoughts:

Leota
2 months ago
C is solid, EventBridge is a game changer for event-driven tasks.
upvoted 0 times
...
Leontine
2 months ago
Surprised that no one mentioned using Airflow!
upvoted 0 times
...
Dorethea
3 months ago
Not sure if Lambda can handle the data volume effectively.
upvoted 0 times
...
Chantay
3 months ago
I disagree, D looks more efficient with Lambda functions.
upvoted 0 times
...
Mayra
3 months ago
Option C seems like the best choice for low overhead.
upvoted 0 times
...
Yaeko
3 months ago
I recall that using AWS Step Functions can help orchestrate workflows, but I’m not sure if it’s necessary for this scenario. Maybe the Glue workflows are simpler?
upvoted 0 times
...
Nickole
4 months ago
I’m a bit confused about whether to use Lambda or EMR for the data quality checks. Both seem like they could work, but I’m leaning towards Lambda for less overhead.
upvoted 0 times
...
James
4 months ago
I think we practiced a similar question where we had to send notifications for incomplete data. I feel like using EventBridge might be the right approach.
upvoted 0 times
...
Kati
4 months ago
I remember we discussed using AWS Glue workflows for data quality checks, but I'm not sure if that's the best option here.
upvoted 0 times
...
Lennie
4 months ago
I'm leaning towards Option A with the Airflow cluster. I've used Airflow before and I'm familiar with how to set up data quality checks and notifications. The fact that it's a managed service is also appealing - less overhead for me to worry about. As long as the Airflow setup isn't too complex, I think this could be a solid choice.
upvoted 0 times
...
Titus
4 months ago
Option D with the Lambda functions and Step Functions workflow looks interesting. I like the idea of keeping things simple and modular. The email notification to the SNS topic also seems like a nice way to alert the data engineer. I'll have to think through the implementation details, but this could be a good solution.
upvoted 0 times
...
Tiera
4 months ago
I'm a bit confused by all the different AWS services mentioned in the options. Airflow, EMR, Step Functions - that's a lot of moving parts! I'm not sure I fully understand how they all fit together. I might need to do some more research on the capabilities of each service before deciding.
upvoted 0 times
...
Estrella
5 months ago
This looks like a pretty straightforward data quality monitoring problem. I think Option C is the way to go - using AWS Glue workflows to check the data and trigger an EventBridge event if there are any issues. That seems like the most efficient and low-overhead solution.
upvoted 0 times
...
Johana
1 year ago
That's a good point, option C does seem like a simpler solution with less operational overhead.
upvoted 0 times
...
Jennifer
1 year ago
I'm just hoping the data engineer has a good sense of humor. Imagine getting that SNS notification every time there's a data hiccup - 'Houston, we have a problem... and it's in the cloud!'
upvoted 0 times
Jamika
1 year ago
D: It's important to have a sense of humor when dealing with data hiccups in the cloud.
upvoted 0 times
...
Geraldo
1 year ago
C: I can only imagine the data engineer's reaction every time they get that notification.
upvoted 0 times
...
Ivory
1 year ago
B: Haha, 'Houston, we have a problem in the cloud' - that's a good one!
upvoted 0 times
...
Gene
1 year ago
A: I know right, that SNS notification would definitely keep things interesting!
upvoted 0 times
...
...
Micheal
1 year ago
Ah, the age-old debate - Airflow or EMR? Option A and B both have their merits, but I'm just glad I don't have to make that call. As long as it works, I'm happy!
upvoted 0 times
...
Nguyet
1 year ago
I like the idea of using Lambda functions in Option D, but orchestrating the whole thing through Step Functions feels a bit overkill. Maybe a simpler Lambda-based solution could work just as well.
upvoted 0 times
Earlean
1 year ago
That's a good point, a simpler Lambda-based solution could still meet the requirement with less overhead.
upvoted 0 times
...
Lucina
1 year ago
I agree, but maybe we can simplify it by just using Lambda functions without Step Functions.
upvoted 0 times
...
Paola
1 year ago
Option D sounds good, using Lambda functions for data quality checks is efficient.
upvoted 0 times
...
...
Brett
1 year ago
I disagree, I believe option C is the most efficient as it uses AWS Glue workflows and EventBridge to handle data quality checks.
upvoted 0 times
...
Johana
1 year ago
I think option A is the best choice because it uses Apache Airflow to run data quality checks and send notifications.
upvoted 0 times
...
Sheridan
1 year ago
Option C seems the most straightforward. Integrating the data quality checks directly into the Glue workflow and using EventBridge to trigger the notification is a nice clean solution.
upvoted 0 times
Billye
1 year ago
Option C definitely has the least operational overhead.
upvoted 0 times
...
Anjelica
1 year ago
Using EventBridge to trigger the notification is a smart move.
upvoted 0 times
...
Phyliss
1 year ago
I agree, integrating the data quality checks into the Glue workflow seems efficient.
upvoted 0 times
...
Margurite
1 year ago
I think option C is the best choice here.
upvoted 0 times
...
...
