Welcome to Pass4Success

Amazon-DEA-C01 Exam - Topic 3 Question 18 Discussion

Actual exam question for Amazon's Amazon-DEA-C01 exam
Question #: 18
Topic #: 3

A company has three subsidiaries. Each subsidiary uses a different data warehousing solution. The first subsidiary hosts its data warehouse in Amazon Redshift. The second subsidiary uses Teradata Vantage on AWS. The third subsidiary uses Google BigQuery.

The company wants to aggregate all the data into a central Amazon S3 data lake. The company wants to use Apache Iceberg as the table format.

A data engineer needs to build a new pipeline to connect to all the data sources, run transformations by using each source engine, join the data, and write the data to Iceberg.

Which solution will meet these requirements with the LEAST operational effort?

Suggested Answer: B

Amazon Athena provides federated query connectors for many data sources, including Amazon Redshift, Teradata, and Google BigQuery, so a single SQL query can read from all three without first extracting the data from the source systems. This keeps operational effort low because there is no separate data-movement pipeline to build and maintain.
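As a rough illustration of what "one query across three sources" means in practice, the Python sketch below only assembles SQL text; it does not call AWS. It assumes each warehouse has already been registered in Athena as a data source catalog, and every catalog, database, table, and column name in it (`redshift_sub1`, `teradata_sub2`, `bigquery_sub3`, `customer_id`, `revenue`, `region`, `segment`) is a hypothetical placeholder, not something given in the question.

```python
"""Sketch: one Athena SELECT that joins tables from three federated catalogs.

Every catalog, database, table, and column name is a hypothetical placeholder;
real names depend on how the data source connectors were registered.
"""


def qualified(catalog: str, database: str, table: str) -> str:
    """Build a double-quoted "catalog"."database"."table" reference."""
    return ".".join(f'"{part}"' for part in (catalog, database, table))


def federated_join_sql(redshift: tuple[str, str, str],
                       teradata: tuple[str, str, str],
                       bigquery: tuple[str, str, str],
                       key: str) -> str:
    """Return SQL that joins the three sources on a shared key column.

    The selected columns (revenue, region, segment) are hypothetical.
    """
    return (
        f"SELECT r.{key}, r.revenue, t.region, b.segment\n"
        f"FROM {qualified(*redshift)} AS r\n"
        f"JOIN {qualified(*teradata)} AS t ON r.{key} = t.{key}\n"
        f"JOIN {qualified(*bigquery)} AS b ON r.{key} = b.{key}"
    )


if __name__ == "__main__":
    print(federated_join_sql(
        ("redshift_sub1", "public", "orders"),
        ("teradata_sub2", "crm", "customers"),
        ("bigquery_sub3", "marketing", "segments"),
        key="customer_id",
    ))
```

Double-quoting each part of the reference keeps unusual identifiers valid; the join key and selected columns would have to match whatever schemas the subsidiaries actually use.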

Amazon Athena Federated Queries:

Athena federated queries can read data stored in multiple sources, including Amazon Redshift, Teradata, and Google BigQuery, within a single query. Because Athena also supports Apache Iceberg tables, the joined result can be written to the data lake with a MERGE INTO statement on the Iceberg table.
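The MERGE INTO step can be sketched in Python as a minimal, hypothetical helper that builds the statement text: rows whose key already exists in the Iceberg target are updated, and rows with new keys are inserted. The target table `lake_db.customer_summary`, the `staging_view` source, and the column names are placeholders, and `columns` is assumed to include the key column.

```python
"""Sketch: build an Athena MERGE INTO statement that upserts into an Iceberg table.

The target table, source query, key, and column names are hypothetical.
"""


def merge_into_iceberg_sql(target: str, source_query: str,
                           key: str, columns: list[str]) -> str:
    """Return a MERGE INTO statement: update matched rows, insert new ones.

    `columns` must include `key` so that inserted rows carry their key value.
    """
    # Every non-key column is overwritten from the source row on a match.
    updates = ", ".join(f"{c} = s.{c}" for c in columns if c != key)
    all_cols = ", ".join(columns)
    values = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING ({source_query}) AS s\n"
        f"ON t.{key} = s.{key}\n"
        f"WHEN MATCHED THEN UPDATE SET {updates}\n"
        f"WHEN NOT MATCHED THEN INSERT ({all_cols}) VALUES ({values})"
    )


if __name__ == "__main__":
    print(merge_into_iceberg_sql(
        target="lake_db.customer_summary",
        source_query="SELECT customer_id, revenue FROM staging_view",
        key="customer_id",
        columns=["customer_id", "revenue"],
    ))
```

The resulting string could then be submitted through Athena's StartQueryExecution API (for example, the boto3 `athena` client's `start_query_execution(QueryString=...)`). Athena accepts MERGE INTO only when the target is an Iceberg table; whether a particular connector can be read directly inside the source query is worth confirming in the Athena documentation before relying on it.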

The solution reduces complexity by centralizing query execution and transformation in Athena SQL.


Alternatives Considered:

A (AWS Glue pipeline): This would work, but it requires more operational effort to manage and transform the data in AWS Glue.

C (Amazon EMR): Using EMR and writing PySpark code introduces more operational overhead and complexity compared to a SQL-based solution in Athena.

D (Amazon AppFlow): AppFlow is more suitable for transferring data between services but is not as efficient for transformations and joins as Athena federated queries.


Josephine
3 months ago
Not sure about that, EMR could add unnecessary complexity.
Lindsey
3 months ago
Wait, can you really use all those connectors together? Seems tricky.
Jani
3 months ago
Sounds like AWS Glue is the way to go for less hassle!
Sang
3 months ago
I think using Athena might be more efficient for querying.
Devon
4 months ago
I agree, Glue seems like the best option here!
Dick
4 months ago
I feel like AppFlow could be a good option, but I’m uncertain about how well it integrates with Iceberg for the merge operation.
Mila
4 months ago
I practiced a similar question where we had to use EMR and Spark, but it felt like it required more operational effort than Glue or Athena.
Roy
4 months ago
I think using Amazon Athena with federated queries could simplify things, but I can't recall if it supports all those data sources seamlessly.
Daniel
4 months ago
I remember studying about AWS Glue and its connectors, but I'm not sure if it's the best choice for this scenario.
Susana
5 months ago
I like the idea of using Amazon EMR and PySpark to handle the data transformations. That gives me more flexibility, but I'll need to make sure the operational overhead is still low.
German
5 months ago
The Athena federated query approach seems interesting, but I'm not sure how that would compare to the other options in terms of operational effort. I'll need to research that a bit more.
Katina
5 months ago
Okay, I think I've got a good handle on this. I'll focus on using the native connectors for each data source to simplify the pipeline setup and maintenance.
Joseph
5 months ago
Hmm, I'm a bit confused by all the different data sources and tools mentioned. I'll need to read through the question again and make sure I understand the requirements.
Odette
5 months ago
This looks like a tricky one. I'll need to carefully consider the different options and their trade-offs in terms of operational effort.
Aron
6 months ago
Wow, this is a real brain-teaser! I'm going to have to think about this one over my lunch break. *munches on a big, juicy data burger*
Margot
6 months ago
I bet option A looks the easiest!
Dannie
6 months ago
That burger must be a data cruncher!
Cary
6 months ago
But what about the flexibility of B?
Brock
6 months ago
C sounds like a heavy lift, I'd skip that.
Helga
7 months ago
Option A seems like the simplest solution, but I'm not sure if Glue's transforms can handle the complexity of joining data from those three different sources.
Arthur
7 months ago
Option C looks good to me. Using EMR and PySpark gives you more flexibility and control over the transformations.
Caprice
7 months ago
I think Option B is the way to go. Using Athena's federated query capabilities would make the pipeline really easy to set up and maintain.
Myrtie
6 months ago
It would definitely reduce operational effort compared to writing code in PySpark with Apache Spark.
Jeannetta
6 months ago
I agree, using SQL queries in Athena to read from all data sources and join the data sounds efficient.
Lore
7 months ago
Option B is a good choice. Athena's federated query connectors can simplify the pipeline setup.
Micah
8 months ago
I'm leaning towards option D. Using Amazon AppFlow for data writing and Athena for joining seems efficient to me.
Luann
8 months ago
I disagree, I believe option C is better. Using Amazon EMR with PySpark gives more flexibility in data transformations.
Herminia
8 months ago
I think option A is the best choice. Using native connectors in AWS Glue seems like the most straightforward approach.
