
Amazon Exam Amazon-DEA-C01 Topic 4 Question 21 Discussion

Actual exam question for Amazon's Amazon-DEA-C01 exam
Question #: 21
Topic #: 4

A company is migrating its database servers from Amazon EC2 instances that run Microsoft SQL Server to Amazon RDS for Microsoft SQL Server DB instances. The company's analytics team must export large data elements every day until the migration is complete. The data elements are the result of SQL joins across multiple tables. The data must be in Apache Parquet format. The analytics team must store the data in Amazon S3.

Which solution will meet these requirements in the MOST operationally efficient way?

Suggested Answer: A

Option A is the most operationally efficient way to meet the requirements because it minimizes the number of steps and services involved in the data export process. AWS Glue is a fully managed extract, transform, and load (ETL) service that can move data between a wide range of sources and destinations, including Amazon S3, and can convert data to formats such as Parquet, a columnar storage format optimized for analytics. By creating a view in the SQL Server databases that contains the required data elements, the AWS Glue job can select the data directly from the view without having to perform the joins itself or transform the source data. The job can then write the data in Parquet format to an S3 bucket and run on a daily schedule.
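A minimal sketch of what such a Glue job's script could look like, written as a PySpark job; the endpoint, credentials, view name (dbo.analytics_export), and bucket are placeholders for this sketch, not values from the question:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the pre-joined view over JDBC; the join logic stays in the database.
df = (
    glue_context.spark_session.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<rds-endpoint>:1433;databaseName=<db>")
    .option("dbtable", "dbo.analytics_export")  # the view, not the base tables
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)

# Write straight to S3 in Parquet; a Glue schedule trigger runs this daily.
df.write.mode("overwrite").parquet("s3://<analytics-bucket>/daily-export/")

job.commit()
```

Everything happens in one scheduled job: read from the view, convert, land in S3.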

Option B is not operationally efficient because it involves multiple steps and services to export the data. SQL Server Agent can run scheduled tasks, such as SQL queries, on SQL Server databases, but it cannot export data to S3 directly: the query output must first be saved as .csv files and uploaded to an S3 bucket. An S3 event notification must then trigger an AWS Lambda function that transforms the .csv objects to Parquet format and writes them back to S3. This adds complexity and latency to the data export process and requires additional resources and configuration.
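Even the Lambda half of option B is its own piece to build and maintain. A minimal sketch of the .csv-to-Parquet handler, assuming the AWS SDK for pandas (awswrangler) is attached as a Lambda layer and with the target bucket as a placeholder:

```python
import urllib.parse

import awswrangler as wr  # AWS SDK for pandas, e.g. attached as a Lambda layer

TARGET_BUCKET = "<analytics-bucket>"  # placeholder

def handler(event, context):
    # One invocation per S3 event; each record is a newly uploaded .csv object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Read the CSV output of the SQL Server Agent job...
        df = wr.s3.read_csv(f"s3://{bucket}/{key}")

        # ...and rewrite it to the target bucket in Parquet format.
        parquet_key = key.rsplit(".", 1)[0] + ".parquet"
        wr.s3.to_parquet(df=df, path=f"s3://{TARGET_BUCKET}/{parquet_key}")
```

And this still leaves the SQL Server Agent job, the upload to S3, and the event notification to configure, which is exactly the operational overhead described above.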

Option C is not operationally efficient because it introduces an unnecessary step: running an AWS Glue crawler to read the view. A crawler scans data sources and creates metadata tables in the AWS Glue Data Catalog, a central repository that stores information about data sources such as schema, format, and location. In this scenario, however, the schema and format of the data elements are already known and fixed, so there is nothing for a crawler to discover; the AWS Glue job can select the data from the view directly, without going through the Data Catalog. Running a crawler only adds time and cost to the daily export.
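To make the extra step concrete, here is roughly the orchestration option C adds before the export job can even start; the crawler and job names are invented for this sketch:

```python
import time

import boto3

glue = boto3.client("glue")

# The extra step option C introduces: crawl the view to catalog its schema.
glue.start_crawler(Name="sqlserver-view-crawler")

# The job cannot read through the Data Catalog until the crawler finishes.
while glue.get_crawler(Name="sqlserver-view-crawler")["Crawler"]["State"] != "READY":
    time.sleep(30)

# Only now does the actual export job run.
glue.start_job_run(JobName="daily-parquet-export")
```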

Option D is not operationally efficient because it requires custom code and configuration to query the databases and transform the data. AWS Lambda runs code in response to events or triggers, and Amazon EventBridge can invoke a Lambda function on a schedule. In this scenario, however, the function would have to carry code that uses JDBC to connect to the SQL Server databases, retrieve the required data, convert it to Parquet format, and transfer it to S3, and all of that code must be written and maintained. Lambda's limits on execution time (15 minutes), memory, and concurrency may also affect the performance and reliability of a large daily export.
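For a sense of what option D asks you to own, here is a rough sketch of such a function, approximated in Python with pymssql standing in for the JDBC driver the option describes; the endpoint, credentials, query, and bucket are all placeholders:

```python
import boto3
import pandas as pd
import pymssql  # bundled with the function; stands in for a JDBC driver here

def handler(event, context):
    # Invoked daily by an Amazon EventBridge schedule rule.
    conn = pymssql.connect(
        server="<rds-endpoint>",
        user="<user>",
        password="<password>",
        database="<db>",
    )
    try:
        # The join logic a view would encapsulate has to live in code instead.
        df = pd.read_sql(
            "SELECT o.order_id, c.name, o.total "
            "FROM orders o JOIN customers c ON o.customer_id = c.id",
            conn,
        )
    finally:
        conn.close()

    # The query, conversion, and upload must all fit inside Lambda's
    # 15-minute execution limit and the function's memory allocation.
    df.to_parquet("/tmp/export.parquet")  # requires pyarrow in the package
    boto3.client("s3").upload_file(
        "/tmp/export.parquet", "<analytics-bucket>", "exports/daily.parquet"
    )
```

All of this is hand-rolled plumbing that AWS Glue provides out of the box in option A.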



Contribute your Thoughts:

Vincent
2 days ago
Hmm, I'm a bit confused by the different AWS services mentioned. I'll need to make sure I understand how they work together to solve this problem.
upvoted 0 times
Herman
8 days ago
This looks like a tricky one. I'll need to carefully read through the requirements and think through the pros and cons of each option.
upvoted 0 times
Germaine
13 days ago
Option D sounds interesting with the Lambda function, but I worry about the complexity of managing JDBC connections. I feel like it might not be the most efficient method compared to the Glue options.
upvoted 0 times
Rene
19 days ago
I practiced a similar question where we had to use AWS Glue jobs, and I think option C could work well too. The crawler part might help with schema detection, but I'm not sure if it's necessary here.
upvoted 0 times
Sharen
24 days ago
I'm not entirely sure, but I think using SQL Server Agent in option B might complicate things with the .csv to Parquet transformation. It feels like an extra step that could be avoided.
upvoted 0 times
Fairy
1 month ago
I remember that AWS Glue is often used for ETL tasks, so option A seems like a solid choice since it directly pulls from a view and outputs to S3 in Parquet format.
upvoted 0 times
