A media company needs a data pipeline that will ingest customer review data into a Snowflake table and apply some transformations. The company also needs to use Amazon Comprehend for sentiment analysis and to make the de-identified final data set publicly available to advertising companies that use different cloud providers in different regions.
The data pipeline needs to run continuously and efficiently as new records arrive in object storage, leveraging event notifications. In addition, the operational complexity, the maintenance of the infrastructure (including platform upgrades and security), and the development effort should be minimal.
Which design will meet these requirements?
Option A is not the best design because it uses COPY INTO to ingest the data, which is not as efficient or continuous as Snowpipe. COPY INTO is a SQL command that bulk-loads staged files into a table each time it is executed, so it has to be scheduled or triggered separately instead of reacting to event notifications as new files arrive. Option A also exports the data into Amazon S3 to do model inference with Amazon Comprehend, which adds an extra step and increases the operational complexity and maintenance of the infrastructure.
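The difference between the two ingestion methods can be sketched by comparing the statements involved. This is only an illustration under stated assumptions: the object names `reviews_raw`, `reviews_stage`, and `reviews_pipe` are hypothetical, the JSON file format is assumed, and the target table is assumed to have a single VARIANT column. The helpers only build SQL text and do not connect to Snowflake.

```python
def copy_into_sql(table: str, stage: str) -> str:
    """Build a one-off bulk load statement; it loads files only when executed."""
    # Assumption: files are JSON and the table has a single VARIANT column.
    return f"COPY INTO {table} FROM @{stage} FILE_FORMAT = (TYPE = 'JSON')"


def create_pipe_sql(pipe: str, table: str, stage: str) -> str:
    """Wrap the same COPY INTO in a pipe that loads files as notifications arrive."""
    # AUTO_INGEST = TRUE tells Snowpipe to load new files based on
    # cloud storage event notifications instead of a manual or scheduled run.
    return f"CREATE PIPE {pipe} AUTO_INGEST = TRUE AS {copy_into_sql(table, stage)}"


if __name__ == "__main__":
    # Hypothetical object names, used only for illustration.
    print(copy_into_sql("reviews_raw", "reviews_stage"))
    print(create_pipe_sql("reviews_pipe", "reviews_raw", "reviews_stage"))
```

The pipe definition reuses the same COPY INTO statement; what changes is who runs it. With a plain COPY INTO, something external has to run the load on a schedule, whereas the pipe is driven by the storage event notifications the question asks for.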
Option C is not the best design because it uses Amazon EMR and PySpark to ingest and transform the data, which also increases the operational complexity and maintenance of the infrastructure. Amazon EMR is a managed cluster platform for running big data frameworks such as Hadoop and Spark to process and analyze large-scale data sets. PySpark is the Python API for Spark, a distributed computing framework that can run on Hadoop. Option C also requires developing a Python program to do model inference with the Amazon Comprehend text analysis API, which increases the development effort.
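To give a sense of the development effort that a custom inference program implies, here is a minimal sketch of batching reviews through the Comprehend `BatchDetectSentiment` operation. It assumes a boto3 Comprehend client is passed in by the caller; the function name `score_reviews` is hypothetical, and retries, throttling, and credentials handling are left out.

```python
from typing import Any, List, Optional

# BatchDetectSentiment accepts at most 25 documents per request.
BATCH_LIMIT = 25


def score_reviews(
    client: Any, reviews: List[str], language_code: str = "en"
) -> List[Optional[str]]:
    """Return one sentiment label per review, or None where Comprehend reported an error.

    `client` is assumed to be a boto3 Comprehend client, for example
    boto3.client("comprehend"); any object exposing batch_detect_sentiment works.
    """
    labels: List[Optional[str]] = [None] * len(reviews)
    for start in range(0, len(reviews), BATCH_LIMIT):
        batch = reviews[start:start + BATCH_LIMIT]
        response = client.batch_detect_sentiment(
            TextList=batch, LanguageCode=language_code
        )
        # Each result carries the index of the document within this batch.
        for item in response["ResultList"]:
            labels[start + item["Index"]] = item["Sentiment"]
        # Documents listed in response["ErrorList"] keep the value None.
    return labels
```

With boto3 installed and AWS credentials configured, a call would look like `score_reviews(boto3.client("comprehend"), ["Great service", "Never again"])`. Even this small sketch leaves batching limits, error handling, and deployment to the team, which is the extra effort the explanation refers to.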
Option D is not the best design because, apart from the ingestion method, it is identical to option A. It still exports the data into Amazon S3 to do model inference with Amazon Comprehend, which adds an extra step and increases the operational complexity and maintenance of the infrastructure.