Welcome to Pass4Success

Amazon BDS-C00 Exam - Topic 3 Question 113 Discussion

A customer has a machine learning workflow that consists of multiple quick cycles of reads-writes-reads on Amazon S3. The customer needs to run the workflow on EMR but is concerned that the reads in subsequent cycles will miss new data, critical to the machine learning, from the prior cycles. How should the customer accomplish this?
A) Turn on EMRFS consistent view when configuring the EMR cluster
B) Use AWS Data Pipeline to orchestrate the data processing cycles
C) Set Hadoop.data.consistency = true in the core-site.xml file
D) Set Hadoop.s3.consistency = true in the core-site.xml file
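
For reference, option A is normally expressed at cluster-creation time through the `emrfs-site` configuration classification. The Python sketch below only builds that configuration list; it assumes the cluster would be launched with boto3's EMR client (`run_job_flow(..., Configurations=...)`). The helper name `emrfs_consistent_view_config` is hypothetical, and the table name and retry values are illustrative rather than taken from the question. Note also that Amazon S3 has provided strong read-after-write consistency since December 2020, so this setting mainly reflects the older S3 behavior the exam question assumes.

```python
import json


def emrfs_consistent_view_config(table_name="EmrFSMetadata", retry_count=5, retry_period_s=10):
    """Hypothetical helper: return an EMR Configurations list that turns on
    EMRFS consistent view. Values are illustrative, not recommendations."""
    return [
        {
            "Classification": "emrfs-site",
            "Properties": {
                # Enable consistent view (EMRFS tracks its writes in a DynamoDB table)
                "fs.s3.consistent": "true",
                # Name of the DynamoDB metadata table used for tracking
                "fs.s3.consistent.metadata.tableName": table_name,
                # How many times, and how often, EMRFS retries an inconsistent operation
                "fs.s3.consistent.retryCount": str(retry_count),
                "fs.s3.consistent.retryPeriodSeconds": str(retry_period_s),
            },
        }
    ]


if __name__ == "__main__":
    # Assumption: this list would be passed as
    # boto3.client("emr").run_job_flow(..., Configurations=configs)
    configs = emrfs_consistent_view_config()
    print(json.dumps(configs, indent=2))
```

EMR configuration properties are passed as strings, which is why the numeric values are converted with `str()`.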

Actual exam question for Amazon's BDS-C00 exam
Question #: 113
Topic #: 3

Suggested Answer: B

Maia
6 months ago
C and D seem a bit outdated, are they even relevant anymore?
Carma
6 months ago
Definitely A, I've seen it work in similar setups.
Brunilda
7 months ago
Wait, does turning on EMRFS really solve the data consistency issue?
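
One way to see why consistent view targets this problem: EMRFS records the objects it writes in a metadata table (DynamoDB) and, when a later operation finds S3 inconsistent with that record, retries before giving up. The toy model below is a simplification, not EMRFS itself. `EventuallyConsistentStore` and `ConsistentViewClient` are hypothetical classes, the "tick" delay stands in for S3's former eventual consistency on listings, and the in-memory set stands in for the DynamoDB table.

```python
from __future__ import annotations


class EventuallyConsistentStore:
    """Hypothetical toy store: a newly written key appears in listings only
    after `lag_ticks` list calls have passed (simulated eventual consistency)."""

    def __init__(self, lag_ticks: int = 2) -> None:
        self.lag_ticks = lag_ticks
        self.tick = 0
        self.objects: dict[str, int] = {}  # key -> tick at which it was written

    def put(self, key: str) -> None:
        self.objects[key] = self.tick

    def list_keys(self) -> set[str]:
        visible = {k for k, t in self.objects.items() if self.tick - t >= self.lag_ticks}
        self.tick += 1  # each listing advances simulated time by one tick
        return visible


class ConsistentViewClient:
    """Hypothetical toy client modeling the consistent-view idea: every write is
    also recorded in a metadata table, and a listing is retried until every
    recorded key is visible (or the retry budget runs out)."""

    def __init__(self, store: EventuallyConsistentStore, retry_count: int = 5) -> None:
        self.store = store
        self.retry_count = retry_count
        self.metadata: set[str] = set()  # stands in for the DynamoDB metadata table

    def put(self, key: str) -> None:
        self.store.put(key)
        self.metadata.add(key)

    def list_keys(self) -> set[str]:
        # One initial attempt plus up to `retry_count` retries
        for _ in range(self.retry_count + 1):
            visible = self.store.list_keys()
            if self.metadata <= visible:  # every recorded write is now visible
                return visible
        raise RuntimeError("listing still inconsistent after retries")


if __name__ == "__main__":
    # A plain listing right after a write misses the new object...
    raw = EventuallyConsistentStore(lag_ticks=2)
    raw.put("cycle1/part-00000")
    print(raw.list_keys())  # set()

    # ...while the consistent-view client keeps retrying until it appears.
    cv = ConsistentViewClient(EventuallyConsistentStore(lag_ticks=2))
    cv.put("cycle1/part-00000")
    print(cv.list_keys())  # {'cycle1/part-00000'}
```

Orchestration alone (option B) would not change what a listing returns, which is why the discussion below keeps coming back to the metadata-tracking approach of option A.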
Wilford
7 months ago
I think B could work too, but not sure it's the best option.
Arlette
7 months ago
A is the way to go for consistent reads!
Tiera
7 months ago
I vaguely recall something about AWS Data Pipeline, but I don't think it directly addresses the consistency issue like EMRFS does.
Casandra
7 months ago
I practiced a similar question where we had to ensure data consistency in S3. I think option A is the best choice based on that.
Pamella
8 months ago
I'm not entirely sure, but I think setting Hadoop.data.consistency to true might be related to data consistency. I need to double-check that.
Lynna
8 months ago
I remember reading about EMRFS consistent view in our study materials. It seems like it could help with ensuring data consistency across cycles.
Princess
8 months ago
I'm not too familiar with AWS Data Pipeline, but that option seems like it might be overkill for this use case. The EMRFS consistent view seems like the more direct solution.
Jarod
8 months ago
Okay, I've got this. The customer needs to ensure that the reads in subsequent cycles see the new data from the prior cycles. Turning on EMRFS consistent view is the way to go here.
Noel
8 months ago
Hmm, I'm a bit confused about the difference between the Hadoop configuration options mentioned in the choices. I'll need to review the documentation on those to make sure I understand which one is the right solution.
Jules
8 months ago
This seems like a straightforward question about ensuring data consistency in an EMR workflow. I think the key is to understand the EMRFS consistent view feature.
Burma
1 year ago
Option A - the 'turn it on and forget it' approach. Way better than option D, the 'make up config settings as you go' approach.
Glen
12 months ago
A) Turn on EMRFS consistent view when configuring the EMR cluster
Freida
12 months ago
B) Use AWS Data Pipeline to orchestrate the data processing cycles
Erasmo
1 year ago
A) Turn on EMRFS consistent view when configuring the EMR cluster
Claudia
1 year ago
Hadoop.s3.consistency = true? That's a new one to me. I'd stick with the tried and true option A.
Margurite
12 months ago
Definitely, it's always best to go with the recommended option for data consistency.
Huey
12 months ago
I agree, turning on EMRFS consistent view should help prevent missing new data in subsequent cycles.
Lashunda
12 months ago
Option A sounds like the best choice to ensure consistency in your data processing cycles.
Olene
1 year ago
Setting Hadoop.data.consistency = true might work, but I'm not sure if that applies specifically to S3 data. Option A is probably safer.
Amber
1 year ago
AWS Data Pipeline could work, but that adds an extra layer of complexity. I'd go with the simpler option A.
Margurite
12 months ago
I think option A might be more straightforward for the customer.
Evan
1 year ago
C) Set Hadoop.data.consistency = true in the core-site.xml file
Gaynell
1 year ago
That sounds like a good idea, it should help with the data consistency.
Deane
1 year ago
A) Turn on EMRFS consistent view when configuring the EMR cluster
Kaycee
1 year ago
Option A seems like the most straightforward approach. Consistent view should help ensure the reads in subsequent cycles see the latest data.
Thurman
1 year ago
I agree, consistency is key for the machine learning process to work effectively.
Larae
1 year ago
A) Turn on EMRFS consistent view when configuring the EMR cluster
Diego
1 year ago
Yes, that should help with ensuring the machine learning workflow sees the latest data.
Carla
1 year ago
A) Turn on EMRFS consistent view when configuring the EMR cluster
Bettye
1 year ago
That sounds like a good idea to make sure the reads are consistent.
Gracia
1 year ago
A) Turn on EMRFS consistent view when configuring the EMR cluster
Terry
1 year ago
I'm not sure, but I think option B (use AWS Data Pipeline) could also help orchestrate the data processing cycles efficiently.
Roxane
1 year ago
I agree with Elenora. EMRFS consistent view ensures that the subsequent cycles will not miss new data.
Elenora
1 year ago
I think the customer should choose option A: turn on EMRFS consistent view when configuring the EMR cluster.