Amazon BDS-C00 Exam - Topic 3 Question 113 Discussion

Actual exam question for Amazon's BDS-C00 exam
Question #: 113
Topic #: 3

A customer has a machine learning workflow that consists of multiple quick read-write-read cycles on Amazon S3. The customer needs to run the workflow on EMR but is concerned that the reads in subsequent cycles will miss new data, written in prior cycles, that is critical to the machine learning.

How should the customer accomplish this?

Suggested Answer: B

Contribute your Thoughts:

Maia
4 months ago
C and D seem a bit outdated, are they even relevant anymore?
upvoted 0 times
...
Carma
4 months ago
Definitely A, I've seen it work in similar setups.
upvoted 0 times
...
Brunilda
4 months ago
Wait, does turning on EMRFS really solve the data consistency issue?
upvoted 0 times
...
Wilford
5 months ago
I think B could work too, but not sure it's the best option.
upvoted 0 times
...
Arlette
5 months ago
A is the way to go for consistent reads!
upvoted 0 times
...
Tiera
5 months ago
I vaguely recall something about AWS Data Pipeline, but I don't think it directly addresses the consistency issue like EMRFS does.
upvoted 0 times
...
Casandra
5 months ago
I practiced a similar question where we had to ensure data consistency in S3. I think option A is the best choice based on that.
upvoted 0 times
...
Pamella
5 months ago
I'm not entirely sure, but I think setting Hadoop.data.consistency to true might be related to data consistency. I need to double-check that.
upvoted 0 times
...
Lynna
6 months ago
I remember reading about EMRFS consistent view in our study materials. It seems like it could help with ensuring data consistency across cycles.
upvoted 0 times
...
Princess
6 months ago
I'm not too familiar with AWS Data Pipeline, but that option seems like it might be overkill for this use case. The EMRFS consistent view seems like the more direct solution.
upvoted 0 times
...
Jarod
6 months ago
Okay, I've got this. The customer needs to ensure that the reads in subsequent cycles see the new data from the prior cycles. Turning on EMRFS consistent view is the way to go here.
upvoted 0 times
...
Noel
6 months ago
Hmm, I'm a bit confused about the difference between the Hadoop configuration options mentioned in the choices. I'll need to review the documentation on those to make sure I understand which one is the right solution.
upvoted 0 times
...
Jules
6 months ago
This seems like a straightforward question about ensuring data consistency in an EMR workflow. I think the key is to understand the EMRFS consistent view feature.
upvoted 0 times
...
Burma
10 months ago
Option A - the 'turn it on and forget it' approach. Way better than option D, the 'make up config settings as you go' approach.
upvoted 0 times
Glen
9 months ago
A) Turn on EMRFS consistent view when configuring the EMR cluster
upvoted 0 times
...
Freida
9 months ago
B) Use AWS Data Pipeline to orchestrate the data processing cycles
upvoted 0 times
...
Erasmo
10 months ago
A) Turn on EMRFS consistent view when configuring the EMR cluster
upvoted 0 times
...
...
Claudia
11 months ago
Hadoop.s3.consistency = true? That's a new one to me. I'd stick with the tried and true option A.
upvoted 0 times
Margurite
9 months ago
Definitely, it's always best to go with the recommended option for data consistency.
upvoted 0 times
...
Huey
9 months ago
I agree, turning on EMRFS consistent view should help prevent missing new data in subsequent cycles.
upvoted 0 times
...
Lashunda
9 months ago
Option A sounds like the best choice to ensure consistency in your data processing cycles.
upvoted 0 times
...
...
Olene
11 months ago
Setting Hadoop.data.consistency = true might work, but I'm not sure if that applies specifically to S3 data. Option A is probably safer.
upvoted 0 times
...
Amber
11 months ago
AWS Data Pipeline could work, but that adds an extra layer of complexity. I'd go with the simpler option A.
upvoted 0 times
Margurite
10 months ago
I think option A might be more straightforward for the customer.
upvoted 0 times
...
Evan
10 months ago
C) Set Hadoop.data.consistency = true in the core-site.xml file
upvoted 0 times
...
Gaynell
11 months ago
That sounds like a good idea, it should help with the data consistency.
upvoted 0 times
...
Deane
11 months ago
A) Turn on EMRFS consistent view when configuring the EMR cluster
upvoted 0 times
...
...
Kaycee
12 months ago
Option A seems like the most straightforward approach. Consistent view should help ensure the reads in subsequent cycles see the latest data.
upvoted 0 times
Thurman
10 months ago
I agree, consistency is key for the machine learning process to work effectively.
upvoted 0 times
...
Larae
10 months ago
A) Turn on EMRFS consistent view when configuring the EMR cluster
upvoted 0 times
...
Diego
10 months ago
Yes, that should help with ensuring the machine learning workflow sees the latest data.
upvoted 0 times
...
Carla
10 months ago
A) Turn on EMRFS consistent view when configuring the EMR cluster
upvoted 0 times
...
Bettye
11 months ago
That sounds like a good idea to make sure the reads are consistent.
upvoted 0 times
...
Gracia
11 months ago
A) Turn on EMRFS consistent view when configuring the EMR cluster
upvoted 0 times
...
...
Terry
12 months ago
I'm not sure, but I think option B) Use AWS Data Pipeline could also help in orchestrating the data processing cycles efficiently.
upvoted 0 times
...
Roxane
12 months ago
I agree with Elenora. EMRFS consistent view ensures that the subsequent cycles will not miss new data.
upvoted 0 times
...
Elenora
12 months ago
I think the customer should choose option A) Turn on EMRFS consistent view when configuring the EMR cluster.
upvoted 0 times
...
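
For anyone wondering what "turn on EMRFS consistent view" from option A looks like in practice, here is a minimal sketch, not a definitive setup. It builds the `emrfs-site` configuration classification that enables consistent view through the `fs.s3.consistent` property. The helper name `emrfs_consistent_view_config` and the retry values are assumptions made up for illustration; they do not come from the question or the thread.

```python
import json


def emrfs_consistent_view_config(retry_seconds=30, retry_count=5):
    """Build an EMR configuration list that enables EMRFS consistent view.

    Hypothetical helper for illustration. The retry values are example
    numbers, not recommendations taken from the exam question.
    """
    return [
        {
            "Classification": "emrfs-site",
            "Properties": {
                # Turns on EMRFS consistent view for the cluster.
                "fs.s3.consistent": "true",
                # How long EMRFS waits between retries when S3 and its
                # metadata disagree, and how many times it retries.
                "fs.s3.consistent.retryPeriodSeconds": str(retry_seconds),
                "fs.s3.consistent.retryCount": str(retry_count),
            },
        }
    ]


if __name__ == "__main__":
    # Print the JSON form, which can be saved to a configurations file.
    print(json.dumps(emrfs_consistent_view_config(), indent=2))
```

The resulting list could be passed as the `Configurations` argument when creating a cluster with boto3's `run_job_flow`, or saved as JSON and supplied to `aws emr create-cluster --configurations`. Keep in mind that S3 itself has provided strong read-after-write consistency since December 2020, so this setting mostly matters for the scenario as the exam frames it.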
