Microsoft DP-420 Exam - Topic 4 Question 1 Discussion

Actual exam question for Microsoft's DP-420 exam
Question #: 1
Topic #: 4

You are implementing an Azure Data Factory data flow that will use an Azure Cosmos DB (SQL API) sink to write a dataset. The data flow will use 2,000 Apache Spark partitions.

You need to ensure that the ingestion from each Spark partition is balanced to optimize throughput.

Which sink setting should you configure?

A. Throughput
B. Write throughput budget
C. Batch size
D. Collection action

Suggested Answer: C

Batch size: An integer that represents how many objects are written to the Cosmos DB collection in each batch. Usually, starting with the default batch size is sufficient. To tune this value further, note:

Cosmos DB limits the size of a single request to 2 MB. The formula is 'Request Size = Single Document Size * Batch Size'. If you hit an error saying 'Request size is too large', reduce the batch size value.

The larger the batch size, the better the throughput the service can achieve, but make sure you allocate enough RUs to support your workload.
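The request-size formula above can be turned into a quick sanity check. The following is only a minimal sketch: the helper names are hypothetical, and it assumes the 2 MB limit means 2 * 1024 * 1024 bytes and that every document in a batch has roughly the same size. It is not part of Data Factory or any Cosmos DB SDK.

```python
# Hypothetical helpers illustrating 'Request Size = Single Document Size * Batch Size'.
# Assumption: the 2 MB request limit is treated as 2 * 1024 * 1024 bytes.
MAX_REQUEST_BYTES = 2 * 1024 * 1024


def request_size_bytes(doc_size_bytes: int, batch_size: int) -> int:
    """Estimate the request size for a batch of similarly sized documents."""
    return doc_size_bytes * batch_size


def max_batch_size(doc_size_bytes: int, limit_bytes: int = MAX_REQUEST_BYTES) -> int:
    """Largest batch size whose estimated request stays within the limit."""
    if doc_size_bytes <= 0:
        raise ValueError("doc_size_bytes must be positive")
    return limit_bytes // doc_size_bytes


def fits_in_request(doc_size_bytes: int, batch_size: int) -> bool:
    """True if the estimated request size does not exceed the limit."""
    return request_size_bytes(doc_size_bytes, batch_size) <= MAX_REQUEST_BYTES
```

For example, with documents of about 4 KB (4096 bytes), `max_batch_size(4096)` gives the largest batch that still fits under the assumed limit. Real documents vary in size, so leaving some headroom below that value is a reasonable precaution.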

Incorrect Answers:

A: Throughput: Set an optional value for the number of RUs you'd like to apply to your Cosmos DB collection for each execution of this data flow. Minimum is 400.

B: Write throughput budget: An integer that represents the RUs you want to allocate for this Data Flow write operation, out of the total throughput allocated to the collection.

D: Collection action: Determines whether to recreate the destination collection prior to writing.

None: No action will be done to the collection.

Recreate: The collection will be dropped and recreated.


Stephaine
4 months ago
I agree with Bette, B is the way to go!
upvoted 0 times
...
Nilsa
5 months ago
Wait, 2,000 partitions? That sounds excessive, right?
upvoted 0 times
...
Latosha
5 months ago
A seems too vague, not sure it would help much.
upvoted 0 times
...
Johnna
5 months ago
I think C, batch size is more important for throughput.
upvoted 0 times
...
Bette
5 months ago
Definitely go with B, write throughput budget is key for balancing.
upvoted 0 times
...
Dottie
5 months ago
I feel like "Collection action" could be relevant, but I can't recall how it specifically affects throughput in this context.
upvoted 0 times
...
Una
5 months ago
I remember practicing a similar question, and I think "Batch size" might be important for optimizing throughput as well.
upvoted 0 times
...
Beckie
5 months ago
I think we need to focus on the "Write throughput budget" to balance the ingestion from each partition, but I'm not entirely sure.
upvoted 0 times
...
Rosio
5 months ago
I'm leaning towards "Throughput" because it sounds like it directly relates to balancing the load, but I could be mixing it up with another topic.
upvoted 0 times
...
Sherly
5 months ago
This seems pretty straightforward. I think the correct answer is C - Play Estimated Wait Time. That's the most informative option for the customer.
upvoted 0 times
...
Kenneth
6 months ago
I'm a bit unsure about this one. There are a lot of factors to consider when establishing a group structure. I'll need to think through each option carefully to make sure I don't miss anything important.
upvoted 0 times
...
Kenneth
6 months ago
I'm a little confused by the different autoscaling options presented. I'll need to review the details of horizontal pod autoscaling, vertical pod autoscaling, and the cluster autoscaler to make sure I understand the differences and when to use each one.
upvoted 0 times
...
Cassi
6 months ago
A walkthrough seems like the best option to resolve any issues in the functional specification. That allows for a collaborative discussion and feedback from the team.
upvoted 0 times
...
