Amazon Exam DBS-C01 Topic 4 Question 86 Discussion

Actual exam question for Amazon's DBS-C01 exam
Question #: 86
Topic #: 4

A database specialist is launching a test graph database using Amazon Neptune for the first time. The database specialist needs to insert millions of rows of test observations from a .csv file that is stored in Amazon S3. The database specialist has been using a series of API calls to upload the data to the Neptune DB instance.

Which combination of steps would allow the database specialist to upload the data faster? (Choose three.)

Suggested Answer: B, E, F

Explanation (from Amazon documentation):

To upload data faster to a Neptune DB instance from a .csv file stored in Amazon S3, the database specialist should use the Neptune Bulk Loader, a feature that loads data from files in Amazon S3 directly into a Neptune DB instance. The Bulk Loader is faster and has much less overhead than issuing individual API calls such as SPARQL INSERT statements or Gremlin addV and addE steps, and it supports both RDF and Gremlin (property graph) data formats.

To use the Neptune Bulk Loader, the database specialist needs to do the following:

Ensure the vertices and edges are specified in different .csv files with proper header column formatting. The Gremlin load format uses separate .csv files: one for vertices and one for edges. The first row of each file must contain the column headers, which map to the property names of the graph elements. The files must also include the system columns ~id (for vertices) and ~id, ~from, and ~to (for edges), which identify each element and, for edges, the vertices it connects; a ~label column supplies the vertex or edge label.
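
For illustration, here is a minimal sketch of the two Gremlin load files. The file names, labels, and property columns (name, observedAt, precedes) are invented for this example; the point is the header-row layout with the system columns.

```python
# Hypothetical example: write minimal Gremlin bulk-load CSV files for Neptune.
# File names, labels, and property names are placeholders, not values from the question.
import csv

# Vertex file: ~id identifies each vertex, ~label gives its label,
# and the remaining columns become vertex properties.
with open("vertices.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["~id", "~label", "name", "observedAt"])
    writer.writerow(["v1", "observation", "sensor-42", "2024-01-01T00:00:00Z"])
    writer.writerow(["v2", "observation", "sensor-43", "2024-01-01T00:05:00Z"])

# Edge file: ~id identifies the edge, ~from and ~to reference vertex IDs,
# and ~label gives the edge label.
with open("edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["~id", "~from", "~to", "~label"])
    writer.writerow(["e1", "v1", "v2", "precedes"])
```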

Ensure an IAM role for the Neptune DB instance is configured with the appropriate permissions to allow access to the file in the S3 bucket. The Neptune DB instance assumes this role to read the data from the S3 bucket. The role must have a trust policy that allows Neptune to assume it and a permissions policy that grants read access to the S3 bucket and its objects, and it must be attached to the Neptune cluster.
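
A minimal sketch of creating such a role with boto3 is shown below. The role name, policy name, and bucket name are placeholders; the trust relationship follows the pattern in the Neptune documentation, where the RDS service principal is allowed to assume the role.

```python
# Hypothetical sketch: create an IAM role that Neptune can assume to read from S3.
# Role name, policy name, and bucket name are placeholders for illustration.
import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "rds.amazonaws.com"},  # service principal Neptune assumes the role through
        "Action": "sts:AssumeRole",
    }],
}

s3_read_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-bucket",
            "arn:aws:s3:::example-bucket/*",
        ],
    }],
}

role = iam.create_role(
    RoleName="NeptuneLoadFromS3",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName="NeptuneLoadFromS3",
    PolicyName="NeptuneS3ReadAccess",
    PolicyDocument=json.dumps(s3_read_policy),
)

# The role still has to be attached to the Neptune DB cluster (for example with the
# add-role-to-db-cluster API) before the loader can use it.
print(role["Role"]["Arn"])
```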

Create an S3 VPC endpoint and issue an HTTP POST to the database's loader endpoint. The S3 VPC endpoint lets the Neptune DB instance read from the S3 bucket without going through the public internet, and it must be created in the same VPC as the Neptune DB instance. The HTTP POST request to the loader endpoint specifies the source parameter as the S3 URI of the .csv files, together with parameters such as format, iamRoleArn, region, failOnError, and parallelism.
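
A minimal sketch of such a loader request is shown below. The cluster endpoint, bucket path, region, and role ARN are placeholders; the request must originate from inside the Neptune VPC (for example, from an EC2 instance in the same VPC).

```python
# Hypothetical sketch: start a Neptune bulk-load job by POSTing to the loader endpoint.
# Endpoint, bucket, region, and role ARN are placeholders for illustration.
import requests

loader_url = "https://my-neptune-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/loader"

payload = {
    "source": "s3://example-bucket/observations/",  # prefix containing vertices.csv and edges.csv
    "format": "csv",                                # Gremlin CSV load format
    "iamRoleArn": "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
    "region": "us-east-1",
    "failOnError": "FALSE",
    "parallelism": "MEDIUM",
}

response = requests.post(loader_url, json=payload)
response.raise_for_status()

# The loader returns a loadId; job status can then be polled with GET /loader/<loadId>.
print(response.json())
```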

Therefore, options B, E, and F are the correct steps to upload the data faster. Option A is not necessary because Amazon Cognito is not used to authenticate the Neptune DB instance to the S3 bucket. Option C is not suitable because AWS DMS is not designed for loading graph data into Neptune. Option D is not efficient because curling the S3 URI and running addVertex or addEdge commands would be slower and more costly than using the Neptune Bulk Loader.


Contribute your Thoughts:

Jamal
9 days ago
Absolutely. And making sure the IAM role has the right permissions is crucial - you don't want any roadblocks there.
upvoted 0 times
Patria
9 days ago
Yeah, those seem like the logical choices. Separating the vertices and edges into different files could help with processing speed, and using AWS DMS to move the data directly would be faster than individual API calls.
upvoted 0 times
Omer
11 days ago
I agree. Let's see, the options talk about authentication, file formatting, and data transfer methods. I'm thinking options B, C, and E might be the way to go.
upvoted 0 times
Ena
12 days ago
Hmm, this question seems pretty straightforward. I think the key is to make the data upload process as efficient as possible.
upvoted 0 times
