
Databricks Certified Data Engineer Professional Exam - Topic 5 Question 45 Discussion

Actual exam question for the Databricks Certified Data Engineer Professional exam
Question #: 45
Topic #: 5

A Data Engineer is building a simple data pipeline using Lakeflow Declarative Pipelines (LDP) in Databricks to ingest customer data. The raw customer data is stored in a cloud storage location in JSON format. The task is to create a Lakeflow Declarative Pipeline that reads the raw JSON data and writes it into a Delta table for further processing.

Which code snippet will correctly ingest the raw JSON data and create a Delta table using LDP?

A.

import dlt

@dlt.table
def raw_customers():
    return spark.read.format("csv").load("s3://my-bucket/raw-customers/")

B.

import dlt

@dlt.table
def raw_customers():
    return spark.read.json("s3://my-bucket/raw-customers/")

C.

import dlt

@dlt.table
def raw_customers():
    return spark.read.format("parquet").load("s3://my-bucket/raw-customers/")

D.

import dlt

@dlt.view
def raw_customers():
    return spark.format.json("s3://my-bucket/raw-customers/")

Suggested Answer: B

The correct way to define a table with Lakeflow Declarative Pipelines (LDP) is the @dlt.table decorator, which persists the function's output as a managed Delta table. For raw JSON data, spark.read.json() (or the equivalent spark.read.format("json").load()) is the standard reader; LDP then writes the result in Delta format, managed automatically by Databricks. Options A and C read the wrong formats (CSV and Parquet), and Option D fails twice over: @dlt.view does not persist a Delta table, and spark.format.json() is not a valid Spark API.
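For reference, a minimal runnable sketch of the correct pattern from Option B, assuming it runs inside a Databricks LDP pipeline where dlt and spark are provided, and using the bucket path from the question:

import dlt

@dlt.table(
    name="raw_customers",
    comment="Raw customer records ingested from JSON files in cloud storage."
)
def raw_customers():
    # spark.read.json() infers the schema from the JSON files under the
    # prefix; LDP persists the returned DataFrame as a managed Delta table.
    # Equivalent: spark.read.format("json").load("s3://my-bucket/raw-customers/")
    return spark.read.json("s3://my-bucket/raw-customers/")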

Reference Source: Databricks Lakeflow Declarative Pipelines Developer Guide, "Create tables from raw JSON and Delta sources."
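For contrast, a sketch of why Option D falls short, assuming the same pipeline context: @dlt.view registers a dataset that other steps in the same pipeline can read, but it is not persisted as a Delta table (and note that readers are always accessed through spark.read, never spark.format):

import dlt

@dlt.view
def raw_customers_view():
    # A view like this is only available to downstream datasets within the
    # same pipeline run; no Delta table is written to the catalog.
    return spark.read.json("s3://my-bucket/raw-customers/")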


Contribute your Thoughts:

Samira
3 days ago
Definitely going with Option B! Makes total sense.
Fannie
8 days ago
I think Option A is a mistake. CSV format won't work here.
Curt
13 days ago
Option B is the right choice! It reads JSON correctly.
Stephania
18 days ago
I’m a bit confused about D; it uses `@dlt.view` instead of `@dlt.table`. I think that might not be the right approach for creating a Delta table.
Gwenn
24 days ago
I practiced a similar question where we had to specify the format correctly. I feel like B is the most straightforward choice here.
Toi
29 days ago
I'm not entirely sure, but I remember something about needing to use the correct format method. Option A uses CSV, which doesn't seem right.
Lavera
1 month ago
I think the answer might be B since it specifically uses `spark.read.json`, which matches the JSON format of the data.
