Databricks Certified Data Engineer Professional Exam

Certification Provider: Databricks
Exam Name: Databricks Certified Data Engineer Professional
Number of questions in our database: 120
Exam Version: Apr. 17, 2024
Official Exam Topics:
  • Topic 1: Databricks Tooling: The Databricks Tooling topic encompasses the various features and functionalities of Delta Lake. This includes understanding the transaction log, Optimistic Concurrency Control, Delta clone, indexing optimizations, and strategies for partitioning data for optimal performance in the Databricks SQL service.
  • Topic 2: Data Processing: The topic covers understanding partition hints, partitioning data effectively, controlling part-file sizes, updating records, leveraging Structured Streaming and Delta Lake, and implementing stream-static joins and deduplication. Additionally, it delves into utilizing Change Data Capture and addressing performance issues related to small files.
  • Topic 3: Data Modeling: It focuses on understanding the objectives of data transformations, using Change Data Feed, applying Delta Lake cloning, and designing multiplex bronze tables. Lastly, it discusses implementing incremental processing and data quality enforcement, lookup tables, and Slowly Changing Dimension (SCD) tables, including SCD Type 0, 1, and 2.
  • Topic 4: Security & Governance: It discusses creating dynamic views to accomplish data masking and using dynamic views to control access to rows and columns.
  • Topic 5: Monitoring & Logging: This topic includes understanding the Spark UI, inspecting event timelines and metrics, drawing conclusions from various UIs, designing systems to control cost and latency SLAs for production streaming jobs, and deploying and monitoring both streaming and batch jobs.
  • Topic 6: Testing & Deployment: It discusses adapting notebook dependencies to use Python file dependencies, leveraging Wheels for imports, repairing and rerunning failed jobs, creating jobs based on common use cases, designing systems to control cost and latency SLAs, configuring the Databricks CLI, and using the REST API to clone a job, trigger a run, and export the run output.

Free Databricks Certified Data Engineer Professional Exam Actual Questions

The questions for Databricks Certified Data Engineer Professional were last updated on Apr. 17, 2024.

Question #1

The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.

The following logic is used to process these records.

MERGE INTO customers
USING (
  SELECT updates.customer_id AS merge_key, updates.*
  FROM updates
  UNION ALL
  SELECT NULL AS merge_key, updates.*
  FROM updates
  JOIN customers ON updates.customer_id = customers.customer_id
  WHERE customers.current = true AND updates.address <> customers.address
) staged_updates
ON customers.customer_id = merge_key
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN
  UPDATE SET current = false, end_date = staged_updates.effective_date
WHEN NOT MATCHED THEN
  INSERT (customer_id, address, current, effective_date, end_date)
  VALUES (staged_updates.customer_id, staged_updates.address, true, staged_updates.effective_date, null)

Which statement describes this implementation?

Question #2

A data engineer is testing a collection of mathematical functions, one of which calculates the area under a curve as described by another function.

Which kind of test does the above line exemplify?

Reveal Solution Hide Solution
Correct Answer: B

A unit test is designed to verify the correctness of a small, isolated piece of code, typically a single function. Testing a mathematical function that calculates the area under a curve is an example of a unit test because it is testing a specific, individual function to ensure it operates as expected.


Software Testing Fundamentals: Unit Testing
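
The exact line of code under test is not reproduced here, but as an illustrative sketch, a unit test for a hypothetical area_under_curve helper might look like this (all names below are assumed, not taken from the exam):

    import math

    def area_under_curve(f, a, b, n=10_000):
        # Approximate the definite integral of f over [a, b] with the trapezoidal rule.
        h = (b - a) / n
        total = (f(a) + f(b)) / 2 + sum(f(a + i * h) for i in range(1, n))
        return total * h

    def test_area_under_curve():
        # The area under f(x) = x on [0, 1] is exactly 0.5. The test exercises one
        # function in isolation, which is what makes it a unit test.
        assert math.isclose(area_under_curve(lambda x: x, 0.0, 1.0), 0.5, rel_tol=1e-6)

Running this with a test runner such as pytest verifies the single function independently of the rest of the collection.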

Question #3

The marketing team is looking to share data in an aggregate table with the sales organization, but the field names used by the teams do not match, and a number of marketing-specific fields have not been approved for the sales org.

Which of the following solutions addresses the situation while emphasizing simplicity?

Correct Answer: A

Creating a view is a straightforward solution that can address the need for field name standardization and selective field sharing between departments. A view allows for presenting a transformed version of the underlying data without duplicating it. In this scenario, the view would only include the approved fields for the sales team and rename any fields as per their naming conventions.


Databricks documentation on using SQL views in Delta Lake: https://docs.databricks.com/delta/quick-start.html#sql-views
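
A minimal sketch of that approach from a Databricks Python notebook, where the spark session is predefined (all table and column names here are hypothetical):

    spark.sql("""
        CREATE OR REPLACE VIEW sales.customer_summary AS
        SELECT
            mkt_cust_id   AS customer_id,    -- renamed to the sales team's convention
            mkt_region    AS sales_region,
            mkt_total_rev AS total_revenue
        FROM marketing.customer_agg          -- unapproved marketing-only fields are simply not selected
    """)

Because a view stores only the query, the sales organization always sees current data with no copy to keep in sync.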

Question #4

A junior data engineer is migrating a workload from a relational database system to the Databricks Lakehouse. The source system uses a star schema, leveraging foreign key constraints and multi-table inserts to validate records on write.

Which consideration will impact the decisions made by the engineer while migrating this workload?

Correct Answer: A

In Databricks and Delta Lake, transactions are ACID-compliant, but that guarantee applies to a single table at a time. Delta Lake does not enforce foreign key constraints, which relational database systems rely on to maintain referential integrity between tables. When migrating such a workload to the Databricks Lakehouse, the engineer must therefore reconsider how to maintain the data integrity and relationships that the source system enforced on write: instead of relying on foreign key constraints, consistency across tables has to be managed at the application level or through careful design of the ETL processes.

Reference:

Databricks Documentation on Delta Lake: Delta Lake Guide

Databricks Documentation on ACID Transactions in Delta Lake: ACID Transactions in Delta Lake
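
Because foreign keys are not enforced on write, a common pattern is to validate referential integrity inside the ETL job itself. A minimal PySpark sketch, with hypothetical staging and production table names:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    orders = spark.table("staging.orders")       # fact rows awaiting insertion
    customers = spark.table("prod.customers")    # dimension table the facts reference

    # A left anti-join keeps only the order rows whose customer_id has no match in
    # the dimension, i.e. exactly the rows a foreign key constraint would reject.
    orphans = orders.join(customers, "customer_id", "left_anti")
    orphan_count = orphans.count()

    if orphan_count > 0:
        raise ValueError(f"{orphan_count} order rows reference unknown customers")

    orders.write.mode("append").saveAsTable("prod.orders")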


