
Databricks Certified Data Engineer Professional Exam - Topic 5 Question 17 Discussion

Actual exam question from the Databricks Certified Data Engineer Professional exam
Question #: 17
Topic #: 5
[All Databricks Certified Data Engineer Professional Questions]

A team of data engineers is adding tables to a DLT pipeline that contain repetitive expectations for many of the same data quality checks.

One member of the team suggests reusing these data quality rules across all tables defined for this pipeline.

What approach would allow them to do this?

Suggested Answer: A

Maintaining data quality rules in a centralized Delta table allows for the reuse of these rules across multiple DLT (Delta Live Tables) pipelines. By storing these rules outside the pipeline's target schema and referencing the schema name as a pipeline parameter, the team can apply the same set of data quality checks to different tables within the pipeline. This approach ensures consistency in data quality validations and reduces redundancy in code by not having to replicate the same rules in each DLT notebook or file.


Databricks Documentation on Delta Live Tables: Delta Live Tables Guide
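For illustration only, here is a minimal sketch of how this pattern could look, not the exam's reference implementation. The table name data_quality_rules, its columns (name, constraint, tag), and the pipeline parameter key dq.rules_schema are assumptions made for the example; the DLT calls (dlt.table, dlt.expect_all, dlt.read) and spark.conf.get for pipeline parameters are standard.

    import dlt
    from pyspark.sql import functions as F

    # Assumed pipeline parameter (set in the DLT pipeline configuration),
    # pointing at the schema that holds the shared rules table.
    rules_schema = spark.conf.get("dq.rules_schema")

    def get_rules(tag):
        # Assumed rules table: <rules_schema>.data_quality_rules
        # with columns `name`, `constraint`, and `tag`.
        rows = (
            spark.table(f"{rules_schema}.data_quality_rules")
            .filter(F.col("tag") == tag)
            .collect()
        )
        # Return a dict of {expectation name: SQL constraint} for dlt.expect_all
        return {row["name"]: row["constraint"] for row in rows}

    @dlt.table
    @dlt.expect_all(get_rules("validity"))  # the same rules can decorate any table in the pipeline
    def customers_clean():
        return dlt.read("customers_raw")

Because get_rules() simply returns a dictionary of expectations, the same call can be applied to every table definition, and updating a rule only requires changing a row in the rules table rather than editing each notebook.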

Contribute your Thoughts:

Dolores
3 months ago
I agree with A, it keeps things organized and efficient.
upvoted 0 times
...
Kanisha
3 months ago
Wait, can we really maintain rules in a separate notebook? That’s new to me!
upvoted 0 times
...
Daisy
3 months ago
C seems a bit complicated for just reusing rules.
upvoted 0 times
...
Selene
4 months ago
I think B could work too, but not sure if it's the best choice.
upvoted 0 times
...
Shala
4 months ago
Option A sounds like a solid plan for reusability!
upvoted 0 times
...
Lavonna
4 months ago
Option D seems like a good idea since maintaining rules in a separate notebook could keep things organized, but I wonder if it complicates the workflow too much.
upvoted 0 times
...
Graciela
4 months ago
I feel like option C could work, but I’m struggling to recall if external jobs can really access the pipeline configuration files as needed.
upvoted 0 times
...
Edmond
4 months ago
I remember practicing a question about using global variables, so option B might be the way to go, but I’m not confident about the scope of those variables.
upvoted 0 times
...
Becky
5 months ago
I think option A sounds familiar, but I'm not entirely sure how to implement it with the pipeline parameters.
upvoted 0 times
...
Pamella
5 months ago
Option C seems interesting, but I'm not sure how practical it would be to add constraints through an external job. That could add unnecessary complexity to the pipeline. I'm leaning more towards option A or D.
upvoted 0 times
...
Leatha
5 months ago
Hmm, I'm a bit confused by this question. I'm not sure if using global Python variables (option B) is the best approach, as that might make the code harder to maintain. I'll need to think this through a bit more.
upvoted 0 times
...
Margarett
5 months ago
This seems like a straightforward question about reusing data quality rules across a DLT pipeline. I think option A is the best approach, as it allows us to maintain the rules in a separate Delta table that can be referenced by the pipeline.
upvoted 0 times
...
Loreen
5 months ago
I feel pretty confident about this one. Maintaining the data quality rules in a separate Databricks notebook (option D) seems like a clean and modular approach. That way, we can easily update the rules without having to modify the pipeline code.
upvoted 0 times
...
Elenore
5 months ago
I'm leaning towards the Database.queryLocator method. It seems like the best way to preserve the entire result set while improving performance.
upvoted 0 times
...
Luke
1 year ago
I'm just picturing the team arguing over which option is best, like a bunch of data ninjas fighting over the perfect data quality kata.
upvoted 0 times
Fernanda
1 year ago
D) Maintain data quality rules in a separate Databricks notebook that each DLT notebook or file references.
upvoted 0 times
...
Anglea
1 year ago
A) Maintain data quality rules in a Delta table outside of this pipeline's target schema, providing the schema name as a pipeline parameter.
upvoted 0 times
...
...
Alba
1 year ago
Option D is the one for me! Keeping the data quality rules in a separate notebook is like a data engineer's version of 'Keep Calm and Carry On.'
upvoted 0 times
Sharita
1 year ago
I agree, having a separate notebook for data quality rules makes it easier to manage.
upvoted 0 times
...
Nieves
1 year ago
Option D is a good choice. It helps keep things organized.
upvoted 0 times
...
...
Zachary
2 years ago
Using global Python variables (option B) feels a bit hacky. I'd prefer a more structured approach like option A or D.
upvoted 0 times
Lashawnda
1 year ago
D) Maintain data quality rules in a separate Databricks notebook that each DLT notebook or file references.
upvoted 0 times
...
Leanora
1 year ago
A) Maintain data quality rules in a Delta table outside of this pipeline's target schema, providing the schema name as a pipeline parameter.
upvoted 0 times
...
...
Jose
2 years ago
I agree with Val, option A seems like the most practical solution.
upvoted 0 times
...
Doug
2 years ago
I'm feeling option C. Adding constraints through an external job with access to the pipeline config seems like a robust solution.
upvoted 0 times
Kandis
1 year ago
Let's go with option C then, it seems like the most practical approach.
upvoted 0 times
...
Becky
1 year ago
It would definitely streamline the process and make it easier to manage.
upvoted 0 times
...
Dominque
1 year ago
I agree, having an external job handle the constraints seems efficient.
upvoted 0 times
...
Delbert
1 year ago
Option C sounds like a good idea. It would centralize the data quality rules.
upvoted 0 times
...
...
Val
2 years ago
But with option A, we can easily maintain and update the data quality rules.
upvoted 0 times
...
Mike
2 years ago
I disagree, I believe option D would be more efficient.
upvoted 0 times
...
Val
2 years ago
I think option A is the best approach.
upvoted 0 times
...
Antonio
2 years ago
Option A is the way to go! Maintaining data quality rules in a separate Delta table is a clean and organized approach.
upvoted 0 times
Lennie
2 years ago
That sounds like a smart solution to ensure consistency and efficiency in the data quality checks.
upvoted 0 times
...
Detra
2 years ago
I agree, it would make it easier to manage and update the data quality rules for all tables in the pipeline.
upvoted 0 times
...
Tequila
2 years ago
Option A is the way to go! Maintaining data quality rules in a separate Delta table is a clean and organized approach.
upvoted 0 times
...
...
