
Databricks Certified Data Engineer Professional Exam - Topic 5 Question 17 Discussion

Actual exam question from the Databricks Certified Data Engineer Professional exam
Question #: 17
Topic #: 5
[All Databricks Certified Data Engineer Professional Questions]

A team of data engineers is adding tables to a DLT pipeline that contain repetitive expectations for many of the same data quality checks.

One member of the team suggests reusing these data quality rules across all tables defined for this pipeline.

What approach would allow them to do this?

Suggested Answer: A

Maintaining data quality rules in a centralized Delta table allows for the reuse of these rules across multiple DLT (Delta Live Tables) pipelines. By storing these rules outside the pipeline's target schema and referencing the schema name as a pipeline parameter, the team can apply the same set of data quality checks to different tables within the pipeline. This approach ensures consistency in data quality validations and reduces redundancy in code by not having to replicate the same rules in each DLT notebook or file.


Databricks Documentation on Delta Live Tables: Delta Live Tables Guide
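For illustration only, here is a minimal sketch of how this pattern could look, not the exam's reference implementation. The table name data_quality_rules, its columns (name, constraint, tag), and the pipeline parameter key dq.rules_schema are assumptions made for the example; the DLT calls (dlt.table, dlt.expect_all, dlt.read) and spark.conf.get for pipeline parameters are standard.

    import dlt
    from pyspark.sql import functions as F

    # Assumed pipeline parameter (set in the DLT pipeline configuration),
    # pointing at the schema that holds the shared rules table.
    rules_schema = spark.conf.get("dq.rules_schema")

    def get_rules(tag):
        # Assumed rules table: <rules_schema>.data_quality_rules
        # with columns `name`, `constraint`, and `tag`.
        rows = (
            spark.table(f"{rules_schema}.data_quality_rules")
            .filter(F.col("tag") == tag)
            .collect()
        )
        # Return a dict of {expectation name: SQL constraint} for dlt.expect_all
        return {row["name"]: row["constraint"] for row in rows}

    @dlt.table
    @dlt.expect_all(get_rules("validity"))  # the same rules can decorate any table in the pipeline
    def customers_clean():
        return dlt.read("customers_raw")

Because get_rules() simply returns a dictionary of expectations, the same call can be applied to every table definition, and updating a rule only requires changing a row in the rules table rather than editing each notebook.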

Contribute your Thoughts:

Dolores
3 months ago
I agree with A, it keeps things organized and efficient.
upvoted 0 times
...
Kanisha
3 months ago
Wait, can we really maintain rules in a separate notebook? That’s new to me!
upvoted 0 times
...
Daisy
3 months ago
C seems a bit complicated for just reusing rules.
upvoted 0 times
...
Selene
4 months ago
I think B could work too, but not sure if it's the best choice.
upvoted 0 times
...
Shala
4 months ago
Option A sounds like a solid plan for reusability!
upvoted 0 times
...
Lavonna
4 months ago
Option D seems like a good idea since maintaining rules in a separate notebook could keep things organized, but I wonder if it complicates the workflow too much.
upvoted 0 times
...
Graciela
4 months ago
I feel like option C could work, but I’m struggling to recall if external jobs can really access the pipeline configuration files as needed.
upvoted 0 times
...
Edmond
4 months ago
I remember practicing a question about using global variables, so option B might be the way to go, but I’m not confident about the scope of those variables.
upvoted 0 times
...
Becky
5 months ago
I think option A sounds familiar, but I'm not entirely sure how to implement it with the pipeline parameters.
upvoted 0 times
...
Pamella
5 months ago
Option C seems interesting, but I'm not sure how practical it would be to add constraints through an external job. That could add unnecessary complexity to the pipeline. I'm leaning more towards option A or D.
upvoted 0 times
...
Leatha
5 months ago
Hmm, I'm a bit confused by this question. I'm not sure if using global Python variables (option B) is the best approach, as that might make the code harder to maintain. I'll need to think this through a bit more.
upvoted 0 times
...
Margarett
5 months ago
This seems like a straightforward question about reusing data quality rules across a DLT pipeline. I think option A is the best approach, as it allows us to maintain the rules in a separate Delta table that can be referenced by the pipeline.
upvoted 0 times
...
Loreen
5 months ago
I feel pretty confident about this one. Maintaining the data quality rules in a separate Databricks notebook (option D) seems like a clean and modular approach. That way, we can easily update the rules without having to modify the pipeline code.
upvoted 0 times
...
Elenore
5 months ago
I'm leaning towards the Database.queryLocator method. It seems like the best way to preserve the entire result set while improving performance.
upvoted 0 times
...
Luke
1 year ago
I'm just picturing the team arguing over which option is best, like a bunch of data ninjas fighting over the perfect data quality kata.
upvoted 0 times
Fernanda
1 year ago
D) Maintain data quality rules in a separate Databricks notebook that each DLT notebook or file references.
upvoted 0 times
...
Anglea
1 year ago
A) Maintain data quality rules in a Delta table outside of this pipeline's target schema, providing the schema name as a pipeline parameter.
upvoted 0 times
...
...
Alba
1 year ago
Option D is the one for me! Keeping the data quality rules in a separate notebook is like a data engineer's version of 'Keep Calm and Carry On.'
upvoted 0 times
Sharita
1 year ago
I agree, having a separate notebook for data quality rules makes it easier to manage.
upvoted 0 times
...
Nieves
1 year ago
Option D is a good choice. It helps keep things organized.
upvoted 0 times
...
...
Zachary
2 years ago
Using global Python variables (option B) feels a bit hacky. I'd prefer a more structured approach like option A or D.
upvoted 0 times
Lashawnda
1 year ago
D) Maintain data quality rules in a separate Databricks notebook that each DLT notebook or file references.
upvoted 0 times
...
Leanora
1 year ago
A) Maintain data quality rules in a Delta table outside of this pipeline's target schema, providing the schema name as a pipeline parameter.
upvoted 0 times
...
...
Jose
2 years ago
I agree with Val, option A seems like the most practical solution.
upvoted 0 times
...
Doug
2 years ago
I'm feeling option C. Adding constraints through an external job with access to the pipeline config seems like a robust solution.
upvoted 0 times
Kandis
1 year ago
Let's go with option C then, it seems like the most practical approach.
upvoted 0 times
...
Becky
1 year ago
It would definitely streamline the process and make it easier to manage.
upvoted 0 times
...
Dominque
1 year ago
I agree, having an external job handle the constraints seems efficient.
upvoted 0 times
...
Delbert
1 year ago
Option C sounds like a good idea. It would centralize the data quality rules.
upvoted 0 times
...
...
Val
2 years ago
But with option A, we can easily maintain and update the data quality rules.
upvoted 0 times
...
Mike
2 years ago
I disagree, I believe option D would be more efficient.
upvoted 0 times
...
Val
2 years ago
I think option A is the best approach.
upvoted 0 times
...
Antonio
2 years ago
Option A is the way to go! Maintaining data quality rules in a separate Delta table is a clean and organized approach.
upvoted 0 times
Lennie
2 years ago
That sounds like a smart solution to ensure consistency and efficiency in the data quality checks.
upvoted 0 times
...
Detra
2 years ago
I agree, it would make it easier to manage and update the data quality rules for all tables in the pipeline.
upvoted 0 times
...
Tequila
2 years ago
Option A is the way to go! Maintaining data quality rules in a separate Delta table is a clean and organized approach.
upvoted 0 times
...
...
