
Databricks Exam Databricks-Certified-Professional-Data-Engineer Topic 2 Question 32 Discussion

Actual exam question for Databricks's Databricks-Certified-Professional-Data-Engineer exam
Question #: 32
Topic #: 2
[All Databricks-Certified-Professional-Data-Engineer Questions]

A data engineer wants to run unit tests, using common Python testing frameworks, on Python functions defined across several Databricks notebooks currently used in production.

How can the data engineer run unit tests against functions that work with data in production?

Suggested Answer: A

The best practice for unit testing functions that work with data is to run the tests against a non-production dataset that closely mirrors production. This lets data engineers validate the logic of their functions with no risk of affecting actual production data, while a representative sample of production data still surfaces edge cases and gives confidence that the functions will behave correctly once deployed.
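
As a minimal illustration of this approach, the sketch below unit tests a Spark-based function with pytest against a tiny, non-production dataset shaped like the production schema. The function and column names (add_revenue_column, product, quantity, unit_price) are hypothetical and chosen only for the example.

import pytest
from pyspark.sql import SparkSession, functions as F

def add_revenue_column(df):
    # Hypothetical function under test; in practice it would be imported from
    # the repo module or notebook where it is defined.
    return df.withColumn("revenue", F.col("quantity") * F.col("unit_price"))

@pytest.fixture(scope="session")
def spark():
    # Local SparkSession for the test run; on a Databricks cluster the
    # existing session would be reused instead.
    return SparkSession.builder.master("local[1]").appName("unit-tests").getOrCreate()

def test_add_revenue_column(spark):
    # Small non-production dataset that mirrors the production table layout.
    sample = spark.createDataFrame(
        [("widget", 3, 2.5), ("gadget", 0, 9.99)],
        ["product", "quantity", "unit_price"],
    )
    result = {r["product"]: r["revenue"] for r in add_revenue_column(sample).collect()}
    assert result["widget"] == pytest.approx(7.5)
    assert result["gadget"] == pytest.approx(0.0)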


Databricks Documentation on Testing: Testing and Validation of Data and Notebooks

Contribute your Thoughts:

Tamesha
27 days ago
Option D is the clear winner here. Separate the tests from the actual code - that's the professional way to do it. Unless, of course, you're a mad scientist who enjoys mixing it all together. *cue evil laughter*
upvoted 0 times
Rolande
4 days ago
Agreed, it's important to maintain clean and organized code.
upvoted 0 times
Jettie
8 days ago
Option D is definitely the way to go. Keep the tests separate from the code.
upvoted 0 times
Una
28 days ago
I think option C is the most practical solution.
upvoted 0 times
Thaddeus
28 days ago
Importing the unit test functions from a separate notebook is the way to go. Nice and modular, just like how I like my code.
upvoted 0 times
Jolene
29 days ago
I prefer option B, it's easier to manage.
upvoted 0 times
Hubert
1 month ago
I disagree, I believe option D is more efficient.
upvoted 0 times
Marci
1 month ago
Putting the unit tests in the same notebook as the functions? That just seems messy and hard to manage. Definitely going with option D.
upvoted 0 times
Cristen
10 days ago
Defining and unit testing functions using Files in Repos might provide a more organized way to manage the tests.
upvoted 0 times
Remedios
21 days ago
Running unit tests against non-production data that closely mirrors production could also be a good approach.
upvoted 0 times
Eden
23 days ago
I agree, putting unit tests in the same notebook can get messy. Option D seems like a better choice.
upvoted 0 times
Evan
1 month ago
I think option A is the best choice.
upvoted 0 times
Mayra
2 months ago
Defining functions in a repository and unit testing them there sounds like a good approach to me. Keeps things organized and maintainable.
upvoted 0 times
Andree
5 days ago
B) Define and unit test functions using Files in Repos
upvoted 0 times
Dion
17 days ago
A) Run unit tests against non-production data that closely mirrors production
upvoted 0 times
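
Several commenters above lean toward option B (Files in Repos). As a rough sketch of that pattern, the function lives in a plain .py file committed to the Databricks Repo so that both the production notebooks and a pytest suite import the same code. All file, module, and function names below (utils/cleaning.py, tests/test_cleaning.py, normalize_names) are illustrative assumptions, not taken from the question.

# utils/cleaning.py -- plain Python file committed to the Databricks Repo
from pyspark.sql import DataFrame, functions as F

def normalize_names(df: DataFrame) -> DataFrame:
    # Lower-case and trim the `name` column.
    return df.withColumn("name", F.lower(F.trim(F.col("name"))))

# tests/test_cleaning.py -- collected by pytest; notebooks can import the same module
from pyspark.sql import SparkSession
from utils.cleaning import normalize_names

def test_normalize_names():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([("  Alice ",), ("BOB",)], ["name"])
    assert [r["name"] for r in normalize_names(df).collect()] == ["alice", "bob"]

Either way, the tests themselves still run against small, non-production data, which is consistent with the suggested answer.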
Sonia
2 months ago
Option A makes the most sense. Running tests against non-production data ensures we don't disrupt the actual production environment.
upvoted 0 times
Trevor
1 month ago
B) Define and unit test functions using Files in Repos
upvoted 0 times
Helga
1 month ago
That's a good point. We definitely don't want to mess with the production data.
upvoted 0 times
Carmela
1 month ago
A) Run unit tests against non-production data that closely mirrors production
upvoted 0 times
