
Databricks Exam Databricks-Certified-Professional-Data-Engineer Topic 2 Question 29 Discussion

Actual exam question for the Databricks Databricks-Certified-Professional-Data-Engineer exam
Question #: 29
Topic #: 2

In order to prevent accidental commits to production data, a senior data engineer has instituted a policy that all development work will reference clones of Delta Lake tables. After testing both deep and shallow clone, development tables are created using shallow clone.

A few weeks after initial table creation, the cloned versions of several tables implemented as Type 1 Slowly Changing Dimension (SCD) stop working. The transaction logs for the source tables show that vacuum was run the day before.

Why are the cloned tables no longer working?

Suggested Answer: C

In Delta Lake, a shallow clone creates a new table by copying only the metadata of the source table; it does not duplicate the underlying data files. Because the source tables are maintained as Type 1 SCDs, updates rewrite records in place, so the data files captured at clone time soon stop being part of the source table's current version. When vacuum is run on the source table, it deletes data files that are no longer referenced by the current version and are older than the retention threshold, which can include files that the shallow clone's metadata still points to. Once those files are purged, queries against the cloned tables fail because they reference data files that no longer exist. This highlights that shallow clones depend on the source table's data files and that maintenance operations like vacuum on the source can break them.

Reference: Databricks documentation on Delta Lake, particularly the sections on cloning tables (shallow and deep clones) and data retention with the vacuum command (https://docs.databricks.com/delta/index.html).
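As an illustration (not part of the exam question), a minimal sketch of how this failure mode arises, assuming a Databricks notebook where spark is available; the table names (prod.customers_dim, dev.customers_dim_clone), the customer_id column, and the updates source are hypothetical:

# Shallow clone copies only the metadata; the data files stay in the
# source table's storage location.
spark.sql("CREATE TABLE dev.customers_dim_clone SHALLOW CLONE prod.customers_dim")

# Type 1 SCD maintenance rewrites rows in place on the source table,
# so the files captured at clone time drop out of its current version.
spark.sql("""
    MERGE INTO prod.customers_dim t
    USING updates s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# After the retention window, VACUUM deletes files no longer referenced by the
# source table's current version, including files the shallow clone still points to.
spark.sql("VACUUM prod.customers_dim RETAIN 168 HOURS")

# Queries against the clone now fail with missing-file errors.
spark.sql("SELECT * FROM dev.customers_dim_clone").show()

A deep clone (CREATE TABLE ... DEEP CLONE ...) copies the data files as well as the metadata, so vacuum on the source table does not affect it.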


Contribute your Thoughts:

Eden
4 months ago
Or maybe we should consider using deep clone instead of shallow clone to prevent this issue in the future.
upvoted 0 times
...
Breana
4 months ago
That's a good point, Isidra. Maybe the solution is to run refresh on the cloned tables after vacuum is executed on the source tables.
upvoted 0 times
...
Isidra
4 months ago
But shouldn't running refresh on the cloned table pull in recent changes and fix the issue?
upvoted 0 times
...
Eden
4 months ago
I agree with Breana. The vacuum command probably caused the issue with the cloned tables.
upvoted 0 times
...
Breana
4 months ago
I think the cloned tables are no longer working because the metadata created by the clone operation is referencing data files that were purged as invalid by the vacuum command.
upvoted 0 times
...
Paulina
4 months ago
B is a good try, but C is the correct answer. Type 1 SCD changes shouldn't affect the cloned tables by themselves; the real issue is that the vacuum command wiped out the data files that the cloned metadata was pointing to. Cloning is like a game of hide and seek: you've got to make sure your data doesn't get lost in the vacuum.
upvoted 0 times
Devon
4 months ago
C) The metadata created by the clone operation is referencing data files that were purged as invalid by the vacuum command
upvoted 0 times
...
Eden
4 months ago
A) The data files compacted by vacuum are not tracked by the cloned metadata; running refresh on the cloned table will pull in recent changes.
upvoted 0 times
...
...
Pamella
4 months ago
Option D is the funniest one, but it's not the right answer. Shallow clones are like a house of cards - one gust of vacuum and they come tumbling down. Deep cloning is the way to go, it's like building a fortress to protect your data.
upvoted 0 times
...
Hyun
4 months ago
I agree with C. Vacuum is like a cosmic vacuum cleaner, sucking up all the old data files and leaving the clones high and dry. Shallow cloning is like running with scissors, it's all fun and games until someone loses an eye (or a functioning table).
upvoted 0 times
Temeka
4 months ago
C) The metadata created by the clone operation is referencing data files that were purged as invalid by the vacuum command
upvoted 0 times
...
Patti
4 months ago
A) The data files compacted by vacuum are not tracked by the cloned metadata; running refresh on the cloned table will pull in recent changes.
upvoted 0 times
...
...
Carol
5 months ago
The correct answer is C. The vacuum command purged the data files that the cloned metadata was referencing, causing the cloned tables to stop working. Unlike deep clones, shallow clones don't copy the data files, so they break when the source table's files are removed.
upvoted 0 times
Chantell
4 months ago
A) Oh, that makes sense. The vacuum command caused the issue by purging the data files the cloned tables were referencing.
upvoted 0 times
...
Evangelina
4 months ago
C) The metadata created by the clone operation is referencing data files that were purged as invalid by the vacuum command
upvoted 0 times
...
Kattie
4 months ago
A) The data files compacted by vacuum are not tracked by the cloned metadata; running refresh on the cloned table will pull in recent changes.
upvoted 0 times
...
...
