Welcome to Pass4Success


Databricks Certified Data Engineer Professional Exam - Topic 2 Question 29 Discussion

Actual exam question from the Databricks Certified Data Engineer Professional exam
Question #: 29
Topic #: 2
[All Databricks Certified Data Engineer Professional Questions]

In order to prevent accidental commits to production data, a senior data engineer has instituted a policy that all development work will reference clones of Delta Lake tables. After testing both deep and shallow clone, development tables are created using shallow clone.

A few weeks after initial table creation, several cloned tables implemented as Type 1 Slowly Changing Dimensions (SCD) stop working. The transaction logs for the source tables show that vacuum was run the day before.

Why are the cloned tables no longer working?

Suggested Answer: C

In Delta Lake, a shallow clone creates a new table by copying the source table's metadata without duplicating its data files; the clone's transaction log simply points at files in the source table's directory. When vacuum runs on the source table, it deletes data files that are no longer referenced by any source table version within the retention threshold. Vacuum knows nothing about shallow clones, so it can purge files that the clone's metadata still references. Once that happens, queries against the clone fail because they point at data files that no longer exist. This is especially likely for Type 1 SCD tables, where updates overwrite existing records and quickly render old data files stale on the source. It highlights the dependency of shallow clones on the source table's data files and the impact of data management operations like vacuum on those clones. Reference: Databricks documentation on Delta Lake, particularly the sections on table cloning (shallow and deep) and data retention with the vacuum command (https://docs.databricks.com/delta/index.html).
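The failure mode can be sketched with a toy model: plain Python dictionaries stand in for Delta's Parquet data files and transaction-log metadata (the actual operations would be `CREATE TABLE dev_t SHALLOW CLONE prod_t` and `VACUUM prod_t`; the file names below are invented for illustration).

```python
# Toy model of why vacuum breaks a shallow clone (illustration only --
# real Delta tables use Parquet files plus a JSON transaction log).
# A shallow clone copies the *metadata* (the list of data-file paths),
# not the files themselves, so it still points at the source's files.

source_files = {"part-000.parquet": "old rows", "part-001.parquet": "new rows"}

# Shallow clone: copy only the file references, not the data.
shallow_clone_refs = list(source_files.keys())

# The source is a Type 1 SCD: updates overwrite records, rewriting files.
# Suppose an update replaced part-000 with part-002; part-000 is now stale.
source_files["part-002.parquet"] = "updated rows"
active_source_files = {"part-001.parquet", "part-002.parquet"}

# Vacuum removes files the *source* table no longer needs -- it knows
# nothing about the clone's references.
for path in list(source_files):
    if path not in active_source_files:
        del source_files[path]

def read_clone(refs, files):
    """Fail, as Delta would, if a referenced data file is gone."""
    missing = [p for p in refs if p not in files]
    if missing:
        raise FileNotFoundError(f"clone references purged files: {missing}")
    return [files[p] for p in refs]

# Reading the clone now fails: its metadata references a purged file,
# matching answer C.
try:
    read_clone(shallow_clone_refs, source_files)
except FileNotFoundError as e:
    print(e)
```

The key point the sketch captures is that vacuum's reachability check only consults the source table's own log, so the clone's references are invisible to it.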


Contribute your Thoughts:

Francesco
3 months ago
Wow, I didn't know vacuum could mess up clones like that!
Glendora
3 months ago
B sounds plausible too, but I lean towards C.
Annice
3 months ago
Wait, I thought shallow clones were safe?
Augustine
4 months ago
Totally agree, C is the right answer!
Chaya
4 months ago
Cloned tables lose track of data files after vacuum.
Rodolfo
4 months ago
I recall that shallow clones don't track changes like deep clones do, but I'm not confident if vacuuming affects them as much as D suggests.
Lili
4 months ago
I thought that Type 1 SCDs just overwrite records, so I'm not sure how that relates to the cloning issue. Could it be B?
Haydee
4 months ago
This seems similar to a practice question we did about metadata and how it interacts with data files after vacuuming. I think C might be the right answer.
Shannan
5 months ago
I remember discussing how vacuuming affects Delta Lake tables, but I'm not entirely sure if it invalidates shallow clones directly.
Rana
5 months ago
I think the answer is C - the metadata created by the clone operation is referencing data files that were purged as invalid by the vacuum command. The shallow clone doesn't fully capture the table state, so when vacuum runs, it breaks the cloned table.
Dottie
5 months ago
Okay, let's break this down step-by-step. The key seems to be understanding how shallow cloning works and how that can be impacted by vacuum. I'll focus on that angle.
Gregoria
5 months ago
Hmm, I'm a bit confused on the details of how Delta Lake handles cloning and vacuum operations. I'll need to review the documentation to make sure I understand the key concepts.
Marylou
5 months ago
This seems like a tricky one. I'll need to think through the implications of shallow cloning and how that interacts with the vacuum process.
Eden
1 year ago
Or maybe we should consider using deep clone instead of shallow clone to prevent this issue in the future.
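Eden's suggestion is the standard mitigation: a deep clone (`CREATE TABLE dev_t DEEP CLONE prod_t` in Databricks SQL) copies the data files into the clone's own location, so the source table's vacuum cannot invalidate it. A toy contrast in plain Python, with file names invented for illustration:

```python
# Toy contrast between shallow and deep clone (illustration only; real
# Delta clones are created with CREATE TABLE ... SHALLOW/DEEP CLONE).

source_files = {"part-000.parquet": "v1 rows"}

# Shallow clone: only references the source's files.
shallow_refs = list(source_files.keys())

# Deep clone: copies the data files into the clone's own storage.
deep_files = dict(source_files)

# The source rewrites its data (a Type 1 SCD overwrite) and then vacuum
# purges the stale file. Only the source's own file set is consulted.
source_files["part-001.parquet"] = "v2 rows"
del source_files["part-000.parquet"]  # vacuum removes the stale file

# The shallow clone is now broken; the deep clone still reads fine,
# because it owns independent copies of the data files.
shallow_ok = all(p in source_files for p in shallow_refs)
deep_rows = list(deep_files.values())
print(shallow_ok, deep_rows)  # False ['v1 rows']
```

The trade-off is storage cost and clone-creation time: a deep clone duplicates the data, which is exactly what makes it immune to the source's retention maintenance.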
Breana
1 year ago
That's a good point, Isidra. Maybe the solution is to run refresh on the cloned tables after vacuum is executed on the source tables.
Isidra
1 year ago
But shouldn't running refresh on the cloned table pull in recent changes and fix the issue?
Eden
1 year ago
I agree with Breana. The vacuum command probably caused the issue with the cloned tables.
Breana
1 year ago
I think the cloned tables are no longer working because the metadata created by the clone operation is referencing data files that were purged as invalid by the vacuum command.
Paulina
1 year ago
B is a good try, but C is the correct answer. Type 1 SCD changes shouldn't affect the cloned tables, the real issue is that the vacuum command wiped out the data files that the cloned metadata was pointing to. Cloning is like a game of hide and seek, you gotta make sure your data doesn't get lost in the vacuum.
Devon
1 year ago
C) The metadata created by the clone operation is referencing data files that were purged as invalid by the vacuum command
Eden
1 year ago
A) The data files compacted by vacuum are not tracked by the cloned metadata; running refresh on the cloned table will pull in recent changes.
Pamella
1 year ago
Option D is the funniest one, but it's not the right answer. Shallow clones are like a house of cards - one gust of vacuum and they come tumbling down. Deep cloning is the way to go, it's like building a fortress to protect your data.
Hyun
1 year ago
I agree with C. Vacuum is like a cosmic vacuum cleaner, sucking up all the old data files and leaving the clones high and dry. Shallow cloning is like running with scissors, it's all fun and games until someone loses an eye (or a functioning table).
Temeka
1 year ago
C) The metadata created by the clone operation is referencing data files that were purged as invalid by the vacuum command
Patti
1 year ago
A) The data files compacted by vacuum are not tracked by the cloned metadata; running refresh on the cloned table will pull in recent changes.
Carol
1 year ago
The correct answer is C. The vacuum command purged the data files that the cloned metadata was referencing, causing the cloned tables to stop working. Shallow clones don't track changes to the source tables like deep clones do.
Chantell
1 year ago
A) Oh, that makes sense. The vacuum command caused the issue by purging the data files the cloned tables were referencing.
Evangelina
1 year ago
C) The metadata created by the clone operation is referencing data files that were purged as invalid by the vacuum command
Kattie
1 year ago
A) The data files compacted by vacuum are not tracked by the cloned metadata; running refresh on the cloned table will pull in recent changes.
