
Databricks Certified Data Engineer Professional Exam: Topic 2, Question 36 Discussion

Actual exam question from the Databricks Certified Data Engineer Professional exam
Question #: 36
Topic #: 2

A Delta Lake table representing metadata about content posts from users has the following schema:

user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE

This table is partitioned by the date column. A query is run with the following filter:

longitude < 20 & longitude > -20

Which statement describes how data will be filtered?
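
For readers who want to reproduce the scenario locally, here is a minimal PySpark sketch of the table and query described above. The table name posts, the package coordinates, and the sample setup are assumptions for illustration, not part of the question.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes a Spark session with the Delta Lake package available
# (e.g. started with --packages io.delta:delta-spark_2.12:3.1.0).
spark = SparkSession.builder.getOrCreate()

# Hypothetical table matching the schema in the question,
# partitioned by the date column. The question writes the first
# column's type as LONG, which is Spark's alias for BIGINT.
spark.sql("""
    CREATE TABLE IF NOT EXISTS posts (
        user_id   BIGINT,
        post_text STRING,
        post_id   STRING,
        longitude FLOAT,
        latitude  FLOAT,
        post_time TIMESTAMP,
        date      DATE
    ) USING DELTA
    PARTITIONED BY (date)
""")

# The filter from the question, written as a DataFrame expression.
matches = spark.table("posts").filter(
    (F.col("longitude") < 20) & (F.col("longitude") > -20)
)
matches.explain()  # the physical plan shows the pushed-down longitude filters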

Suggested Answer: D

Suggested answer D is correct. The filter longitude < 20 & longitude > -20 applies to the longitude column, not to the partition column (date), so partition pruning alone cannot narrow the scan. Instead, Delta Lake consults the file-level statistics recorded in the Delta Log, which include the minimum and maximum values of each column in each data file. Using those min/max values, the engine identifies the data files that might include records in the filtered range and skips reading the rest, which improves query performance and reduces I/O costs. Reference: Databricks Certified Data Engineer Professional, "Delta Lake" section; Databricks documentation, "Data skipping" section.
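
To make that concrete, here is a rough sketch of what those statistics look like, assuming the hypothetical posts table from the earlier snippet is stored at /tmp/posts. In open-source Delta Lake, each commit file under _delta_log holds one JSON action per line, and "add" actions carry a stats string with per-file minValues/maxValues; treat the snippet as illustrative, not an official API.

import json
from pathlib import Path

log_dir = Path("/tmp/posts/_delta_log")  # hypothetical table location
for commit in sorted(log_dir.glob("*.json")):
    for line in commit.read_text().splitlines():
        action = json.loads(line)
        if "add" not in action:
            continue
        # Per-file column statistics recorded at write time; stats
        # may be absent if statistics collection was disabled.
        stats_json = action["add"].get("stats")
        if not stats_json:
            continue
        stats = json.loads(stats_json)
        lo = stats["minValues"]["longitude"]
        hi = stats["maxValues"]["longitude"]
        # A file can be skipped when its longitude range cannot
        # overlap the predicate -20 < longitude < 20.
        skippable = lo >= 20 or hi <= -20
        print(action["add"]["path"], (lo, hi), "skip" if skippable else "read")

This is the file-skipping decision the suggested answer describes: the engine makes it from the statistics already in the transaction log, without opening the Parquet files themselves.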


Contribute your Thoughts:

Belen
2 days ago
I remember something about Delta Lake using statistics in the log, but I'm not sure if it's for identifying partitions or files. Could it be option D?
upvoted 0 times
...
Maile
8 days ago
I think the optimizer might not know how to skip files since the filter is on longitude, not the partitioned date. So maybe option B?
upvoted 0 times
...
Wilda
13 days ago
This seems straightforward to me. The Delta Engine should be able to use the partition statistics to identify the relevant data files and avoid scanning unnecessary data. I'll go with the option that mentions using the Delta Log to identify the files that might include the filtered records.
upvoted 0 times
...
Kyoko
19 days ago
I'm a bit confused by the options here. It's not clear to me whether the Delta Engine will use row-level statistics or just partition-level statistics to filter the data. I'll need to think this through carefully and make sure I understand the differences between the options.
upvoted 0 times
...
Maile
24 days ago
Okay, I think I've got this. The key is that the table is partitioned by date, so the Delta Engine should be able to use the partition statistics to skip over any irrelevant data files. I'll select the option that mentions using the Delta Log to identify the relevant partitions.
upvoted 0 times
...
Mariann
29 days ago
Hmm, this is a tricky one. I'm not sure if the optimizer can infer anything about the longitude values based on the partitioned date column. I'll need to review the Delta Lake documentation on partition pruning.
upvoted 0 times
...
Gracia
1 month ago
This question seems to be testing our understanding of how Delta Lake handles filtering data. I'll need to think carefully about the relationship between the partitioned column and the filter criteria.
upvoted 0 times
...
Mila
3 months ago
I think C makes the most sense, as it mentions row-level statistics.
upvoted 0 times
...
Tamie
3 months ago
I'm not sure, but I think A could also be a possibility.
upvoted 0 times
...
Lai
3 months ago
I'm stuck between A and D. It's like a game of 'which partition will get picked?' I hope the exam doesn't leave me 'partitioned' from the right answer!
upvoted 0 times
...
Rasheeda
3 months ago
Haha, the Delta Engine is like a spy, reading the 'parquet file footers' to find the right rows. Option E is definitely the most fun answer!
upvoted 0 times
Lucy
1 month ago
It's interesting how the Delta Engine scans the parquet file footers.
upvoted 0 times
...
Gertude
2 months ago
I think option E is the most efficient way to filter the data.
upvoted 0 times
...
Freida
2 months ago
I agree, the Delta Engine is like a spy!
upvoted 0 times
...
...
Dusti
4 months ago
I disagree, I believe the correct answer is E.
upvoted 0 times
...
Magnolia
4 months ago
Option C seems more accurate to me. The Delta Engine should use the row-level statistics in the transaction log to find the files that match the filter criteria.
upvoted 0 times
Florinda
3 months ago
I think option D could also be a possibility, as it mentions using statistics in the Delta Log to identify data files that match the filter criteria.
upvoted 0 times
...
Nobuko
3 months ago
I agree, option C makes the most sense in this scenario.
upvoted 0 times
...
...
Sarah
4 months ago
I think option D is the correct answer. The Delta Log contains statistics that can be used to identify the relevant data files for the filter.
upvoted 0 times
Shantell
4 months ago
I'm not sure, but option E sounds like it could be a valid approach as well, scanning the parquet file footers for rows that meet the filter criteria.
upvoted 0 times
...
Tawanna
4 months ago
I think option A could also be a possibility, as statistics in the Delta Log can help identify partitions with data in the filtered range.
upvoted 0 times
...
Serina
4 months ago
I agree, option D seems to be the most logical choice.
upvoted 0 times
...
...
Jaime
4 months ago
I think the answer is D.
upvoted 0 times
...
