3 of 55. A data engineer observes that the upstream streaming source feeds the event table frequently and sends duplicate records. Upon analyzing the current production table, the data engineer found that the time difference in the event_timestamp column of the duplicate records is, at most, 30 minutes.
To remove the duplicates, the engineer adds the code:
df = df.withWatermark("event_timestamp", "30 minutes")
What is the result?
In Structured Streaming, a watermark tells the engine how late event-time data may arrive and still be processed by stateful operations such as deduplication or windowed aggregations; state older than the watermark can be discarded.
Behavior:
df = df.withWatermark("event_timestamp", "30 minutes")
This sets a 30-minute watermark: Spark tracks the maximum event time it has seen so far and keeps deduplication state only for events whose event_timestamp is within 30 minutes of that maximum; older state is evicted. When used with:
df.dropDuplicates(['event_id', 'event_timestamp'])
Spark is guaranteed to drop duplicates whose event times fall within the watermark threshold (here, 30 minutes). Duplicates arriving after the watermark has passed their event time may no longer be matched, because the corresponding state has been evicted; such late records are dropped from stateful processing rather than re-emitted.
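To make the state-eviction behavior concrete, here is a minimal pure-Python simulation of watermarked deduplication. This is illustrative only, not Spark's actual implementation; the helper name `dedup_with_watermark` and the event data are hypothetical.

```python
from datetime import datetime, timedelta

def dedup_with_watermark(events, watermark=timedelta(minutes=30)):
    """Simulate Structured Streaming's watermarked dropDuplicates.

    `events` is an ordered stream of (event_id, event_timestamp) pairs,
    mirroring dropDuplicates(['event_id', 'event_timestamp']).
    Dedup state is kept only while a key's timestamp is newer than
    max_event_time - watermark; older state is evicted, and records
    arriving behind the watermark are dropped as late data.
    """
    seen = {}              # dedup state store: key -> event_timestamp
    max_event_time = None  # highest event time observed so far
    emitted = []
    for event_id, ts in events:
        # Advance the watermark using the newest event time seen so far.
        if max_event_time is None or ts > max_event_time:
            max_event_time = ts
        threshold = max_event_time - watermark
        # Evict state older than the watermark, as Spark does.
        seen = {k: t for k, t in seen.items() if t >= threshold}
        key = (event_id, ts)
        if key in seen:
            continue                     # duplicate within the watermark: dropped
        if ts < threshold:
            continue                     # late record behind the watermark: dropped
        seen[key] = ts
        emitted.append((event_id, ts))
    return emitted

base = datetime(2024, 1, 1)
stream = [
    ("a", base),                            # emitted
    ("a", base),                            # duplicate within watermark: dropped
    ("b", base + timedelta(minutes=40)),    # emitted; watermark moves to base+10min
    ("c", base + timedelta(minutes=15)),    # still ahead of watermark: emitted
    ("a", base),                            # behind watermark, state evicted: dropped
]
print(dedup_with_watermark(stream))
```

Running the example emits only the three distinct, on-time records; note how the final duplicate of `("a", base)` is dropped not because its state survived, but because it arrived behind the watermark.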
Why other options are incorrect:
A: Watermarks do not remove all duplicates; they only manage those within the defined event-time window.
B: Watermark durations can be expressed as strings like '30 minutes', '10 seconds', etc., not only seconds.
D: Structured Streaming supports deduplication using withWatermark() and dropDuplicates().
Reference (Databricks Apache Spark 3.5, Python / Study Guide):
PySpark Structured Streaming Guide: withWatermark() and dropDuplicates() for event-time deduplication.
Databricks Certified Associate Developer for Apache Spark Exam Guide (June 2025), Section "Structured Streaming", Topic: Streaming Deduplication with and without watermark usage.