Deal of The Day! Hurry Up, Grab the Special Discount - Save 25% - Ends In 00:00:00 Coupon code: SAVE25
Welcome to Pass4Success

- Free Preparation Discussions

Databricks Certified Data Engineer Professional Exam - Topic 1 Question 51 Discussion

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.Streaming DataFrame df has the following schema:"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"Code block:Choose the response that correctly fills in the blank within the code block to complete this task.
B) window('event_time', '5 minutes').alias('time')
A) to_interval('event_time', '5 minutes').alias('time')
C) 'event_time'
D) window('event_time', '10 minutes').alias('time')
E) lag('event_time', '10 minutes').alias('time')

Databricks Certified Data Engineer Professional Exam - Topic 1 Question 51 Discussion

Actual exam question for Databricks's Databricks Certified Data Engineer Professional exam
Question #: 51
Topic #: 1
[All Databricks Certified Data Engineer Professional Questions]

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.

Streaming DataFrame df has the following schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"

Code block:

Choose the response that correctly fills in the blank within the code block to complete this task.

Show Suggested Answer Hide Answer
Suggested Answer: B

This is the correct answer because the window function is used to group streaming data by time intervals. The window function takes two arguments: a time column and a window duration. The window duration specifies how long each window is, and must be a multiple of 1 second. In this case, the window duration is ''5 minutes'', which means each window will cover a non-overlapping five-minute interval. The window function also returns a struct column with two fields: start and end, which represent the start and end time of each window. The alias function is used to rename the struct column as ''time''. Verified Reference: [Databricks Certified Data Engineer Professional], under ''Structured Streaming'' section;Databricks Documentation, under ''WINDOW'' section. https://www.databricks.com/blog/2017/05/08/event-time-aggregation-watermarking-apache-sparks-structured-streaming.html


Contribute your Thoughts:

0/2000 characters

Currently there are no comments in this discussion, be the first to comment!


Save Cancel