A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.
Streaming DataFrame df has the following schema:
"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"
Code block:

Choose the response that correctly fills in the blank within the code block to complete this task.
This is the correct answer because the window function is used to group streaming data by time intervals. The window function takes two arguments: a time column and a window duration. The window duration specifies how long each window is, and must be a multiple of 1 second. In this case, the window duration is ''5 minutes'', which means each window will cover a non-overlapping five-minute interval. The window function also returns a struct column with two fields: start and end, which represent the start and end time of each window. The alias function is used to rename the struct column as ''time''. Verified Reference: [Databricks Certified Data Engineer Professional], under ''Structured Streaming'' section;Databricks Documentation, under ''WINDOW'' section. https://www.databricks.com/blog/2017/05/08/event-time-aggregation-watermarking-apache-sparks-structured-streaming.html
Which statement describes integration testing?
This is the correct answer because it describes integration testing. Integration testing is a type of testing that validates interactions between subsystems of your application, such as modules, components, or services. Integration testing ensures that the subsystems work together as expected and produce the correct outputs or results. Integration testing can be done at different levels of granularity, such as component integration testing, system integration testing, or end-to-end testing. Integration testing can help detect errors or bugs that may not be found by unit testing, which only validates behavior of individual elements of your application. Verified Reference: [Databricks Certified Data Engineer Professional], under ''Testing'' section; Databricks Documentation, under ''Integration testing'' section.
The data engineering team is migrating an enterprise system with thousands of tables and views into the Lakehouse. They plan to implement the target architecture using a series of bronze, silver, and gold tables. Bronze tables will almost exclusively be used by production data engineering workloads, while silver tables will be used to support both data engineering and machine learning workloads. Gold tables will largely serve business intelligence and reporting purposes. While personal identifying information (PII) exists in all tiers of data, pseudonymization and anonymization rules are in place for all data at the silver and gold levels.
The organization is interested in reducing security concerns while maximizing the ability to collaborate across diverse teams.
Which statement exemplifies best practices for implementing this system?
This is the correct answer because it exemplifies best practices for implementing this system. By isolating tables in separate databases based on data quality tiers, such as bronze, silver, and gold, the data engineering team can achieve several benefits. First, they can easily manage permissions for different users and groups through database ACLs, which allow granting or revoking access to databases, tables, or views. Second, they can physically separate the default storage locations for managed tables in each database, which can improve performance and reduce costs. Third, they can provide a clear and consistent naming convention for the tables in each database, which can improve discoverability and usability. Verified Reference: [Databricks Certified Data Engineer Professional], under ''Lakehouse'' section; Databricks Documentation, under ''Database object privileges'' section.
A data engineering team has a time-consuming data ingestion job with three data sources. Each notebook takes about one hour to load new data. One day, the job fails because a notebook update introduced a new required configuration parameter. The team must quickly fix the issue and load the latest data from the failing source.
Which action should the team take?
The repair run capability in Databricks Jobs allows re-execution of failed tasks without re-running successful ones. When a parameterized job fails due to missing or incorrect task configuration, engineers can perform a repair run to fix inputs or parameters and resume from the failed state.
This approach saves time, reduces cost, and ensures workflow continuity by avoiding unnecessary recomputation. Additionally, updating the task definition with the missing parameter prevents future runs from failing.
Running the job manually (B) loses run context; (C) alone does not prevent recurrence; (D) delays resolution. Thus, A follows the correct operational and recovery practice.
The data engineer team has been tasked with configured connections to an external database that does not have a supported native connector with Databricks. The external database already has data security configured by group membership. These groups map directly to user group already created in Databricks that represent various teams within the company.
A new login credential has been created for each group in the external database. The Databricks Utilities Secrets module will be used to make these credentials available to Databricks users.
Assuming that all the credentials are configured correctly on the external database and group membership is properly configured on Databricks, which statement describes how teams can be granted the minimum necessary access to using these credentials?
In Databricks, using the Secrets module allows for secure management of sensitive information such as database credentials. Granting 'Read' permissions on a secret key that maps to database credentials for a specific team ensures that only members of that team can access these credentials. This approach aligns with the principle of least privilege, granting users the minimum level of access required to perform their jobs, thus enhancing security.
Databricks Documentation on Secret Management: Secrets
Margaret Stewart
4 days agoData Processing Johnson
21 days agoEmily Campbell
1 month agoData Modeling Johnson
2 months agoData Processing Garcia
1 month agoDatabricks Tooling Thompson
17 days agoSteven Adams
2 months agoMichael Flores
2 months agoRyan Bell
2 months agoCrystal Brown
1 month agoGary Walker
2 months agoDonald Collins
2 months agoAlishia
3 months agoDulce
3 months agoPearlie
3 months agoEvelynn
4 months agoYoko
4 months agoLouvenia
4 months agoGlory
4 months agoMona
5 months agoMattie
5 months agoLavonda
5 months agoAntonio
5 months agoBillye
6 months agoRosio
6 months agoKimbery
6 months agoNoe
6 months agoSharen
7 months agoMitsue
7 months agoLacresha
7 months agoDomitila
7 months agoCassi
8 months agoChau
8 months agoNadine
8 months agoSharee
8 months agoNiesha
9 months agoMary
9 months agoMing
9 months agoDante
9 months agoMargot
10 months agoLindsey
10 months agoRyan
10 months agoFernanda
12 months agoStacey
1 year agoRosann
1 year agoMarti
1 year agoEllen
1 year agoEmmett
1 year agoCherry
1 year agoAlana
1 year agoJovita
1 year agoBeatriz
1 year agoLeslie
1 year agoMichael
1 year agoLaurena
1 year agoRemedios
1 year agoDana
1 year agoBrittni
1 year agoLaurel
1 year agoNidia
1 year agoLezlie
2 years agoDana
2 years agoRenato
2 years agoYaeko
2 years agoDean
2 years agoSon
2 years agoAlex
2 years agoEffie
2 years agoMaybelle
2 years agoStefany
2 years agoHeike
2 years agoGearldine
2 years agoMisty
2 years agoCharlesetta
2 years agoAlesia
2 years agoAretha
2 years agoGary
2 years agoMozell
2 years agoSharen
2 years agoIsabella
2 years agoSheridan
2 years agoAdolph
2 years agoJaime
2 years agoElmira
2 years agoJesusita
2 years agoRichelle
2 years agoDenny
2 years agoAlysa
2 years agoHerman
2 years agoThad
2 years ago