A data engineering team has a time-consuming data ingestion job with three data sources. Each notebook takes about one hour to load new data. One day, the job fails because a notebook update introduced a new required configuration parameter. The team must quickly fix the issue and load the latest data from the failing source.
Which action should the team take?
The repair run capability in Databricks Jobs allows re-execution of failed tasks without re-running successful ones. When a parameterized job fails due to missing or incorrect task configuration, engineers can perform a repair run to fix inputs or parameters and resume from the failed state.
This approach saves time, reduces cost, and ensures workflow continuity by avoiding unnecessary recomputation. Additionally, updating the task definition with the missing parameter prevents future runs from failing.
Running the job manually (B) loses the run context; (C) alone does not prevent recurrence; (D) delays resolution. Option A therefore reflects the correct operational and recovery practice.
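For illustration only (the run ID, task key, and parameter name below are hypothetical), a repair run can also be triggered programmatically through the Jobs API, re-running just the failed ingestion task with the newly required parameter supplied:

import requests

# Hypothetical workspace URL and token; replace with real values.
HOST = "https://<workspace-url>"
TOKEN = "<personal-access-token>"

payload = {
    "run_id": 123456,                                 # the failed job run
    "rerun_tasks": ["ingest_source_3"],               # only the task that failed
    "notebook_params": {"new_config_param": "value"}  # supply the newly required parameter
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/runs/repair",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # contains a repair_id for the new repair run

The same repair can be started from the job run page in the UI; either way, the tasks that already succeeded are not re-executed.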
The data engineering team has been tasked with configuring connections to an external database that does not have a supported native connector with Databricks. The external database already has data security configured by group membership. These groups map directly to user groups already created in Databricks that represent various teams within the company.
A new login credential has been created for each group in the external database. The Databricks Utilities Secrets module will be used to make these credentials available to Databricks users.
Assuming that all the credentials are configured correctly on the external database and group membership is properly configured in Databricks, which statement describes how teams can be granted the minimum necessary access to use these credentials?
In Databricks, using the Secrets module allows for secure management of sensitive information such as database credentials. Granting 'Read' permissions on a secret key that maps to database credentials for a specific team ensures that only members of that team can access these credentials. This approach aligns with the principle of least privilege, granting users the minimum level of access required to perform their jobs, thus enhancing security.
Databricks Documentation on Secret Management: Secrets
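As a hedged sketch of what this looks like in practice (the scope, group, key, and JDBC details below are illustrative, not taken from the question), an admin grants the team's Databricks group READ on a secret scope holding that team's credential, and team members then read it with dbutils.secrets:

# Granted once by an admin, e.g. with the legacy Databricks CLI:
#   databricks secrets put-acl --scope finance-db --principal finance-team --permission READ
# Only members of the finance-team group can then read secrets in this scope.

# Inside a notebook run by a finance-team member:
username = dbutils.secrets.get(scope="finance-db", key="username")
password = dbutils.secrets.get(scope="finance-db", key="password")

df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://<external-db-host>:5432/sales")  # placeholder URL
      .option("dbtable", "transactions")
      .option("user", username)
      .option("password", password)
      .load())

Because each team's credential lives behind its own scope and keys, with READ granted only to that team's group, no team can read another team's credential.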
Which statement describes Delta Lake Auto Compaction?
This is the correct answer because it describes the behavior of Delta Lake Auto Compaction, a feature that automatically optimizes the layout of Delta Lake tables by coalescing small files into larger ones. Auto Compaction runs synchronously on the cluster that performed the write, after the write to a table has succeeded, and checks whether files within a partition can be further compacted. If so, it runs an optimize job with a default target file size of 128 MB. Auto Compaction only compacts files that have not been compacted previously. Verified Reference: [Databricks Certified Data Engineer Professional], under "Delta Lake" section; Databricks Documentation, under "Auto Compaction for Delta Lake on Databricks" section.
'Auto compaction occurs after a write to a table has succeeded and runs synchronously on the cluster that has performed the write. Auto compaction only compacts files that haven't been compacted previously.'
https://learn.microsoft.com/en-us/azure/databricks/delta/tune-file-size
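As a brief, hedged illustration (the table name is made up), auto compaction is typically enabled per table through a Delta table property, or for all writes in a session through a Spark configuration:

# Enable auto compaction for future writes to one table (illustrative name).
spark.sql("""
    ALTER TABLE sales_bronze
    SET TBLPROPERTIES (delta.autoOptimize.autoCompact = true)
""")

# Or enable it for all Delta writes in the current session.
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")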
A Data Engineer is building a simple data pipeline using Lakeflow Declarative Pipelines (LDP) in Databricks to ingest customer data. The raw customer data is stored in a cloud storage location in JSON format. The task is to create a pipeline that reads the raw JSON data and writes it into a Delta table for further processing.
Which code snippet will correctly ingest the raw JSON data and create a Delta table using LDP?
A.
import dlt
@dlt.table
def raw_customers():
    return spark.read.format("csv").load("s3://my-bucket/raw-customers/")
B.
import dlt
@dlt.table
def raw_customers():
    return spark.read.json("s3://my-bucket/raw-customers/")
C.
import dlt
@dlt.table
def raw_customers():
    return spark.read.format("parquet").load("s3://my-bucket/raw-customers/")
D.
import dlt
@dlt.view
def raw_customers():
    return spark.format.json("s3://my-bucket/raw-customers/")
The correct method to define a table using Lakeflow Declarative Pipelines (LDP) is with the @dlt.table decorator, which persists the output as a managed Delta table. When ingesting raw JSON data, spark.read.json() or spark.read.format('json').load() is the standard approach. This reads JSON-formatted files from the source and stores them in Delta format automatically managed by Databricks.
Reference Source: Databricks Lakeflow Declarative Pipelines Developer Guide, "Create tables from raw JSON and Delta sources."
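For reference, a slightly fuller sketch of option B follows; the added comment and the Auto Loader variant are illustrative (the bucket path is the same placeholder used in the options), not part of the exam answer:

import dlt

@dlt.table(comment="Raw customer records ingested from JSON files")
def raw_customers():
    # Batch read of the raw JSON files; LDP persists the result as a managed Delta table.
    return spark.read.json("s3://my-bucket/raw-customers/")

# A common incremental alternative uses Auto Loader instead of a one-shot batch read:
@dlt.table
def raw_customers_incremental():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("s3://my-bucket/raw-customers/"))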
When evaluating the Ganglia Metrics for a given cluster with 3 executor nodes, which indicator would signal proper utilization of the VM's resources?