Databricks Machine Learning Professional Exam - Topic 1 Question 20 Discussion

Actual exam question for Databricks's Databricks Machine Learning Professional exam

Question #: 20
Topic #: 1

A machine learning engineering team wants to build a continuous pipeline for data preparation of a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches.

Which of the following tools can be used to provide this type of continuous processing?

A. Spark UDFs

B. Structured Streaming

C. MLflow

D. Delta Lake

E. AutoML

Suggested Answer: B

by Magda at Jul 07, 2024, 12:06 PM
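
For readers unfamiliar with option B, here is a minimal sketch of how a Structured Streaming job might prepare data in micro-batches. It assumes a Databricks or Delta-enabled Spark session; the paths, the `amount` column, and the helper names `processing_time` and `start_prep_stream` are hypothetical and not part of the question. Note that `maxFilesPerTrigger` and a fixed trigger interval bound each micro-batch rather than guarantee exactly equal sizes.

```python
def processing_time(seconds: int) -> str:
    """Format a trigger interval string such as '10 seconds'."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return f"{seconds} seconds"


def start_prep_stream(spark, source_path, target_path, checkpoint_path, interval_s=10):
    """Start a streaming query that prepares new rows as they arrive.

    `spark` is an existing SparkSession. All paths and the `amount`
    column are placeholders (assumptions for illustration only).
    """
    # Imported lazily so this module loads even where PySpark is absent.
    from pyspark.sql import functions as F

    # Read the source table incrementally, at most one file per micro-batch.
    raw = (
        spark.readStream.format("delta")
        .option("maxFilesPerTrigger", 1)
        .load(source_path)
    )

    # Hypothetical preparation step: add a log-scaled feature column.
    prepared = raw.withColumn("amount_log", F.log1p(F.col("amount")))

    # Write each micro-batch to the target table on a fixed interval.
    return (
        prepared.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime=processing_time(interval_s))
        .start(target_path)
    )
```

Calling `start_prep_stream(spark, "/tmp/raw", "/tmp/prepared", "/tmp/chk")` would return a running query handle. Delta Lake (option D) appears here only as the storage layer; the continuous processing itself comes from Structured Streaming.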

Virgina

7 months ago

Totally with you on Structured Streaming! It's the best option here.

upvoted 0 times

...

Ashlee

8 months ago

Wait, can AutoML even handle continuous data prep? Sounds off.

upvoted 0 times

...

Omega

8 months ago

Spark UDFs are more for batch processing, not continuous.

upvoted 0 times

...

Dolores

8 months ago

I think Delta Lake could work too, right?

upvoted 0 times

...

Ulysses

8 months ago

Definitely Structured Streaming for continuous processing!

upvoted 0 times

...

Sanda

9 months ago

AutoML seems off for this question; I don't think it handles the continuous pipeline aspect we're looking for.

upvoted 0 times

...

Paulina

9 months ago

Delta Lake sounds familiar, but I can't recall if it specifically supports continuous data preparation.

upvoted 0 times

...

Willodean

9 months ago

I think Structured Streaming is the right choice here. We practiced a similar question about streaming data in our last session.

upvoted 0 times

...

Frederick

9 months ago

I remember we discussed Spark UDFs in class, but I'm not sure if they are meant for continuous processing like this.

upvoted 0 times

...

Charolette

9 months ago

MLflow could be useful for managing the overall machine learning workflow, but I don't think it directly addresses the continuous data processing requirement. I'll need to think about how MLflow might fit into a broader solution for this problem.

upvoted 0 times

...

Chaya

9 months ago

Delta Lake could be a good fit for the data storage and management aspect of this problem, but I'm not sure if it provides the continuous processing capabilities on its own. I'll need to investigate how Delta Lake integrates with other tools like Structured Streaming.

upvoted 0 times

...

Shanda

9 months ago

AutoML seems like an interesting choice, but I'm not sure if it would provide the level of control and customization needed for a continuous data processing pipeline. I'll have to research the capabilities of AutoML more.

upvoted 0 times

...

Katie

9 months ago

Hmm, I'm not too familiar with Structured Streaming. I wonder if Spark UDFs could also be a viable solution for this problem. I'll need to look into the capabilities of both options.

upvoted 0 times

...

Viva

9 months ago

This looks like a question about building a continuous data processing pipeline. I think Structured Streaming would be a good option to consider since it's designed for this type of use case.

upvoted 0 times

...

Tayna

9 months ago

Ah, I remember this now. The vgextend command is used to add new physical volumes to an existing volume group. That's the answer I'm going with.

upvoted 0 times

...

Venita

9 months ago

Yeah, I feel like this situation was in one of our practice case studies. The key is to find the right balance between following regulations and the urgency of the council's request.

upvoted 0 times

...

Chery

10 months ago

I'm pretty sure the output should include both the numbers and their corresponding string representations, but I'm confused about how they'd be ordered.

upvoted 0 times

...

Ling

1 year ago

Hey, I heard MLflow is the new hotness for machine learning ops. Maybe they can just throw some emojis at the data and it'll magically get processed.

upvoted 0 times

Maryann

1 year ago

I've heard Delta Lake is good for managing large amounts of data efficiently.

upvoted 0 times

...

Marge

1 year ago

Spark UDFs might also be useful for data processing.

upvoted 0 times

...

Bernardine

1 year ago

Yeah, MLflow is great for managing the machine learning lifecycle.

upvoted 0 times

...

Gertude

1 year ago

I think MLflow could definitely help with that.

upvoted 0 times

...

Mari

1 year ago

Spark UDFs? That's just for extending Spark's functionality, not for continuous data processing. I'm with the others - Structured Streaming is the way to go.

upvoted 0 times

...

Marci

1 year ago

AutoML? Really? That's for automating the machine learning model development process, not data preprocessing. I'd say Structured Streaming is the clear winner here.

upvoted 0 times

Julie

1 year ago

Spark UDFs might work well for custom data processing functions within the pipeline.

upvoted 0 times

...

Helaine

1 year ago

I think MLflow could also be useful for tracking and managing the machine learning pipeline.

upvoted 0 times

...

Tijuana

1 year ago

I agree, Structured Streaming is the best choice for continuous processing.

upvoted 0 times

...

Sabina

1 year ago

Delta Lake is great for reliable data lakes, but maybe not the best fit for this specific task.

upvoted 0 times

...

Emerson

1 year ago

MLflow could also be useful for tracking experiments and managing the machine learning lifecycle.

upvoted 0 times

...

Kerrie

1 year ago

I agree, Structured Streaming is the best choice for continuous processing.

upvoted 0 times

...

Janey

1 year ago

Hmm, I'm not sure about Structured Streaming. Isn't that more for stream processing? I feel like Delta Lake might be a better fit since it can handle batch processing as well.

upvoted 0 times

Dortha

1 year ago

Let's go with Delta Lake for the continuous processing pipeline.

upvoted 0 times

...

Annmarie

1 year ago

I'm not sure about Structured Streaming either, but Delta Lake seems like a versatile option.

upvoted 0 times

...

Lucina

1 year ago

I agree, Delta Lake can handle both batch and stream processing.

upvoted 0 times

...

Susana

1 year ago

I think Delta Lake would be a good choice for continuous processing.

upvoted 0 times

...

Loren

1 year ago

I think Structured Streaming is the right choice here. It's specifically designed for continuous, real-time data processing, which is exactly what the team needs for their machine learning pipeline.

upvoted 0 times
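
The micro-batch idea that several commenters point to can be illustrated without Spark at all. The sketch below is plain Python, not the Structured Streaming API: the function name `micro_batches` and the integer records are assumptions chosen for illustration. It shows a continuous source being consumed as a series of fixed-size batches.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List


def micro_batches(records: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive batches of up to `batch_size` records.

    Works on endless sources because records are pulled lazily.
    Only the final batch of a finite source can be smaller.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch


# Take the first two batches from an endless counter.
first_two = list(islice(micro_batches(count(), 4), 2))
print(first_two)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

A streaming engine adds what this toy version lacks: checkpointing, fault tolerance, and scheduling of each batch.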