Welcome to Pass4Success


Databricks Machine Learning Professional Exam - Topic 1 Question 20 Discussion

Actual exam question for the Databricks Machine Learning Professional exam
Question #: 20
Topic #: 1

A machine learning engineering team wants to build a continuous pipeline for data preparation of a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches.

Which of the following tools can be used to provide this type of continuous processing?

Suggested Answer: C
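Why would a streaming engine fit a requirement phrased as "equal-sized batches"? Most replies below point to Structured Streaming. By default it runs a query as a series of micro-batches. `trigger(processingTime=...)` sets how often a batch starts, and for file-based sources the `maxFilesPerTrigger` option caps how many new files each batch takes in. The plain-Python sketch below is not Spark code and uses no Spark API. It only models the idea of cutting an incoming record stream into fixed-size batches and running the same preparation step on each one. The names `micro_batches`, `run_pipeline`, and the `prepare` callback are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def micro_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group records into batches of exactly batch_size (hypothetical helper).

    Only the last batch can be smaller, if the source ends mid-batch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush a trailing partial batch
        yield batch


def run_pipeline(
    records: Iterable[T], batch_size: int, prepare: Callable[[T], R]
) -> Iterator[Tuple[int, List[R]]]:
    """Apply the same preparation step to every batch, in arrival order."""
    for batch_id, batch in enumerate(micro_batches(records, batch_size)):
        yield batch_id, [prepare(record) for record in batch]


if __name__ == "__main__":
    # Seven raw records with a batch size of 3 give batches of 3, 3, and 1.
    raw = [{"x": i} for i in range(7)]
    for batch_id, rows in run_pipeline(raw, 3, lambda r: {**r, "x_sq": r["x"] ** 2}):
        print(batch_id, len(rows))
```

In this model the pipeline fixes the batch size, and the arrival rate of data does not. That is what keeps the batches uniform. In Spark, a per-trigger cap plays a similar but coarser role, because `maxFilesPerTrigger` counts files rather than rows.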

Virgina
3 months ago
Totally with you on Structured Streaming! It's the best option here.
Ashlee
3 months ago
Wait, can AutoML even handle continuous data prep? Sounds off.
Omega
3 months ago
Spark UDFs are more for batch processing, not continuous.
Dolores
4 months ago
I think Delta Lake could work too, right?
Ulysses
4 months ago
Definitely Structured Streaming for continuous processing!
Sanda
4 months ago
AutoML seems off for this question; I don't think it handles the continuous pipeline aspect we're looking for.
Paulina
4 months ago
Delta Lake sounds familiar, but I can't recall if it specifically supports continuous data preparation.
Willodean
4 months ago
I think Structured Streaming is the right choice here. We practiced a similar question about streaming data in our last session.
Frederick
5 months ago
I remember we discussed Spark UDFs in class, but I'm not sure if they are meant for continuous processing like this.
Charolette
5 months ago
MLflow could be useful for managing the overall machine learning workflow, but I don't think it directly addresses the continuous data processing requirement. I'll need to think about how MLflow might fit into a broader solution for this problem.
Chaya
5 months ago
Delta Lake could be a good fit for the data storage and management aspect of this problem, but I'm not sure if it provides the continuous processing capabilities on its own. I'll need to investigate how Delta Lake integrates with other tools like Structured Streaming.
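Chaya asks how Delta Lake and Structured Streaming fit together. A Delta table can be read as a streaming source. A streaming query stores its progress under its `checkpointLocation`, so each new micro-batch picks up only data committed since the previous one. The stdlib-only sketch below models that division of labor with two hypothetical classes. `AppendOnlyTable` stands in for the storage layer, where each append gets a version number. `IncrementalReader` stands in for the checkpointed consumer. It does not use or imitate the real Delta Lake or Spark APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppendOnlyTable:
    """Hypothetical storage layer: every append becomes a new numbered version."""
    versions: List[List[dict]] = field(default_factory=list)

    def append(self, rows: List[dict]) -> int:
        self.versions.append(list(rows))
        return len(self.versions) - 1  # version number assigned to this commit


@dataclass
class IncrementalReader:
    """Hypothetical consumer that remembers the next unread version (its "checkpoint")."""
    table: AppendOnlyTable
    next_version: int = 0

    def read_new(self) -> List[dict]:
        """Return rows from all versions committed since the last call."""
        end = len(self.table.versions)  # snapshot the latest committed version
        new_rows: List[dict] = []
        for rows in self.table.versions[self.next_version:end]:
            new_rows.extend(rows)
        self.next_version = end  # advance the checkpoint past what was read
        return new_rows


if __name__ == "__main__":
    table = AppendOnlyTable()
    reader = IncrementalReader(table)
    table.append([{"id": 1}, {"id": 2}])
    print(reader.read_new())  # both rows from version 0
    table.append([{"id": 3}])
    print(reader.read_new())  # only the row from version 1
    print(reader.read_new())  # nothing new since the last read
```

The read position lives outside the table. The table only stores data, and the consumer that owns the checkpoint decides what counts as new.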
Shanda
5 months ago
AutoML seems like an interesting choice, but I'm not sure if it would provide the level of control and customization needed for a continuous data processing pipeline. I'll have to research the capabilities of AutoML more.
Katie
5 months ago
Hmm, I'm not too familiar with Structured Streaming. I wonder if Spark UDFs could also be a viable solution for this problem. I'll need to look into the capabilities of both options.
Viva
5 months ago
This looks like a question about building a continuous data processing pipeline. I think Structured Streaming would be a good option to consider since it's designed for this type of use case.
Tayna
5 months ago
Ah, I remember this now. The vgextend command is used to add new physical volumes to an existing volume group. That's the answer I'm going with.
Venita
5 months ago
Yeah, I feel like this situation was in one of our practice case studies. The key is to find the right balance between following regulations and the urgency of the council's request.
Chery
5 months ago
I'm pretty sure the output should include both the numbers and their corresponding string representations, but I'm confused about how they'd be ordered.
Ling
9 months ago
Hey, I heard MLflow is the new hotness for machine learning ops. Maybe they can just throw some emojis at the data and it'll magically get processed.
Maryann
8 months ago
D: I've heard Delta Lake is good for managing large amounts of data efficiently.
Marge
8 months ago
C: Spark UDFs might also be useful for data processing.
Bernardine
8 months ago
B: Yeah, MLflow is great for managing the machine learning lifecycle.
Gertude
8 months ago
A: I think MLflow could definitely help with that.
Mari
9 months ago
Spark UDFs? That's just for extending Spark's functionality, not for continuous data processing. I'm with the others - Structured Streaming is the way to go.
Marci
10 months ago
AutoML? Really? That's for automating the machine learning model development process, not data preprocessing. I'd say Structured Streaming is the clear winner here.
Julie
8 months ago
Spark UDFs might work well for custom data processing functions within the pipeline.
Helaine
8 months ago
I think MLflow could also be useful for tracking and managing the machine learning pipeline.
Tijuana
8 months ago
I agree, Structured Streaming is the best choice for continuous processing.
Sabina
9 months ago
Delta Lake is great for reliable data lakes, but maybe not the best fit for this specific task.
Emerson
9 months ago
MLflow could also be useful for tracking experiments and managing the machine learning lifecycle.
Kerrie
9 months ago
I agree, Structured Streaming is the best choice for continuous processing.
Janey
10 months ago
Hmm, I'm not sure about Structured Streaming. Isn't that more for stream processing? I feel like Delta Lake might be a better fit since it can handle batch processing as well.
Dortha
9 months ago
Let's go with Delta Lake for the continuous processing pipeline.
Annmarie
9 months ago
I'm not sure about Structured Streaming either, but Delta Lake seems like a versatile option.
Lucina
9 months ago
I agree, Delta Lake can handle both batch and stream processing.
Susana
10 months ago
I think Delta Lake would be a good choice for continuous processing.
Loren
10 months ago
I think Structured Streaming is the right choice here. It's specifically designed for continuous, real-time data processing, which is exactly what the team needs for their machine learning pipeline.
Judy
10 months ago
I think MLflow could also be a good option for managing the machine learning pipeline.
Geraldine
10 months ago
I agree, Structured Streaming is perfect for continuous processing.
Beth
10 months ago
I personally prefer using MLflow for managing the machine learning pipeline.
Starr
11 months ago
I agree with Denae: Structured Streaming is a good choice for processing data in batches.
Denae
11 months ago
I think Structured Streaming can be used for continuous processing.
