Welcome to Pass4Success

Databricks Exam Databricks Machine Learning Professional Topic 1 Question 20 Discussion

Actual exam question for Databricks's Databricks Machine Learning Professional exam
Question #: 20
Topic #: 1

A machine learning engineering team wants to build a continuous pipeline for data preparation of a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches.

Which of the following tools can be used to provide this type of continuous processing?

Suggested Answer: C

Dolores
2 days ago
I think Delta Lake could work too, right?
...
Ulysses
8 days ago
Definitely Structured Streaming for continuous processing!
...
Sanda
14 days ago
AutoML seems off for this question; I don't think it handles the continuous pipeline aspect we're looking for.
...
Paulina
19 days ago
Delta Lake sounds familiar, but I can't recall if it specifically supports continuous data preparation.
...
Willodean
24 days ago
I think Structured Streaming is the right choice here. We practiced a similar question about streaming data in our last session.
...
Frederick
1 month ago
I remember we discussed Spark UDFs in class, but I'm not sure if they are meant for continuous processing like this.
...
Charolette
1 month ago
MLflow could be useful for managing the overall machine learning workflow, but I don't think it directly addresses the continuous data processing requirement. I'll need to think about how MLflow might fit into a broader solution for this problem.
...
Chaya
1 month ago
Delta Lake could be a good fit for the data storage and management aspect of this problem, but I'm not sure if it provides the continuous processing capabilities on its own. I'll need to investigate how Delta Lake integrates with other tools like Structured Streaming.
...
Shanda
1 month ago
AutoML seems like an interesting choice, but I'm not sure if it would provide the level of control and customization needed for a continuous data processing pipeline. I'll have to research the capabilities of AutoML more.
...
Katie
1 month ago
Hmm, I'm not too familiar with Structured Streaming. I wonder if Spark UDFs could also be a viable solution for this problem. I'll need to look into the capabilities of both options.
...
Viva
1 month ago
This looks like a question about building a continuous data processing pipeline. I think Structured Streaming would be a good option to consider since it's designed for this type of use case.
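To make the micro-batch idea concrete: in its default mode Structured Streaming processes newly arrived data as a sequence of micro-batches, and a per-trigger cap on the source (for example the `maxFilesPerTrigger` option of the file and Delta sources) is what keeps those batches close to equal in size. The snippet below is a plain-Python toy model of that capping behavior, not Spark code; `micro_batches` and `max_per_trigger` are hypothetical names chosen for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(source: Iterable[T], max_per_trigger: int) -> Iterator[List[T]]:
    """Toy model of a capped micro-batch trigger (hypothetical helper, not a Spark API).

    Each "trigger" pulls at most `max_per_trigger` new items from the source,
    so every batch has the same size except when fewer items are available.
    """
    if max_per_trigger < 1:
        raise ValueError("max_per_trigger must be >= 1")
    it = iter(source)
    while True:
        batch = list(islice(it, max_per_trigger))
        if not batch:  # no new data left: the toy stream ends here
            return
        yield batch


if __name__ == "__main__":
    # Ten incoming "files", at most four taken per trigger.
    for i, batch in enumerate(micro_batches(range(10), max_per_trigger=4)):
        print(f"batch {i}: {batch}")
```

Only the cap is fixed: a trigger that finds fewer new items than the cap produces a smaller batch, as the final `[8, 9]` batch does here. In PySpark the cap would be set on the stream reader, e.g. `.option("maxFilesPerTrigger", 4)`.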
...
Ling
6 months ago
Hey, I heard MLflow is the new hotness for machine learning ops. Maybe they can just throw some emojis at the data and it'll magically get processed.
Maryann
4 months ago
D: I've heard Delta Lake is good for managing large amounts of data efficiently.
...
Marge
4 months ago
C: Spark UDFs might also be useful for data processing.
...
Bernardine
4 months ago
B: Yeah, MLflow is great for managing the machine learning lifecycle.
...
Gertude
5 months ago
A: I think MLflow could definitely help with that.
...
...
Mari
6 months ago
Spark UDFs? That's just for extending Spark's functionality, not for continuous data processing. I'm with the others - Structured Streaming is the way to go.
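Mari's distinction can be illustrated with the same kind of toy: a UDF is just a row-level function that the engine applies inside each batch, while deciding when batches run (the continuous part) is the streaming engine's job. In PySpark the row function would be wrapped with `pyspark.sql.functions.udf`; the plain-Python code below only mimics the division of labor, and `normalize` and `run_stream` are hypothetical names.

```python
from typing import Callable, Iterable, List


def normalize(x: float) -> float:
    """Hypothetical row-level transformation, playing the role of a UDF."""
    return round(x / 100.0, 2)


def run_stream(batches: Iterable[List[float]],
               row_fn: Callable[[float], float]) -> List[List[float]]:
    """Toy 'engine': it owns the loop over batches and applies row_fn to each row.

    The row function never sees batch boundaries or scheduling; in the Spark
    case that responsibility belongs to Structured Streaming.
    """
    processed = []
    for batch in batches:
        processed.append([row_fn(x) for x in batch])
    return processed


if __name__ == "__main__":
    incoming = [[150.0, 20.0], [99.0, 5.0]]
    print(run_stream(incoming, normalize))
```

Swapping in a different `row_fn` changes what happens to each row but not how or when batches arrive, which is why a UDF alone does not give you a continuous pipeline.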
...
Marci
6 months ago
AutoML? Really? That's for automating the machine learning model development process, not data preprocessing. I'd say Structured Streaming is the clear winner here.
Julie
4 months ago
Spark UDFs might work well for custom data processing functions within the pipeline.
...
Helaine
5 months ago
I think MLflow could also be useful for tracking and managing the machine learning pipeline.
...
Tijuana
5 months ago
I agree, Structured Streaming is the best choice for continuous processing.
...
Sabina
5 months ago
Delta Lake is great for reliable data lakes, but maybe not the best fit for this specific task.
...
Emerson
5 months ago
MLflow could also be useful for tracking experiments and managing the machine learning lifecycle.
...
Kerrie
6 months ago
I agree, Structured Streaming is the best choice for continuous processing.
...
...
Janey
7 months ago
Hmm, I'm not sure about Structured Streaming. Isn't that more for stream processing? I feel like Delta Lake might be a better fit since it can handle batch processing as well.
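On Janey's point: Delta Lake is a storage layer, and a Delta table can be read both as a batch table and incrementally as a streaming source, but something still has to run the incremental reads over and over. The plain-Python toy below is a loose stand-in, not Delta's real API: `ToyTable`, `commit`, `read_all`, and `read_since` are hypothetical, and the `checkpoint` variable plays the role of the progress a Structured Streaming query persists between triggers.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyTable:
    """Hypothetical append-only table with numbered commits (a loose stand-in for a Delta log)."""
    commits: List[List[str]] = field(default_factory=list)

    def commit(self, rows: List[str]) -> int:
        """Append one commit and return its version number (0, 1, 2, ...)."""
        self.commits.append(list(rows))
        return len(self.commits) - 1

    def read_all(self) -> List[str]:
        """Batch-style read: every row from every commit."""
        return [row for c in self.commits for row in c]

    def read_since(self, next_version: int) -> Tuple[List[str], int]:
        """Incremental read: rows from commits at or after next_version,
        plus the version to resume from on the next call."""
        new_rows = [row for c in self.commits[next_version:] for row in c]
        return new_rows, len(self.commits)


if __name__ == "__main__":
    table = ToyTable()
    checkpoint = 0  # toy stand-in for the progress a streaming query saves

    table.commit(["a", "b"])
    rows, checkpoint = table.read_since(checkpoint)
    print(rows, checkpoint)  # ['a', 'b'] 1

    table.commit(["c"])
    rows, checkpoint = table.read_since(checkpoint)
    print(rows, checkpoint)  # ['c'] 2

    print(table.read_all())  # ['a', 'b', 'c']
```

The table answers both kinds of read, but the continuity comes from repeatedly calling `read_since` with a saved position, which is the part a streaming query provides.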
Dortha
5 months ago
Let's go with Delta Lake for the continuous processing pipeline.
...
Annmarie
5 months ago
I'm not sure about Structured Streaming either, but Delta Lake seems like a versatile option.
...
Lucina
6 months ago
I agree, Delta Lake can handle both batch and stream processing.
...
Susana
6 months ago
I think Delta Lake would be a good choice for continuous processing.
...
...
Loren
7 months ago
I think Structured Streaming is the right choice here. It's specifically designed for continuous, real-time data processing, which is exactly what the team needs for their machine learning pipeline.
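One nuance on "real-time": by default Structured Streaming runs micro-batches, and with a fixed-interval trigger (`trigger(processingTime=...)` in PySpark) each batch roughly holds whatever arrived since the previous one, so batches are equal in duration rather than in size. The plain-Python toy below (the helper `group_by_trigger` is hypothetical) shows batch sizes following the arrival rate; a per-trigger cap such as `maxFilesPerTrigger` is what bounds the size.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_trigger(arrival_times: List[float], interval_s: float) -> Dict[int, List[float]]:
    """Toy model of a fixed-interval (processing-time) trigger; hypothetical helper.

    Each event lands in the micro-batch of the trigger window it arrived in:
    window k covers [k * interval_s, (k + 1) * interval_s).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    batches: Dict[int, List[float]] = defaultdict(list)
    for t in sorted(arrival_times):
        batches[int(t // interval_s)].append(t)
    return dict(batches)


if __name__ == "__main__":
    arrivals = [0.5, 1.2, 1.9, 2.1, 7.3]
    for window, events in group_by_trigger(arrivals, interval_s=2.0).items():
        print(window, len(events))
```

With arrivals at 0.5, 1.2, 1.9, 2.1 and 7.3 seconds and a 2-second interval, window 0 gets three events while windows 1 and 3 get one each; window 2 saw no data, so the toy produces no batch for it.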
Judy
6 months ago
I think MLflow could also be a good option for managing the machine learning pipeline.
...
Geraldine
6 months ago
I agree, Structured Streaming is perfect for continuous processing.
...
...
Beth
7 months ago
I personally prefer using MLflow for managing the machine learning pipeline.
...
Starr
7 months ago
I agree with Denae, Structured Streaming is a good choice for processing data in batches.
...
Denae
7 months ago
I think Structured Streaming can be used for continuous processing.
...