An MLOps engineer is building a Pandas UDF that applies a language model that translates English strings into Spanish. The initial code is loading the model on every call to the UDF, which is hurting the performance of the data pipeline.
The initial code is:

def in_spanish_inner(df: pd.Series) -> pd.Series:
    model = get_translation_model(target_lang='es')
    return df.apply(model)

in_spanish = sf.pandas_udf(in_spanish_inner, StringType())
How can the MLOps engineer change this code to reduce how many times the language model is loaded?
The provided code defines a Series-to-Series Pandas UDF. Spark calls such a UDF once per Arrow batch, so a new instance of the language model is created on every batch. This repeated initialization adds significant overhead.
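To get a feel for the overhead, the number of model loads in the Series-to-Series version can be roughly estimated from the partition sizes and the Arrow batch size. The sketch below assumes batches are capped only by `spark.sql.execution.arrow.maxRecordsPerBatch` (10,000 records by default); the function name and the partition sizes are hypothetical.

```python
import math

def estimate_model_loads(rows_per_partition, max_records_per_batch=10_000):
    """Rough count of UDF calls (and so model loads) for a Series-to-Series UDF.

    Assumes each partition is sliced into Arrow batches of at most
    max_records_per_batch rows and the UDF is called once per batch.
    """
    return sum(math.ceil(rows / max_records_per_batch) for rows in rows_per_partition)

# Hypothetical dataset: 4 partitions of 250,000 rows each.
loads = estimate_model_loads([250_000] * 4)
print(loads)  # 100
```

Under these assumptions the model would be loaded about 100 times for a dataset that only has 4 partitions.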
To reduce the frequency of model loading, the engineer should convert the UDF to an iterator-based Pandas UDF (Iterator[pd.Series] -> Iterator[pd.Series]). Spark invokes an iterator UDF once for each partition a task processes and passes it an iterator over that partition's batches, so the model can be loaded once up front and reused across all of those batches, rather than once per batch.
The Databricks documentation on Pandas UDFs describes this use case: Iterator of Series to Iterator of Series UDFs are useful when UDF execution requires initializing some state, for example loading a machine learning model file to apply inference to every input batch.
A correct implementation looks like:
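from typing import Iterator
import pandas as pd
from pyspark.sql.functions import pandas_udf

@pandas_udf('string')
def translate_udf(batch_iter: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Load the model once, before iterating over the batches.
    model = get_translation_model(target_lang='es')
    for batch in batch_iter:
        yield batch.apply(model)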
This refactor ensures that get_translation_model() is invoked once per UDF invocation (once for each partition a task processes), not once per batch, which significantly reduces model-loading overhead in the pipeline.
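The difference in load counts comes from how a Python generator runs: the code before the loop executes once, when iteration starts. The following is a minimal plain-Python sketch of that mechanism, not Spark code. Lists of strings stand in for pd.Series batches, and the counting `get_translation_model` here is a hypothetical stand-in for the real loader.

```python
from typing import Callable, Iterator, List

load_count = 0

def get_translation_model(target_lang: str) -> Callable[[str], str]:
    """Hypothetical stand-in for the real loader; counts how often it runs."""
    global load_count
    load_count += 1
    return lambda text: f"[{target_lang}] {text}"

def per_batch(batch: List[str]) -> List[str]:
    # Series-to-Series shape: the loader runs on every call.
    model = get_translation_model(target_lang="es")
    return [model(s) for s in batch]

def per_iterator(batches: Iterator[List[str]]) -> Iterator[List[str]]:
    # Iterator shape: the loader runs once, before the first batch is consumed.
    model = get_translation_model(target_lang="es")
    for batch in batches:
        yield [model(s) for s in batch]

batches = [["hello"], ["good morning"], ["thank you"]]

load_count = 0
per_batch_out = [per_batch(b) for b in batches]
per_batch_loads = load_count

load_count = 0
per_iterator_out = list(per_iterator(iter(batches)))
per_iterator_loads = load_count

print(per_batch_loads, per_iterator_loads)  # 3 1
```

Both shapes produce the same output for the same batches; only the number of loader calls differs.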