SageMaker Pipeline vs MLflow Details

Posted on June 19, 2021

When is AWS SageMaker not the best choice?

1. Cost

It is easier, and much cheaper, to spin up MLflow on ECS Spot Instances than to use SageMaker.
There are examples of SageMaker working with MLflow: https://cosminsanda.com/posts/experiment-tracking-with-mlflow-inside-amazon-sagemaker/
I still think it is better to Dockerize the work and push it to ECS or EKS.

2. Flexibility

SageMaker Experiments requires all training jobs to run through the SageMaker training API (meaning you spend $$$ on them).
This could be an issue if you want to use algorithms beyond those SageMaker supports.
Once I was told of this limitation, SageMaker Experiments no longer looked very useful to me.

3. Cloud Vendor Limitation

If you need tracking across cloud vendors, for example tracking metrics from jobs on Azure or GCP, then MLflow is the natural choice.

4. Documentation & Community Support

Documentation and published solutions to issues are limited, unless you pay for AWS Premium Support to get help.

MLflow Automatic Logging

https://www.mlflow.org/docs/latest/tracking.html#automatic-logging

The following libraries support autologging:

Scikit-learn
TensorFlow and Keras
Gluon
XGBoost
LightGBM
Statsmodels
Spark
Fastai
PyTorch

Tags: ML Infrastructure