Deploying machine learning data pipelines and algorithms should not be a time-consuming or difficult task. MLeap allows data scientists and engineers to deploy machine learning pipelines from Spark and Scikit-learn to a portable format and execution engine (MLeap documentation).

*MLeap* serializes and deserializes your model or pipeline. It supports *Spark*, *Scikit-Learn* and *Tensorflow*.
You can use it with *Scikit-Learn*, but the documentation is quite sparse at the moment, which is why I decided to write this small tutorial.

## Prepare some data

I will use *NumPy* and *Pandas* to generate some random data.

Suppose you want to do a classification. You need a dataframe with a feature column and a label column.

```
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 1), columns=['a'])
df["y"] = (df['a'] > 0.5).astype(int)
df.head()
```

You should get a result like this (the values are random, so yours will differ):

| a | y |
|---|---|
| 0.61 | 1 |
| 0.26 | 0 |
| 0.20 | 0 |

This is a dataframe with *a* as the feature column and *y* as the label column for the classification.
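Because `np.random.randn` draws fresh values on every run, your numbers will not match mine. If you want repeatable data, one option is to seed a generator first. This is only a minimal sketch: the seed value `42` is an arbitrary choice of mine, and nothing else in this tutorial depends on it.

```python
import numpy as np
import pandas as pd

# Seeded generator so the same 100 values come back on every run.
# The seed (42) is arbitrary.
rng = np.random.default_rng(42)

df = pd.DataFrame(rng.standard_normal((100, 1)), columns=["a"])
# Same labelling rule as before: 1 when a > 0.5, else 0.
df["y"] = (df["a"] > 0.5).astype(int)

print(df.head())
```

The rest of the tutorial works the same way with either version of `df`.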

## Serialize with MLeap and Scikit-Learn

I will use a logistic regression.

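```
# Importing from mleap.sklearn gives LogisticRegression the MLeap methods
# (mlinit, serialize_to_bundle, ...).
from mleap.sklearn.logistic import LogisticRegression

logistic_regression = LogisticRegression(fit_intercept=True)
# Tell MLeap which column is the input and how to name the output column.
logistic_regression.mlinit(input_features='a',
                           prediction_column='e_binary')
# Pass the label as a 1-D Series to avoid a shape warning from Scikit-Learn.
logistic_regression.fit(df[['a']], df['y'])
# Write the fitted model under the folder, using "model.json" as its name.
logistic_regression.serialize_to_bundle(
    "/dbfs/FileStore/tables/mleaptestmodel",
    "model.json"
)
```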

In the *mleaptestmodel* folder, you now have several files. Together, they represent your model in a serialized format.
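If you are curious about what was written, you can walk the folder with the standard library. The helper below, `list_bundle_files`, is a hypothetical name of mine and not part of MLeap. It only lists whatever is on disk and makes no assumption about the file names MLeap chooses.

```python
import os


def list_bundle_files(bundle_dir):
    """Return the paths of all files under bundle_dir, relative to it, sorted."""
    found = []
    for root, _dirs, files in os.walk(bundle_dir):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, bundle_dir))
    return sorted(found)


# Same folder as in the serialization step.
# Prints nothing if the folder does not exist.
for path in list_bundle_files("/dbfs/FileStore/tables/mleaptestmodel"):
    print(path)
```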

## Deserialize with MLeap and Scikit-Learn

To load the model back and use it for a prediction, you can do this:

```
from mleap.sklearn.logistic import LogisticRegression

# The node is stored under the name used at serialization time plus ".node".
node_name = "{}.node".format("model.json")
logistic_regression_tf = LogisticRegression()
# Rebuild the fitted model from the files written earlier.
model = logistic_regression_tf.deserialize_from_bundle(
    "/dbfs/FileStore/tables/mleaptestmodel",
    node_name
)
expected = model.predict(df[["a"]])
```
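A simple sanity check for the round trip is to compare the predictions of the reloaded model with those of the model you fitted before serializing. If the round trip worked, I would expect them to agree. The helper `same_predictions` below is a hypothetical name of mine, written in plain Python so it works on lists as well as *NumPy* arrays.

```python
def same_predictions(before, after):
    """Return True when both sequences have the same length and equal labels."""
    before, after = list(before), list(after)
    return len(before) == len(after) and all(
        int(b) == int(a) for b, a in zip(before, after)
    )


# Intended use with the objects from this tutorial:
#   original = logistic_regression.predict(df[["a"]])
#   assert same_predictions(original, expected)
print(same_predictions([0, 1, 1], [0, 1, 1]))  # True
print(same_predictions([0, 1, 1], [0, 0, 1]))  # False
```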

## Conclusion

*MLeap* seems to be a nice project, but it is not very active, so I’m not sure I would recommend it. Still, if you have to deal with it, you now have a starting point with logistic regression.

For the moment, the documentation is quite sparse. For instance, I found nothing about deserializing with *Scikit-Learn*.

The project's tests helped me understand this part.

I plan to make a pull request to add some documentation.

Thank you for reading.