Applying AutoML (Part 2) with MLBox


Advantages

Disadvantages

Preprocessing:

mlbox.preprocessing.Reader(sep=None, header=0, to_hdf5=False, to_path='save', verbose=True)

The Reader class offers the following methods:

clean(path, drop_duplicate = False)
train_test_split(Lpath, target_name)

Drift Thresholding:

dft = Drift_thresholder()
df = dft.fit_transform(df)

Drift_thresholder automatically drops features whose distributions differ ("drift") between the train and test sets, which helps reduce covariate-shift bias before modeling.

Encoding:

→ Missing values

mlbox.encoding.NA_encoder(numerical_strategy='mean', categorical_strategy='')

→ Categorical features

mlbox.encoding.Categorical_encoder(strategy='label_encoding', verbose=False)

Encodes categorical features.

Model

→ Classification

mlbox.model.classification.Clf_feature_selector(strategy='l1', threshold=0.3)
mlbox.model.classification.Classifier(**params)
mlbox.model.classification.StackingClassifier()

→ Regression

mlbox.model.regression.Reg_feature_selector(strategy='l1', threshold=0.3)
mlbox.model.regression.Regressor(**params)

Optimization

mlbox.optimisation.Optimiser(scoring=None, n_folds=2, random_state=1, to_path='save', verbose=True)

Prediction

mlbox.prediction.Predictor(to_path='save', verbose=True)

Python implementation of MLBox

1. Installing the Package:

!pip install mlbox

2. Importing the modules from MLBox

from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *

3. Specifying the path and target name

paths = ["/contents/train.csv","/contents/test.csv"]
target_name = "Survived"

4. Preprocessing and splitting our data

rd = Reader(sep=",")
df = rd.train_test_split(paths, target_name)

5. Drift thresholding our data to remove any kind of bias

dft = Drift_thresholder()
df = dft.fit_transform(df)

6. Initializing the optimizer

# Hyperparameter tuning
opt = Optimiser(scoring="accuracy", n_folds=5)
space = {
    'est__strategy': {"search": "choice", "space": ["LightGBM"]},
    'est__n_estimators': {"search": "choice", "space": [150]},
    'est__colsample_bytree': {"search": "uniform", "space": [0.8, 0.95]},
    'est__subsample': {"search": "uniform", "space": [0.8, 0.95]},
    'est__max_depth': {"search": "choice", "space": [5, 6, 7, 8, 9]},
    'est__learning_rate': {"search": "choice", "space": [0.07]}
}
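The search space follows a simple convention (it is handed to hyperopt, which MLBox uses under the hood): each key names a pipeline-step parameter via a prefix (est__ targets the estimator; the encoders and feature selector have their own prefixes), and each value declares a "search" type, where "choice" picks one element of "space" and "uniform" samples a float between two bounds. A quick structural check of the dict above:

```python
# Validate an MLBox/hyperopt-style search space
space = {
    'est__strategy': {"search": "choice", "space": ["LightGBM"]},
    'est__n_estimators': {"search": "choice", "space": [150]},
    'est__colsample_bytree': {"search": "uniform", "space": [0.8, 0.95]},
    'est__subsample': {"search": "uniform", "space": [0.8, 0.95]},
    'est__max_depth': {"search": "choice", "space": [5, 6, 7, 8, 9]},
    'est__learning_rate': {"search": "choice", "space": [0.07]}
}

for name, spec in space.items():
    assert spec["search"] in ("choice", "uniform"), name
    if spec["search"] == "uniform":
        # uniform needs exactly two bounds: [low, high]
        low, high = spec["space"]
        assert low < high, name
print("space OK:", len(space), "parameters")
```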

7. Optimizing

params = opt.optimise(space, df, 15)

The third argument (max_evals) sets how many hyperparameter combinations to evaluate; optimise returns the best parameter set found.

8. Prediction


Visit us on https://www.insaid.co/


One of India’s leading institutions providing world-class Data Science & AI programs for working professionals with a mission to groom Data leaders of tomorrow!
