Applying AutoML (Part-1) using Auto-Sklearn

Advantages

Disadvantages

Python Implementation

→ Installing library

!apt-get install swig -y
!pip install Cython numpy
!pip install auto-sklearn

→ Importing library

import pandas as pd                                          # Importing for panel data analysis
from pandas_profiling import ProfileReport                   # Import Pandas Profiling (to generate univariate analysis)
pd.set_option('display.max_columns', None)                   # Show every column, however many there are
pd.set_option('display.max_colwidth', None)                  # Show full column contents for better clarity
pd.set_option('display.max_rows', None)                      # Show every row, however many there are
pd.set_option('mode.chained_assignment', None)               # Silence warnings on chained assignment
pd.set_option('display.float_format', lambda x: '%.5f' % x)  # Suppress scientific notation for floats
#------------------------------------------------------------
import numpy as np                                           # Importing NumPy (Numerical Python)
np.set_printoptions(precision=4)                             # Display values up to four decimal places
#------------------------------------------------------------
import matplotlib.pyplot as plt                              # Importing the pyplot interface of matplotlib
from matplotlib.pylab import rcParams                        # Runtime configuration for plot defaults
import seaborn as sns                                        # Importing seaborn for statistical visualization
# To render graphs inside the notebook
%matplotlib inline
#------------------------------------------------------------
import time                                                  # To time the execution
#------------------------------------------------------------
from smac.tae import StatusType                              # To get the status of each run
#------------------------------------------------------------
from sklearn.model_selection import train_test_split         # To split the data into training and testing parts
from sklearn.metrics import accuracy_score, f1_score         # For checking the accuracy and F1-score of our model
import autosklearn.classification                            # For using AutoML

→ Reading data

data = pd.read_csv("https://raw.githubusercontent.com/insaid2018/Term-2/master/Data/credit_fraud.csv")
data.head()

→ Splitting into Train and Test data

x = data.drop('Class', axis = 1)
y = np.array(data['Class'])
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.2)
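Before trusting a random split, it can help to compare the class shares on each side of it. The helper below is a minimal sketch in plain Python; `class_shares` and the toy labels are illustrative names and made-up values, not part of the original notebook.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of samples that falls into each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


# Toy labels (made up for illustration): 8 negatives and 2 positives
toy_labels = [0] * 8 + [1] * 2
print(class_shares(toy_labels))
```

With the data above, `class_shares(y_train.tolist())` and `class_shares(y_test.tolist())` can be compared directly. If the shares differ noticeably, passing `stratify=y` to `train_test_split` keeps them equal in both parts, and `random_state` makes the split repeatable.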

→ Fitting Auto-Sklearn

# configure auto-sklearn
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=30,
    per_run_time_limit=10)
automl.fit(x_train, y_train)
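The two arguments are both in seconds: `time_left_for_this_task` caps the whole search and `per_run_time_limit` caps a single model fit. As a rough back-of-the-envelope sketch (the helper name `full_length_runs` is hypothetical, and it assumes runs happen one after another and ignores the time auto-sklearn spends on anything other than fitting candidates), the ratio tells how many runs fit if each one uses its whole limit:

```python
def full_length_runs(time_left_for_this_task, per_run_time_limit):
    """Count the sequential runs that fit if each run uses its full time limit."""
    if per_run_time_limit <= 0:
        raise ValueError("per_run_time_limit must be positive")
    return time_left_for_this_task // per_run_time_limit


print(full_length_runs(30, 10))    # budget used for the first fit above
print(full_length_runs(120, 30))   # a larger budget of 120 s with 30 s per run
```

Runs that finish early leave room for more candidates, so this is a floor rather than an exact count. A budget of 30 seconds is enough for a demo but leaves very little room for the search.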

→ Evaluating Performance

# evaluate
pred = automl.predict(x_test)
test_acc = accuracy_score(y_test, pred)
print("Test Accuracy score {0}".format(test_acc))
test_f1 = f1_score(y_test, pred)
print(f"Test F1-Score {test_f1}")
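Both scores are reported because they can disagree sharply when one class is rare. The sketch below recomputes them in plain Python on made-up labels, assuming the positive class is labelled 1; the function names are illustrative, and in the notebook itself `accuracy_score` and `f1_score` from scikit-learn do this work.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (made up): 95 negatives, 5 positives, and a model that always predicts 0
y_true_toy = [0] * 95 + [1] * 5
y_pred_toy = [0] * 100
print(accuracy(y_true_toy, y_pred_toy))   # 0.95
print(f1_binary(y_true_toy, y_pred_toy))  # 0.0
```

A model that never flags the positive class still reaches 95% accuracy here, while its F1-score is zero, which is why the F1-score is the more informative number for a task such as fraud detection.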

→ Checking reports of models built by Auto-Sklearn

print(automl.show_models())

→ Using Resampling Technique to fit Auto-Sklearn

automl_Hold = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    resampling_strategy='holdout',
    resampling_strategy_arguments={'train_size': 0.8})
automl_Hold.fit(x_train, y_train)
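The refit step further down works on a classifier named `automl_cv`, whose construction does not appear in this text. One plausible way to build it is with cross-validation as the resampling strategy. The sketch below is an assumption, not the author's code: the time limits simply mirror the holdout example, the choice of 5 folds is arbitrary, and `cv_settings` and `fit_with_cv` are hypothetical names.

```python
# Assumed settings: same time budget as the holdout example, 5 folds chosen arbitrarily
cv_settings = {
    "time_left_for_this_task": 120,
    "per_run_time_limit": 30,
    "resampling_strategy": "cv",
    "resampling_strategy_arguments": {"folds": 5},
}


def fit_with_cv(x_train, y_train, settings=cv_settings):
    """Build and fit a cross-validated AutoSklearnClassifier (requires auto-sklearn)."""
    # Imported inside the function so the settings can be inspected without auto-sklearn
    import autosklearn.classification

    automl_cv = autosklearn.classification.AutoSklearnClassifier(**settings)
    automl_cv.fit(x_train.copy(), y_train.copy())
    return automl_cv
```

Calling `automl_cv = fit_with_cv(x_train, y_train)` would then provide the object that the refit step expects.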

→ Refit function

# automl_cv must already be an AutoSklearnClassifier fitted with resampling_strategy='cv'
automl_cv.refit(x_train.copy(), y_train.copy())
# evaluate
pred = automl_cv.predict(x_test)
print("After Re-fit")
print("-----------------------------")
test_acc = accuracy_score(y_test, pred)
print("Accuracy score {0}".format(test_acc))
test_f1 = f1_score(y_test, pred)
print(f"F1-Score {test_f1}")
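Refitting matters most after cross-validation, because during the search every candidate is trained on only part of the training data. The plain-Python sketch below shows how k-fold splitting divides sample indices; `kfold_indices` is an illustrative helper, and auto-sklearn's own splitting may shuffle or stratify differently.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) index lists for contiguous k-fold splitting."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size


# With 10 samples and 5 folds, each model trains on 8 samples and validates on 2
for train_idx, valid_idx in kfold_indices(10, 5):
    print(len(train_idx), len(valid_idx))
```

No single fold model has seen all the training samples, so `refit` trains the selected models once more on the full training set before they are used for prediction.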

Visit us on https://www.insaid.co/

One of India’s leading institutions providing world-class Data Science & AI programs for working professionals with a mission to groom Data leaders of tomorrow!
