Hyper-Parameter Tuning!!

INSAID
Sep 11, 2020

In machine learning, a hyperparameter is a parameter whose value is used to control the learning process.
Hyperparameters need to be set before fitting the model to the data in order to create more robust, better-performing models. They cannot be learned from the training data and must be assigned before fitting any model.

But before moving on, let’s understand why it is an important task in any machine learning project.

Importance of Hyper-Parameter Tuning!

  1. Hyperparameters are critical because they are responsible for the outcome of any machine learning or deep learning model. Our goal is to find hyperparameter values that minimize a loss function and give better results.
  2. They affect the convergence of an algorithm to a large extent.

Difference between Parameters and Hyperparameters


→ Model Parameters: These are the parameters in the model that must be determined using the training data set. These are the fitted parameters.
Model parameters differ for each experiment and depend on the type of data and task at hand.

Some examples of model parameters include:

  • The weights in an artificial neural network.
  • The support vectors in a support vector machine.
  • The coefficients in linear regression or logistic regression.
  • For NLP tasks: word frequency, sentence length, noun or verb distribution per sentence, the number of specific character n-grams per word, lexical diversity, etc.

→ Hyperparameters: These are adjustable parameters that must be tuned in order to obtain a model with optimal performance.

Some examples of model hyperparameters include:

  • The learning rate for training a neural network.
  • The C and sigma hyperparameters for support vector machines.
  • The ‘k’ in k-nearest neighbors.
  • The depth of the tree in decision trees.
  • The learning rate of the gradient descent algorithm for neural networks.
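To make the distinction concrete, here is a minimal sketch using scikit-learn’s LogisticRegression on a toy dataset (the dataset and values are purely illustrative, not part of the article’s Titanic example): C and max_iter are hyperparameters we choose before fitting, while coef_ and intercept_ are parameters learned from the data.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data, for illustration only
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Hyperparameters: set by us before fitting
model = LogisticRegression(C=1.0, max_iter=200)

# Parameters: learned from the training data during fitting
model.fit(X, y)
print(model.coef_, model.intercept_)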

Now that we are clear about the difference between model parameters and hyperparameters, let’s take a look at ways to find the optimal hyperparameter values.


Hyperparameter Tuning/Optimization

The process that involves the search of the optimal values of hyperparameters for any machine learning algorithm is called hyperparameter tuning/optimization.

Building a baseline model with RandomForest on the Titanic data.

  • Importing the necessary libraries and reading the data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score,f1_score,precision_score,recall_score
data = pd.read_csv("/content/train (1).csv")
  • Check for any missing values.
data.isnull().sum()

OUTPUT:

  • Filling missing values in Age with (approximately) the mean and removing the columns that are not needed to build the model.
data.Age.fillna(29, inplace=True)
data.drop(['PassengerId','Cabin','Ticket','Name'], axis=1, inplace=True)
data.dropna(inplace=True)
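(If you prefer not to hard-code the value, the mean age can also be computed directly from the data; this one-liner is a small variation on the line above, not a change in approach.)

data.Age.fillna(data.Age.mean(), inplace=True)  # compute the mean instead of hard-coding 29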
  • Converting Pclass to object (so it is treated as categorical) and one-hot encoding the data.
data.Pclass = data['Pclass'].astype(object)
data = pd.get_dummies(data, drop_first=True)
  • Splitting the data in train and test.
x = data.drop('Survived',axis = 1)
y = data['Survived']
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.1,random_state = 123)
  • Creating a function that evaluates our model.
def evaluate_model(x_test, y_test, model):
    pred = model.predict(x_test)
    print("Accuracy is : {}".format(accuracy_score(y_test, pred)))
    print("------------------------------------------")
    print("F1-Score is : {}".format(f1_score(y_test, pred)))
    print("------------------------------------------")
    print("Precision is : {}".format(precision_score(y_test, pred)))
    print("------------------------------------------")
    print("Recall is : {}".format(recall_score(y_test, pred)))
  • Fitting our model and evaluating it.
rf = RandomForestClassifier().fit(x_train, y_train)
evaluate_model(x_test, y_test, rf)

Our baseline model is performing pretty well; let’s see if we can improve its performance by using different hyperparameter tuning methods.

Different Methods of Hyperparameter Tuning are:

→ GridSearch:

  • Grid search lays out a grid of hyperparameter values, evaluates every combination, and returns the best one.
  • This means the parameter search is done over the entire grid of selected values.
  • An important thing to remember while performing grid search is that the more parameters (and candidate values) we have, the more time and space the search will take.
  • This is where the Curse of Dimensionality comes into the picture.

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (higher feature count) that do not occur in low-dimensional spaces (lower feature count).
This means that the more dimensions (here, hyperparameters and candidate values) we add, the more the search grows in time complexity, ultimately making this strategy inconvenient.
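To see how quickly a grid grows, here is a small sketch that simply counts the combinations with itertools.product (the grid below is illustrative and smaller than the one used later in the article):

from itertools import product

# Illustrative grid: 4 * 2 * 3 * 3 = 72 combinations. Every extra hyperparameter
# (or extra candidate value) multiplies this count, and each combination is
# refit once per cross-validation fold.
toy_grid = {
    'n_estimators': [200, 600, 1000, 1500],
    'max_features': ['auto', 'sqrt'],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}
print(len(list(product(*toy_grid.values()))))  # 72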

  • Using this code will give you the default hyperparameters of our RandomForest model.
from pprint import pprint
rf = RandomForestClassifier()
# Look at the parameters used by our current forest
print('Parameters currently in use:\n')
pprint(rf.get_params())

OUTPUT:

  • Now, we provide a grid of hyperparameters.
import numpy as np  # needed for np.linspace below
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 1500, num = 4)]
# Number of features to consider at every split
max_features = ['auto', 'sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 80, num = 4)]
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]
grid_para = {'n_estimators': n_estimators,
             'max_features': max_features,
             'max_depth': max_depth,
             'min_samples_split': min_samples_split,
             'min_samples_leaf': min_samples_leaf,
             'bootstrap': bootstrap}
pprint(grid_para)  # print our grid of hyperparameter values

OUTPUT:

  • Now, we fit the GridSearchCV model to find the set of optimal hyperparameter values.
    The model will try out 576 combinations of hyperparameters, which gives you an idea of how grid search increases the time complexity:
    2 of bootstrap
    4 of max_depth
    2 of max_features
    3 of min_samples_leaf
    3 of min_samples_split
    4 of n_estimators
    which gives 2 * 4 * 2 * 3 * 3 * 4 = 576 combinations.
from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(estimator = rf, param_grid = grid_para,
                           cv = 3, n_jobs = -1, verbose = 1)
# Fit the grid search model
grid_search.fit(x_train,y_train)
grid_search.best_params_  # outputs the set of best hyperparameter values

OUTPUT:

Notice that we get 1728 fits; this is because each of the 576 combinations is cross-validated 3 times.
So, 576 * 3 = 1728.
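Besides best_params_, GridSearchCV also exposes the mean cross-validated score of the best combination and the estimator refit on the full training set with those values; a short sketch (these are standard scikit-learn attributes, and best_rf is just a name introduced here):

print(grid_search.best_score_)         # mean cross-validated score of the best combination
best_rf = grid_search.best_estimator_  # refit model that grid_search.predict() delegates to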

  • Now, we evaluate our model again.
evaluate_model(x_test,y_test,grid_search)

It’s clearly seen that each of our evaluation metrics has increased drastically.
Now, let’s do the same for the other method.

→ RandomSearch:

  • Random search replaces the exhaustive enumeration of all combinations by selecting combinations at random.
  • Random search can run multiple trials simultaneously.
  • The drawback of random search is that it yields high variance: since the selection of parameters is completely random and no intelligence is used to sample the combinations, luck plays its part.
  • For example, instead of checking all 100 candidate combinations, we could check only 50 random ones.
  • However, there is a trade-off to decreasing the time complexity: random search is good at testing a wide range of values and normally reaches a very good combination quickly, but it doesn’t guarantee finding the best parameter combination.
  • Using this code will give you the default hyperparameters of our RandomForest model (the same step as in the grid search section).
from pprint import pprint
rf = RandomForestClassifier()
# Look at the parameters used by our current forest
print('Parameters currently in use:\n')
pprint(rf.get_params())

OUTPUT:

  • Now, we provide a grid of hyperparameters.
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 1500, num = 4)]
# Number of features to consider at every split
max_features = ['auto', 'sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 80, num = 4)]
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}
pprint(random_grid)  # print our grid of hyperparameter values

OUTPUT:

  • Now, we fit the RandomizedSearchCV model. This will take some time to execute, depending on the size of the data.
    Note:
    → The most important argument in RandomizedSearchCV is n_iter, which controls the number of different hyperparameter combinations to try.
    → cv is the number of folds to use for cross-validation. Increasing the number of cv folds reduces the chance of overfitting, but increases the run time.
from sklearn.model_selection import RandomizedSearchCV
rf_random = RandomizedSearchCV(estimator = rf, param_distributions = random_grid, n_iter = 60,
                               cv = 3, verbose = 1, random_state = 42, n_jobs = -1)
# Fit the random search model
rf_random.fit(x_train,y_train)
rf_random.best_params_  # outputs the set of best hyperparameter values

OUTPUT:

  • Now, we evaluate our model again.
evaluate_model(x_test,y_test,rf_random)

Our overall accuracy has increased. However, the increase is driven by a drastic increase in precision, while our recall score has dropped.
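One refinement not used above: param_distributions in RandomizedSearchCV also accepts scipy.stats distributions, so values can be sampled from a range instead of a fixed list. A minimal sketch (the ranges below are illustrative, not the article’s grid):

from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Integer hyperparameters sampled uniformly from ranges instead of fixed lists
param_dist = {
    'n_estimators': randint(200, 1500),
    'max_depth': randint(10, 80),
    'min_samples_split': randint(2, 11),
}
search = RandomizedSearchCV(RandomForestClassifier(), param_distributions=param_dist,
                            n_iter=60, cv=3, random_state=42, n_jobs=-1)
# search.fit(x_train, y_train) and search.best_params_ work exactly as before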

→ Manual Search:

  • Manual search is done on the basis of our judgment/experience.
  • We train the model with hyperparameter values that we assign manually, evaluate its accuracy, and start the process again.
  • This loop is repeated until a satisfactory accuracy is reached (a short sketch follows below).
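For example, a minimal sketch of such a manual loop over a single hyperparameter, reusing the x_train/x_test split from above (the candidate values are illustrative; in practice you would score on a validation set or with cross-validation rather than on the test set):

best_score, best_depth = 0, None
for depth in [5, 10, 20, None]:
    # Train with a manually chosen max_depth and evaluate it
    model = RandomForestClassifier(max_depth=depth, random_state=42)
    model.fit(x_train, y_train)
    score = model.score(x_test, y_test)
    print("max_depth = {} -> accuracy = {:.3f}".format(depth, score))
    if score > best_score:
        best_score, best_depth = score, depth
print("Best max_depth:", best_depth)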

Conclusion

In machine learning, algorithms like ensemble techniques (bagging and boosting), SVMs, and so on are often used for high performance.
However, not tuning these models with optimal hyperparameter values will prevent them from reaching their full capacity.

Hyperparameter tuning is a very important task in the data science life cycle, and one should know which hyperparameters an algorithm takes in order to apply the techniques mentioned above.

Follow us for more upcoming articles related to Data Science, Machine Learning, and Artificial Intelligence.

Also, do give us a Clap👏 if you find this article useful, as your encouragement catalyzes inspiration and helps us create more cool stuff like this.

Visit us on https://www.insaid.co/
