Clustering using PyCaret!!!
Clustering is a method of unsupervised learning and a common technique for statistical data analysis used in many fields.
It is mostly used to discover interesting patterns in data, such as groups of customers with similar behavior. However, different algorithms expect the data to be passed in different ways, and writing line after line of code to get the desired output takes a lot of time and energy.
This is where PyCaret comes to the rescue.
PyCaret is an open-source low-code machine learning library in Python that aims to reduce the time needed for experimenting with different machine learning models.
Let’s see how PyCaret helps build models faster.
Getting Started with Clustering!!
If you are not familiar with PyCaret, I suggest you first go through the articles listed below before moving on from here.
Also, check out our articles on:
Complete Guide to PyCaret
Classification using PyCaret
Regression using PyCaret
Anomaly Detection using PyCaret
Reading Data
PyCaret provides a collection of sample datasets. You can also use datasets from outside.
from pycaret.datasets import get_data
get_data('index')
Scrolling down the index, we can find the datasets available for clustering.
import pycaret
from pycaret.datasets import get_data
from pycaret.clustering import *
data = get_data('pokemon')
We will use the pokemon data.
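To work with a dataset from outside PyCaret instead, any pandas DataFrame can be used in place of the `get_data` result. The following is a minimal sketch; the helper name `load_external_data` and the file path in the usage note are hypothetical, not part of PyCaret.

```python
def load_external_data(csv_path):
    """Read a CSV file into a DataFrame that can later be passed to setup()."""
    # Imported inside the function so the helper can be defined
    # even before pandas is installed.
    import pandas as pd

    data = pd.read_csv(csv_path)
    # Clustering is unsupervised, so no target column has to be marked.
    return data
```

For example, `data = load_external_data('my_customers.csv')` would load a file of your own, assuming such a file exists in the working directory.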
Setting up the PyCaret environment
Before moving on with any kind of experimentation using PyCaret we need to set up the environment.
It is a mandatory step that should be done before any machine learning experiment.
clust = setup(data = data)
As you know, PyCaret helps with model deployment too: everything done in the experiment is saved in a pipeline, and this pipeline can be deployed into production with ease.
After running setup, press Enter when prompted and PyCaret will display the setup results.
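Beyond the data itself, `setup` accepts optional arguments that control preprocessing. The sketch below uses three of them (`normalize`, `ignore_features`, `session_id`); the wrapper name `setup_clustering` and its default seed are assumptions made for illustration.

```python
def setup_clustering(data, ignore=None, seed=42):
    """Initialise a PyCaret clustering experiment with a few optional settings."""
    from pycaret.clustering import setup

    return setup(
        data=data,
        normalize=True,          # scale numeric features so none dominates the distances
        ignore_features=ignore,  # columns to leave out, e.g. identifiers
        session_id=seed,         # fix the random seed for reproducible runs
    )
```

With the pokemon data, a call such as `clust = setup_clustering(data, ignore=['Name'])` would leave the name column out of the clustering, assuming the dataset has a column with that name.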
Creating Models
Creating a model in PyCaret is one of the simplest tasks.
The create_model function takes just the model ID as a string and performs the task.
create_model('model_ID')
Model IDs for clustering models:
+----------------------------------+-------------+
| Model                            | ID          |
+----------------------------------+-------------+
| K-Means Clustering               | 'kmeans'    |
| Affinity Propagation             | 'ap'        |
| Mean Shift Clustering            | 'meanshift' |
| Spectral Clustering              | 'sc'        |
| Agglomerative Clustering         | 'hclust'    |
| Density-Based Spatial Clustering | 'dbscan'    |
| OPTICS Clustering                | 'optics'    |
| Birch Clustering                 | 'birch'     |
| K-Modes Clustering               | 'kmodes'    |
+----------------------------------+-------------+
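The IDs accepted by `create_model` can also be looked up from code once `setup` has been run, using PyCaret's `models` function. A small sketch, with `list_cluster_models` as a hypothetical helper name:

```python
def list_cluster_models():
    """Return a table describing the clustering models PyCaret can create."""
    from pycaret.clustering import models

    # The index of the returned DataFrame holds the IDs that
    # create_model() accepts, such as 'kmeans' or 'hclust'.
    return models()
```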
Creating Kmeans Model:
Creating hclust model:
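Both models can be created with one `create_model` call each. The sketch below assumes `setup` has already been run; the helper name `build_models` is hypothetical, while `num_clusters` is the `create_model` argument that sets how many clusters to form.

```python
def build_models(num_clusters=4):
    """Train a K-Means and an agglomerative (hclust) model on the active experiment."""
    from pycaret.clustering import create_model

    # Each call trains the model and prints its evaluation metrics.
    kmeans = create_model('kmeans', num_clusters=num_clusters)
    hclust = create_model('hclust', num_clusters=num_clusters)
    return kmeans, hclust
```

Calling `kmeans, hclust = build_models()` returns the two trained model objects, which can then be passed to the other PyCaret functions.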
Moving on with the kmeans model.
Assign Models
The assign_model function is used to assign the cluster labels to the dataset.
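A minimal sketch of labelling the data with a trained model follows. The helper name `label_data` is hypothetical; `assign_model` returns the training data with an added `Cluster` column.

```python
def label_data(model):
    """Attach cluster labels to the training data and count rows per cluster."""
    from pycaret.clustering import assign_model

    labelled = assign_model(model)
    # Number of rows that fell into each cluster.
    sizes = labelled['Cluster'].value_counts()
    return labelled, sizes
```

For the K-Means model this could be used as `labelled, sizes = label_data(kmeans)`, after which `labelled.head()` shows the first rows with their cluster labels.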
Plot a Model
The plot_model function helps check the performance of a model with different graphs in one line of code.
model = create_model('Model_name')
plot_model(model)
By default, the cluster plot is drawn.
Other plots can be requested by passing the plot ID to plot_model:
+-----------------------+----------------+
| Plot                  | ID             |
+-----------------------+----------------+
| Cluster PCA Plot (2d) | 'cluster'      |
| Cluster t-SNE (3d)    | 'tsne'         |
| Elbow Plot            | 'elbow'        |
| Silhouette Plot       | 'silhouette'   |
| Distance Plot         | 'distance'     |
| Distribution Plot     | 'distribution' |
+-----------------------+----------------+
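The plot IDs in the table are passed through the `plot` argument of `plot_model`. As a rough sketch, several plots can be drawn in a loop; the helper name `draw_plots` and its default selection of plots are assumptions made for illustration.

```python
def draw_plots(model, plot_ids=('cluster', 'elbow', 'silhouette')):
    """Draw one plot per ID for a trained clustering model."""
    from pycaret.clustering import plot_model

    for plot_id in plot_ids:
        # Each call renders a single chart of the requested type.
        plot_model(model, plot=plot_id)
```

For example, `draw_plots(kmeans)` would draw the cluster, elbow and silhouette plots for the K-Means model, and `draw_plots(kmeans, plot_ids=['distribution'])` only the distribution plot.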
Save Models
Saving a trained model in PyCaret is as simple as writing save_model. The function takes a trained model object and saves the entire transformation pipeline and trained model object as a transferable binary pickle file for later use.
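Saving and reloading can be sketched as below. The helper name `save_and_reload` and the file name `kmeans_pokemon` are hypothetical; `save_model` and `load_model` are the PyCaret functions doing the work.

```python
def save_and_reload(model, name='kmeans_pokemon'):
    """Save the pipeline and model to disk, then load them back."""
    from pycaret.clustering import save_model, load_model

    # Writes a pickle file called name + '.pkl' in the working directory.
    save_model(model, name)
    # Loading takes the same name, without the '.pkl' extension.
    return load_model(name)
```

The reloaded object can then be used on unseen data, for instance with `predict_model(reloaded, data=new_data)` for models such as K-Means that support prediction, where `new_data` is assumed to be a DataFrame with the same columns as the training data.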
As you can see, PyCaret helps build an end-to-end clustering model with just a few lines of code.
Follow us for more upcoming articles on Data Science, Machine Learning, and Artificial Intelligence.
Also, do give us a clap 👏 if you find this article useful; your encouragement inspires us and helps us create more content like this.