# You may not know this about PCA

## In this article, we contrast two dimensionality reduction techniques: Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA)

**By: Daksh Bhatnagar**

# Introduction

Over time, the size of data has grown **substantially**. Businesses nowadays want to consider **every aspect** before making a decision, which translates to **higher dimensionality** in real-life data.

High-dimensional data is usually considered problematic in data science and has been given the name **Curse of Dimensionality**: a set of problems that arise when working with high-dimensional data.

Some of the difficulties that come with **high-dimensional data** manifest while **analyzing** or **visualizing** the data to identify patterns, and some manifest while training **machine learning models**.

More specifically, it is the difficulties related to training machine learning models on high-dimensional data that the term most often refers to.

Luckily for us, there exists the concept of **dimensionality reduction**, which reduces the number of dimensions to the point where the independent features still **add value** to the predictive model, but are not so numerous that the model **gets lost** and makes terrible predictions (or none at all).

# Principal Component Analysis

**Principal Component Analysis** is a way to reduce the number of variables while maintaining the **majority** of the important information. It transforms a number of variables that may be correlated into a smaller number of uncorrelated variables, known as **principal components**.

The main objective of PCA is to **simplify your model features** into fewer components to help visualize patterns in your data and to help your model run faster.

Using PCA also reduces the chance of overfitting your model, since features with **high correlation** are collapsed into fewer uncorrelated components.

What’s happening under the hood is that the algorithm first computes the **covariance matrix** of the data, and then calculates its eigenvalues and eigenvectors. PCA projects the data onto candidate axes and keeps the axes along which the projected data has the highest variance, because the higher the spread of the data, the more of it that axis can explain.
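In the usual notation, this boils down to the eigen-decomposition of the covariance matrix (a quick sketch, where $X$ is the centered data matrix with $n$ rows and $\Sigma$ is its covariance matrix):

$$\Sigma = \frac{1}{n-1} X^\top X, \qquad \Sigma v_i = \lambda_i v_i$$

Each eigenvector $v_i$ is a candidate axis, and its paired eigenvalue $\lambda_i$ measures the variance of the data projected onto that axis.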

*Eigenvalues* and *eigenvectors* always **come in pairs**, so every eigenvector has an eigenvalue, and their number is equal to the **number of dimensions** of the data.

There is a concept known as **Explained Variance**, which tells you how much of the variance in the data each component explains. Ideally, you want your selected components to capture **at least 90%** of the variance, meaning they account for 90% of the variation going on in your data.

Sometimes one component can explain 98% of the variance, while at other times one component explains only 50%, which is exactly where the **explained variance plot** comes in handy.

In the chart above, we can see that one component explains only 53% of the variance, while six components (out of 30 features) explain 91%.
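As a quick sketch of how to read those numbers off in code (assuming `X` is the 30-column feature matrix behind the chart), `scikit-learn` exposes the explained variance of each component directly:

```python
import numpy as np
from sklearn.decomposition import PCA

# Fit PCA with all components on the feature matrix X
pca = PCA().fit(X)

# Fraction of the total variance explained by each component, in decreasing order
print(pca.explained_variance_ratio_)

# Cumulative variance: how many components are needed to reach 90%?
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.argmax(cumulative >= 0.90)) + 1
print(f"{n_components} components explain at least 90% of the variance")
```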

When we plot the **first four** eigenvectors after the linear transformation, here is what it looks like:

In n-dimensional space, an **imaginary axis** is chosen and the data points are **projected** onto it; whichever axis carries the **most variance** is eventually selected, and the data points are then **transformed** by multiplying them with those top-n eigenvectors.

We also plotted the **first eigenvector** before and after the linear transformation. The **light blue** vector is the one before the transformation and the **dark blue** vector is the one after.

Libraries like `scikit-learn` and `numpy` have made it very easy for us to transform the data and visualize it. You can use the code below to get the *eigenvectors* and *eigenvalues*.

```python
import numpy as np
import pandas as pd

# Extract the input values, one array per feature column
column_values = []
for i in range(len(inputs_df.columns)):
    column_values.append(inputs_df.iloc[:, i].values)

# Build the covariance matrix (np.cov treats each row as one variable)
covariance_matrix = np.cov(column_values)

# Get the eigenvalues and eigenvectors; np.linalg.eig returns
# the eigenvectors as the *columns* of the second array
eigen_values, eigen_vectors = np.linalg.eig(covariance_matrix)

# Select the top 6 eigenvectors, sorted by descending eigenvalue
order = np.argsort(eigen_values)[::-1]
pc = eigen_vectors[:, order[:6]]

# Transform the data points by projecting them onto those components
transformed_df = np.dot(df.iloc[:, 0:30], pc)
new_df = pd.DataFrame(transformed_df,
                      columns=['PC1', 'PC2', 'PC3', 'PC4', 'PC5', 'PC6'])
```

The above implementation used the `numpy` library. You can also use the code below, which uses `scikit-learn` for the same purpose.

```python
from sklearn import decomposition

# Keep as many components as are needed to explain 90% of the variance
pca = decomposition.PCA(0.90)
X_transformed = pca.fit_transform(X)
```

PCA tries to put the **maximum possible information** in the first **component**, then the maximum of the remaining information in the second, and so on, which means the last component (the eigenvector with the smallest eigenvalue) will have the **least variance**.

The final plot of the first two **Principal Components** would look something like this:

# Linear Discriminant Analysis

Linear Discriminant Analysis, or LDA for short, is a predictive modeling algorithm for multi-class classification. It is also used as a **dimensionality reduction technique**, providing a projection of a training dataset that best separates the examples by their assigned class.

Linear Discriminant Analysis is used to find a linear combination of features that characterizes or **separates two or more classes** of objects or events. It explicitly attempts to model the difference between the classes of data.
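As a minimal sketch of LDA used as a dimensionality reduction step (assuming a labeled dataset `X`, `y` — these names are illustrative), `scikit-learn` provides the projection directly:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Unlike PCA, LDA needs the class labels y to find its projection;
# it can keep at most (number of classes - 1) components
lda = LinearDiscriminantAnalysis(n_components=1)
X_projected = lda.fit_transform(X, y)
```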

# Drawbacks of Linear Discriminant Analysis (LDA)

Although LDA is specifically used to solve **supervised classification** problems for two or more classes, it fails in cases where the **means of the class distributions** are shared. When that happens, LDA cannot create a **new axis** that makes the classes linearly separable.
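A small illustration of this failure mode, using hypothetical toy data: two ring-shaped classes that share the same mean at the origin. They are perfectly distinguishable by radius, yet LDA scores no better than chance because there is no mean difference to exploit.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Two classes sharing the same mean (the origin): an inner disc and an outer ring
n = 500
radii = np.concatenate([rng.uniform(0, 1, n), rng.uniform(2, 3, n)])
angles = rng.uniform(0, 2 * np.pi, 2 * n)
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
y = np.r_[np.zeros(n), np.ones(n)]

# Both class means sit at the origin, so LDA has no axis to separate them
lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.score(X, y))  # roughly 0.5, i.e. no better than chance
```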

To overcome such problems, we use **non-linear discriminant analysis** in machine learning.

Two criteria are used by LDA to create a new axis:

- Maximize the **distance** between the means of the two classes.
- Minimize the **variation** within each class.

The cost function in LDA is the formula shown below. The **numerator** should be **maximal**, since that implies the means are far apart and the classes can be separated well, while the **denominator** should be **minimal**, since the within-class variance has to be small for the model to make a good classification.
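In the two-class case, this cost function is the classic Fisher criterion, shown here in the usual notation ($\mu_1, \mu_2$ are the projected class means and $s_1^2, s_2^2$ the within-class scatters):

$$J(w) = \frac{(\mu_1 - \mu_2)^2}{s_1^2 + s_2^2}$$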

# Difference Between PCA and LDA

Both techniques focus on reducing dimensionality and use *eigenvalues* and *eigenvectors* under the hood. However, the **major difference** is that PCA doesn’t take the **class labels** into account, while the purpose of LDA is to make a **decision boundary** between classes (that is, it focuses on making the data linearly separable).
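The contrast is visible right in the fitting calls (a sketch assuming a labeled dataset `X`, `y` with at least three classes, so LDA can keep two components):

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# PCA is unsupervised: it never sees the class labels
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: its projection is chosen to separate the classes in y
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
```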

# Conclusion

- The difficulties related to training machine learning models on high-dimensional data are referred to as the **‘Curse of Dimensionality’**.
- **Principal Component Analysis** is a way to reduce the number of variables while maintaining the **majority** of the important information.
- Linear Discriminant Analysis is used to find a linear combination of features that characterizes or **separates two or more classes** of objects or events (making them linearly separable).
- PCA doesn’t take the **class labels** into account, and LDA fails to create a **new axis** when the means of the class distributions coincide.
- **Up next**, I’ll be covering **more Machine Learning Algorithms** and how they compare and contrast with each other.
- If you liked the **tips** and they proved to be **helpful** to you, I’d appreciate it if you could give the article **a clap and follow** me for more upcoming **Data Science, Machine Learning, and Artificial Intelligence** articles.

# Final Thoughts and Closing Comments

There are **some vital points** many **people fail to understand** while pursuing their **Data Science** or **AI journey**. If you are one of them and are looking for a way to **counterbalance** these gaps, check out the certification programs provided by **INSAID** on their website. If you liked this article, I recommend going with the Global Certificate in Data Science & AI, as it covers your foundations, machine learning algorithms, and deep neural networks (basic to advanced).