Data Science 101:- Data Reduction Techniques Using Python

3 min read · Oct 31, 2021

Data reduction using variance threshold, univariate feature selection, recursive feature elimination, and PCA.

Data Reduction:

Data mining deals with very large amounts of data, and as the volume grows, analysis becomes harder. Data reduction techniques address this problem: they shrink the data to a smaller representation, which improves storage efficiency and lowers data storage and analysis costs.

Dimensionality Reduction:

Dimensionality reduction shrinks the size of the data using encoding mechanisms. The reduction can be lossy or lossless: if the original data can be fully recovered from the compressed data, the reduction is lossless; otherwise it is lossy. Two effective methods of dimensionality reduction are wavelet transforms and PCA (Principal Component Analysis).
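
The difference between lossless and lossy reduction can be illustrated with a small sketch. It is not taken from the original code: the random array, the use of `zlib` as the lossless encoder, and the cast to `float16` as the lossy step are all assumptions chosen only to make the definition concrete.

```python
import zlib

import numpy as np

# Hypothetical data: 1000 random float64 values (not the article's dataset).
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# Lossless: compress the raw bytes, then decompress and rebuild the array.
compressed = zlib.compress(data.tobytes())
restored = np.frombuffer(zlib.decompress(compressed), dtype=data.dtype)
lossless_ok = np.array_equal(data, restored)  # original is fully recovered

# Lossy: store the values with lower precision (float16 instead of float64).
lossy = data.astype(np.float16)
restored_lossy = lossy.astype(np.float64)
max_err = np.max(np.abs(data - restored_lossy))  # some precision is gone

print("lossless round trip exact:", lossless_ok)
print("lossy storage size (bytes):", lossy.nbytes, "vs original:", data.nbytes)
print("largest reconstruction error after lossy step:", max_err)
```

The lossless round trip returns exactly the original values, while the lossy version needs a quarter of the storage but can no longer reproduce them exactly.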

Principal Component Analysis:

Principal component analysis (PCA) is a technique for reducing the dimensionality of large datasets, increasing interpretability while minimizing information loss. For many machine learning applications it helps to be able to visualize the data. Visualizing 2- or 3-dimensional data is not that challenging, but data with 4 or more dimensions cannot be plotted directly. You can use PCA to reduce, for example, 4-dimensional data to 2 or 3 dimensions so that you can plot it and hopefully understand it better.
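
A minimal sketch of this idea with scikit-learn is shown here. The Iris dataset (which has 4 features) and the standardization step are assumptions made for illustration; the original text does not say which dataset or preprocessing was used.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumed example data: Iris has 150 samples and 4 features.
X, y = load_iris(return_X_y=True)

# PCA is sensitive to feature scale, so standardize first (an assumption,
# not something the article specifies).
X_scaled = StandardScaler().fit_transform(X)

# Project the 4-dimensional data onto its first 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("reduced shape:", X_2d.shape)
print("variance explained per component:", pca.explained_variance_ratio_)
```

The two columns of `X_2d` can then be passed to any plotting library as x and y coordinates, and `explained_variance_ratio_` indicates how much of the original variance the 2-D view retains.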

Variance Threshold:

Variance Threshold is a simple baseline approach to feature selection. It removes all features whose variance doesn't meet some threshold. By default, it removes all zero-variance features, that is, features that have the same value in every sample. Our dataset has no zero-variance features, so it is not affected here.
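
As a rough sketch of the default behavior, the example below applies scikit-learn's `VarianceThreshold` to a small made-up array. The array is hypothetical and was chosen to contain one constant column, unlike the article's dataset, so that the effect is visible.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: the first column is constant (zero variance).
X = np.array([
    [0.0, 2.0, 1.0],
    [0.0, 1.0, 4.0],
    [0.0, 3.0, 1.0],
    [0.0, 2.0, 5.0],
])

# The default threshold is 0.0, so only zero-variance features are removed.
selector = VarianceThreshold()
X_reduced = selector.fit_transform(X)
kept = selector.get_support()  # boolean mask of the features that were kept

print("kept features:", kept)
print("shape before:", X.shape, "after:", X_reduced.shape)
```

Passing a larger value, such as `VarianceThreshold(threshold=0.5)`, would also drop features whose variance is at or below that value.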

Code:

GitHub - d2001patel/Data-Science: DATA SCIENCE PRACTICALS

github.com

LinkedIn

Darshil Patel - Software Engineer Intern - Scanpoint Geomatics Ltd | LinkedIn

Written by Darshil Patel