UrbanPro

Principal component analysis: A dimension reduction technique

Ashish R.
08/12/2016

In simple words, principal component analysis (PCA) is a method of extracting important variables (in the form of components) from a large set of variables. It extracts a low-dimensional set of features from a high-dimensional data set with the aim of capturing as much information as possible. With fewer variables, visualization also becomes much more meaningful. This is why PCA is called a dimension reduction technique. PCA is most useful when the data are high-dimensional and the variables are significantly correlated with one another.

Principal components analysis is one of the simplest of the multivariate methods. The objective of the analysis is to take p variables (x1, x2, x3, ..., xp) and find linear combinations of these to produce transformed variables (z1, z2, z3, ..., zp) that are uncorrelated, are ordered by importance, and together describe the overall variation in the data set.

The lack of correlation means that the indices are measuring different “dimensions” of the data, and the ordering is such that var(z1) ≥ var(z2) ≥ var(z3) ≥ ... ≥ var(zp), where var(zi) denotes the variance of zi. The Z indices are then the principal components. When doing principal components analysis, there is always the hope that the variances of most of the indices will be so low as to be negligible. In that case, most of the variation in the full data set can be adequately described by the few Z variables whose variances are not negligible, and some degree of economy is achieved. For this reason PCA is also called a dimension reduction technique. Often each Z variable that explains a significant share of the variance has dominant loadings on a few of the original X variables, so it describes a specific quantitative or qualitative aspect of the X attributes. Hence such newly formed Z variables are often interpreted as latent factors.
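The description above can be written compactly. This is only a sketch in standard notation that the text itself does not use: a_i (the weight vector of component i), λ_i (its variance) and Σ (the covariance matrix of the X variables) are symbols introduced here for illustration.

```latex
% Each principal component is a unit-length linear combination of the X variables
z_i = a_{i1} x_1 + a_{i2} x_2 + \dots + a_{ip} x_p = a_i^{\top} x,
\qquad a_i^{\top} a_i = 1, \qquad i = 1, \dots, p

% The weights are eigenvectors of the covariance matrix, and the eigenvalues are the variances
\Sigma a_i = \lambda_i a_i, \qquad
\operatorname{var}(z_i) = \lambda_i, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0

% Different components are uncorrelated
\operatorname{cov}(z_i, z_j) = 0 \quad \text{for } i \ne j

% Total variance is preserved, so the share explained by the first k components is
\sum_{i=1}^{p} \operatorname{var}(z_i) = \sum_{j=1}^{p} \operatorname{var}(x_j),
\qquad
\text{share}_k = \frac{\lambda_1 + \dots + \lambda_k}{\lambda_1 + \dots + \lambda_p}
```

When the last few λ values are negligible, share_k is close to 1 for a small k, which is the "economy" referred to above.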

Principal components analysis does not always work, in the sense that a large number of original variables are reduced to a small number of transformed variables. Indeed, if the original variables are uncorrelated, then the analysis achieves nothing. The best results are obtained when the original variables are very highly correlated, positively or negatively. If that is the case, then it is quite conceivable that, for example, 20 or more original variables can be adequately represented by two or three principal components. If this desirable state of affairs does occur, then the important principal components will be of some interest as measures of the underlying dimensions in the data. It will also be of value to know that there is a good deal of redundancy in the original variables, with most of them measuring similar things.
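The contrast between uncorrelated and highly correlated variables can be checked on simulated data. The following is a minimal sketch using NumPy; the data sets, the helper name `explained_variance_ratio`, and the noise level are all illustrative assumptions, not part of the original lesson.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance carried by each principal component, largest first."""
    # Eigenvalues of the correlation matrix are the variances of the
    # principal components of the standardised variables.
    corr = np.corrcoef(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return eigenvalues / eigenvalues.sum()

rng = np.random.default_rng(0)
n_obs, n_vars = 500, 20

# Case 1: 20 independent variables, so there is no redundancy to exploit.
X_uncorrelated = rng.normal(size=(n_obs, n_vars))

# Case 2: 20 variables that are noisy mixtures of two hidden factors (illustrative data).
factors = rng.normal(size=(n_obs, 2))
loadings = rng.normal(size=(2, n_vars))
X_correlated = factors @ loadings + 0.1 * rng.normal(size=(n_obs, n_vars))

ratio_uncorrelated = explained_variance_ratio(X_uncorrelated)
ratio_correlated = explained_variance_ratio(X_correlated)

print("uncorrelated, first 2 PCs:", ratio_uncorrelated[:2].sum())
print("correlated,   first 2 PCs:", ratio_correlated[:2].sum())
```

In the first case the first two components should carry only slightly more than their "fair share" of 2/20 of the variance, while in the second case they should carry nearly all of it.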

Where is it used?

A multi-dimensional hyper-space is often difficult to visualize. The main objectives of unsupervised learning methods are to reduce dimensionality, to score all observations on a composite index, and to cluster similar observations together based on their multivariate attributes. Summarizing multivariate attributes by two or three variables that can be displayed graphically with minimal loss of information is useful in knowledge discovery. Because it is hard to visualize a multi-dimensional space, PCA is mainly used to reduce the d multivariate attributes to two or three dimensions.
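As a rough illustration of reducing d attributes to two plottable dimensions, the sketch below uses the singular value decomposition of the centred data, which is one common way to compute principal components. The function name `project_to_2d` and the simulated data are assumptions made for the example.

```python
import numpy as np

def project_to_2d(X):
    """Return 2-D principal component scores and the fraction of variance retained."""
    X_centred = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are the principal directions, and the squared
    # singular values are proportional to the component variances.
    U, s, Vt = np.linalg.svd(X_centred, full_matrices=False)
    scores = X_centred @ Vt[:2].T
    retained = (s[:2] ** 2).sum() / (s ** 2).sum()
    return scores, retained

rng = np.random.default_rng(1)
# Illustrative data: 200 observations of d = 6 attributes driven by 2 hidden factors.
hidden = rng.normal(size=(200, 2))
X = hidden @ rng.normal(size=(2, 6)) + 0.2 * rng.normal(size=(200, 6))

scores, retained = project_to_2d(X)
print(scores.shape)
print(f"variance retained by 2 components: {retained:.2f}")
```

The two columns of `scores` can be passed directly to a scatter plot; `retained` indicates how much information the picture keeps.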

PCA summarizes the variation in correlated multivariate attributes as a set of uncorrelated components, each of which is a particular linear combination of the original variables. The extracted uncorrelated components are called Principal Components (PC) and are estimated from the eigenvectors of the covariance matrix of the original variables. Therefore, the objective of PCA is to achieve parsimony and reduce dimensionality by extracting the smallest number of components that account for most of the variation in the original multivariate data, summarizing the data with little loss of information.
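The eigenvector route described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function name `pca_eig` and the three simulated variables are assumptions.

```python
import numpy as np

def pca_eig(X):
    """PCA via eigen-decomposition of the sample covariance matrix."""
    X_centred = X - X.mean(axis=0)
    cov = np.cov(X_centred, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    Z = X_centred @ eigenvectors  # column i holds the scores of component i
    return eigenvalues, eigenvectors, Z

rng = np.random.default_rng(42)
# Illustrative data: three correlated variables.
x1 = rng.normal(size=300)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=300)
x3 = -0.5 * x1 + 0.5 * rng.normal(size=300)
X = np.column_stack([x1, x2, x3])

eigenvalues, eigenvectors, Z = pca_eig(X)

print("component variances:", eigenvalues)
print("covariance of Z (off-diagonals should be ~0):")
print(np.round(np.cov(Z, rowvar=False), 6))
```

The covariance matrix of `Z` should come out diagonal, with the eigenvalues on the diagonal in decreasing order, which is the "uncorrelated and ordered by variance" property in numerical form.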

A few use cases where PCA is used:
Survey data: Any kind of market survey data collected on a Likert scale (0-5, 0-10, etc.) can be used to derive principal components that describe a specific sentiment of the customers or participants in the survey. A common rule of thumb, when PCA is run on the correlation matrix, is to treat the principal components with eigenvalue > 1 as the important ones.

Market mix model: In developing a market mix model, usually 52-104 weeks of sales and marketing spend data, along with many brand image variables measured on a monthly or quarterly basis, are used to derive the contribution of marketing spend to revenue. The mix model feeds into the overall ROI calculation. Realized sales, revenue, or pipeline sales are modeled with the help of many spend-related attributes and their various derived adstock values. In such a scenario PCA is used to reduce the overall dimension of the data.

Brand image: PCA is often used to combine many brand variables into a single brand value index.

NPS calculation: In the calculation of NPS (Net Promoter Score) from customer survey data, PCA is often used to capture the overall effect of all the variables considered.

CSAT score calculation: PCA is used similarly in calculating CSAT (customer satisfaction) scores.
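Several of the use cases above come down to two steps: deciding how many components to keep with the eigenvalue > 1 rule, and using the first component's scores as a composite index. The sketch below shows both on a simulated survey. Everything about the data is hypothetical (8 items on a 0-10 scale driven by a single sentiment), and the first-component index is one possible way to build such a score, not necessarily the method any particular practitioner uses.

```python
import numpy as np

rng = np.random.default_rng(7)
n_respondents, n_items = 400, 8

# Hypothetical survey: one underlying sentiment drives all 8 items, answered on a 0-10 scale.
sentiment = rng.normal(size=n_respondents)
raw = 5 + 2 * sentiment[:, None] + rng.normal(size=(n_respondents, n_items))
responses = np.clip(np.rint(raw), 0, 10)

# Standardise each item so that PCA operates on the correlation matrix.
standardised = (responses - responses.mean(axis=0)) / responses.std(axis=0, ddof=1)
corr = np.corrcoef(responses, rowvar=False)

eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Rule of thumb: keep components whose eigenvalue exceeds 1.
n_keep = int((eigenvalues > 1).sum())

# Composite index: each respondent's score on the first principal component.
weights = eigenvectors[:, 0]
if weights.sum() < 0:  # the sign of an eigenvector is arbitrary; orient it positively
    weights = -weights
index = standardised @ weights

print("eigenvalues:", np.round(eigenvalues, 2))
print("components kept:", n_keep)
print("correlation of index with simulated sentiment:", np.corrcoef(index, sentiment)[0, 1])
```

Because the simulated items all measure the same sentiment, only the first eigenvalue should exceed 1, and the index should track the hidden sentiment closely. With real survey data several components may pass the threshold, and each would need to be interpreted from its loadings.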

 
