true

Learn Advanced Statistics from the Best Tutors

• Affordable fees
• 1-1 or Group class
• Flexible Timings
• Verified Tutors

Search in

# Principal component analysis- A dimension reduction technique

Ashish R.
08/12/2016 0 0

In simple words, principal component analysis(PCA) is a method of extracting important variables (in form of components) from a large set of variables . It extracts low dimensional set of features from a high dimensional data set with a motive to capture as much information as possible. With fewer variables, visualization also becomes much more meaningful. This is why PCA is called dimension reduction technique. PCA is more useful when dealing with higher dimensional data and the variables have significant correlation among them.

Principal components analysis is one of the simplest of the multivariate methods. The objective of the analysis is to take p variables (x1,x2,x3.....xp) and find linear combination of these to produce transformed variabels (z1,z2,z3...zp) so that they are uncorelated in order of their importance and that describe the overall variation in the data set.

The lack of correlation means that the indices are measuring different “dimensions” of the data, and the ordering is such that var(z1)≥var(z2)≥var(z3)....var(zp), where var denotes the variance of . The Z indices are then the principal components. When doing principal components analysis, there is always the hope that the variances of most of the indices  will be as low as to be negligible. In that case, most of the variation in the full data set can be adequately described by the few Z variables with variances that are not negligible, and some degree of economy is then achieved. For this reason this is also called dimension reduction technique. Often the significant variances explained by the Z variables  have a dominant load factor associated with the original X variables and Z describe a specific degree of quantitative or qualitative nature of the X attributes. Hence such newly formed Z variables are called latent factor analysis.

Principal components analysis does not always work, in the sense that a large number of original variables are reduced to a small number of transformed variables. Indeed, if the original variables are uncorrelated, then the analysis achieves nothing. The best results are obtained when the original variables are very highly correlated, positively or negatively. If that is the case, then it is quite conceivable that for example 20 or more original variables can be adequately represented by two or three principal components. If this desirable state of affairs does occur, then the important principal components will be of some interest as measures of the underlying dimensions in the data. It will also be of value to know that there is a good deal of redundancy in the original variables, with most of them measuring similar things.

Where it is used?

A multi-dimensional hyper-space is often difficult to visualize. The main objectives of unsupervised learning methods are to reduce dimensionality, scoring all observations based on a composite index and clustering similar observations together based on multivariate attributes. Summarizing multivariate attributes by two or three variables that can be displayed graphically with minimal loss of information is useful in knowledge discovery. Because it is hard to visualize a multi-dimensional space, PCA is mainly used to reduce the dimensionality of d multivariate attributes into two or three dimensions.

PCA summarizes the variation in correlated multivariate attributes to a set of non-correlated components, each of which is a particular linear combination of the original variables. The extracted non-correlated components are called Principal Components (PC) and are estimated from the eigenvectors of the covariance matrix of the original variables. Therefore, the objective of PCA is to achieve parsimony and reduce dimensionality by extracting the smallest number components that account for most of the variation in the original multivariate data and to summarize the data with little loss of information.

A few use cases where PCA is used:
Survey data: Any kind of market survey data which is collected in a Likert scale (0-5/0-10 etc.) can be used to derived principal components that can describe a specific sentiment of the customers/participants in the survey. The principal components with Eigen value >1 are the important ones to be considered.

Market mix model: In developing market mix model usually 52-104 weeks of sales and marketing  spend data along with many brand image variables that are measured in monthly/quarterly basis are used to derive the contribution of the marketing spends in generating revenue. In the overall ROI calculation a mix model is developed.  Realized sales/Revenue/Pipeline sales are modeled with the help of many spend related attributes and its various derived adstock values . In such scenario PCA is used to reduce the overall dimension of the data.

Brand image: To create brand image from many brand variables often PCA is used to calculate brand value index

NPA score calculation: In the calculation of NPA (Net promoter score) from customer survey data often PCA is used by considering the overall effect of all the considered variables

CSAT score calculation:  Similarly in CSAT score calculation PCA is used.

0 Dislike

## Other Lessons for You

Pivot Tables In Excel
i. Pivot Tables: A pivot table is essentially a dynamic summary report generated from a database. The database can reside in a worksheet (in the form of a table) or in an external data file. A pivot table...

Important points to keep in mind while trade
? Do not over trade – Do not put all your money in share market. ? Do Not put all your money in single share or single sector – Put or divide your money in multiple shares or sectors. This...

Variations Of Random Forest In R
One of the important steps in using analytics to generate insights is model fitting. Typical projects involve a lot of data cleaning so that high accuracy is achieved on application of the model. Competitions...

Microsoft Excel
Software developed and manufactured by Microsoft Corporation that allows users to organize, format, and calculate data with formulas using a spreadsheet system broken up by rows and columns. Microsoft...

Tableau Hands on
This video explains about Parameters, Grouping, Top N parameter, Playing with Filters, Donut charts, Bullet Graphs, Gantt Charts and Box and Whisker charts in Tableau. This video will be very beneficial...

### Looking for Advanced Statistics Training?

Learn from Best Tutors on UrbanPro.

Are you a Tutor or Training Institute?

Join UrbanPro Today to find students near you

X

### Looking for Advanced Statistics Classes?

The best tutors for Advanced Statistics Classes are on UrbanPro

• Select the best Tutor
• Book & Attend a Free Demo
• Pay and start Learning

### Learn Advanced Statistics with the Best Tutors

The best Tutors for Advanced Statistics Classes are on UrbanPro