Lesson Posted on 19/12/2017 IT Courses/Data Science IT Courses/Machine Learning IT Courses/Advanced Statistics

Tuning Parameters Of Decision Tree Models

Ashish R.

SAS certified analytics professional with more than 11 years of industrial and 12 years of teaching experience.

Implementations of the decision tree algorithm usually provide a collection of parameters for tuning how the tree is built. The defaults in Rattle often produce a reasonably good tree. They are certainly a very good starting point, and may indeed be a satisfactory end point. However, tuning will be necessary where, for example, the target variable has very few observations of the particular class of interest. (Why?)

The following tuning parameters are useful to know and use when developing many tree-based classifiers. For details on all of the tuning parameters, type “?rpart.control” in the RStudio console.

- minsplit: the minimum number of observations that must exist in a node for a split to be attempted.
- minbucket: the minimum number of observations allowed in any terminal (leaf) node.

The two options “minbucket” and “minsplit” are closely related. If only one of them is specified, rpart sets the other by default using minsplit = minbucket * 3 or minbucket = minsplit / 3, as appropriate. A node will always contain at least “minbucket” observations; it is considered for splitting only if it contains at least “minsplit” observations and each of its children would contain at least “minbucket” observations after the split. Simply put, if we specify only “minbucket”, rpart automatically takes three times that value as the “minsplit” parameter.
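The rule above can be sketched as a small helper function, written here in Python purely for illustration (the function name is hypothetical; rpart applies this rule internally in R):

```python
# The rpart default that links "minsplit" and "minbucket", sketched in Python.
def fill_defaults(minsplit=None, minbucket=None):
    """If only one of the two options is given, derive the other as rpart does."""
    if minsplit is None and minbucket is None:
        minsplit, minbucket = 20, 7        # rpart's documented defaults
    elif minbucket is None:
        minbucket = minsplit // 3          # minbucket = minsplit / 3
    elif minsplit is None:
        minsplit = minbucket * 3           # minsplit = minbucket * 3
    return minsplit, minbucket
```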

- maxdepth: sets the maximum depth of any node of the final tree, with the root node counted as depth 0. Values greater than 30 will give nonsense results from rpart on 32-bit machines.
- priors: sometimes the class proportions in a training set do not reflect their true proportions in the population. We can supply the population proportions to Rattle, and the resulting model will reflect them.

The so-called priors can also be used to “boost” a particularly important class by giving it a higher prior probability, although this might be better done through the loss matrix.

In Rattle the priors are expressed as a list of numbers that sum to 1. The list must have the same length as the number of unique classes in the target variable. An example for binary classification is 0.5, 0.5.

The default priors are set to be the class proportions as found in the training dataset.

Using rpart directly, we specify the prior within an option called parms, for example: rpart(target ~ ., data = train, method = "class", parms = list(prior = c(0.5, 0.5))).

Complexity parameter (cp): this parameter controls how splits are carried out (i.e., the number of branches in the tree). The value should be below 1; the smaller the value, the more branches in the final tree. In Rattle, a value of "Auto", or omitting the value, selects the "best" complexity parameter by cross-validation. The default cp value is 0.01.

The complexity parameter (cp) is used to control the size of the decision tree and to select an optimal tree size. If adding another split at the current node would not decrease the overall lack of fit by a factor of cp, tree building does not continue from that node. Setting cp to zero builds the tree to its maximum depth (and may produce a very, very large tree). This is useful if we want to inspect the cp values for various tree sizes, which rpart prints to the console. We then look for the number of splits where the sum of xerror (the cross-validation error, relative to the root node error) and xstd (the standard error of xerror) is minimal. This is usually early in the list.
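The selection rule just described can be illustrated with a short Python sketch over a made-up cptable (all numbers below are invented for illustration, not taken from a real model):

```python
# Choosing a tree size from an rpart-style cptable, as the lesson suggests.
# Each row: (CP, nsplit, rel_error, xerror, xstd); all values are made up.
cptable = [
    (0.50, 0, 1.00, 1.02, 0.08),
    (0.10, 1, 0.50, 0.58, 0.06),
    (0.02, 3, 0.30, 0.41, 0.05),
    (0.01, 7, 0.26, 0.44, 0.05),   # deeper tree, but xerror rises again
]

# Pick the row that minimises xerror + xstd.
best = min(cptable, key=lambda row: row[3] + row[4])
best_cp, best_nsplit = best[0], best[1]   # here: cp = 0.02 at 3 splits
```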

The option to use the cp parameter in R is as follows:

# control = rpart.control(minsplit = 50)

# example: fit <- rpart(target ~ ., data = train, control = rpart.control(cp = 0.005, minsplit = 50))


Lesson Posted on 10/08/2017 IT Courses/Advanced Statistics

R. K. Shukla

Ripunjai is a highly enthusiastic analytics professional with more than nine years of experience in diversified...

In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or …


Answered on 06/06/2017 IT Courses/Advanced Statistics

Daniel L

Tutor

ISI



Lesson Posted on 06/01/2017 IT Courses/Data Science IT Courses/Advanced Statistics Financial Planning/Business Analytics Training

Beware Of Trainers Of Data Science.

Data Labs Training and Consulting Services

We provide online /Classroom training. Our team qualified from National Institute of Technology(NIT)...

Most of the trainers in the market teach DATA SCIENCE as:

1) Some software tools like R/Python/SAS/Hadoop etc.

2) They spend little time on mathematics and statistics (mostly 10 hours on mathematics/statistics; most trainers teach a few algorithms and call that DATA SCIENCE without explaining the underlying mathematics).

If you know only the above two things, you will never become a data scientist. You may get a job, as there is a lot of demand in the data science job market, but once you get into the company, you will not be able to do the job.

How do you evaluate a trainer and their syllabus?

1) Ask the trainer how much mathematics/statistics they are going to teach.

If you get the answer *80%-90% mathematics and statistics* using the *pen and paper method*, then you can *choose that trainer*. Once you know the mathematics and statistics, learning any software will not take more than a week. So do not ask the trainer which software tools they will teach; ask how much mathematics and statistics they will teach.

If any trainer says *mathematics/statistics are not required*, only learn some software, then you can conclude that that particular *training is not good*.

2) Ask how many hours of probability, matrices, calculus and coordinate geometry they will teach.

If they say around *30-40 hours, apart from inferential statistics/predictive analytics/machine learning*, then you can *join that particular trainer*.

Once *you are comfortable with probability, matrices, calculus and coordinate geometry*, then *learning machine learning/predictive analytics* etc. is a *cakewalk*.

If the above criteria are met, then learn any tool like R/Python etc.


Lesson Posted on 08/12/2016 IT Courses/Advanced Statistics IT Courses/Data Science Financial Planning/Business Analytics Training

Principal component analysis- A dimension reduction technique

Ashish R.


In simple words, principal component analysis (PCA) is a method of extracting important variables (in the form of components) from a large set of variables. It extracts a low-dimensional set of features from a high-dimensional data set, with the aim of capturing as much information as possible. With fewer variables, visualization also becomes much more meaningful. This is why PCA is called a **dimension reduction technique.** PCA is most useful when dealing with high-dimensional data in which the variables are significantly correlated.

Principal components analysis is one of the simplest of the multivariate methods. The objective of the analysis is to take p variables (x1, x2, x3, ..., xp) and find linear combinations of these to produce transformed variables (z1, z2, z3, ..., zp) that are uncorrelated, ordered by their importance, and that together describe the overall variation in the data set.

The lack of correlation means that the indices measure different “dimensions” of the data, and the ordering is such that var(z1) ≥ var(z2) ≥ var(z3) ≥ ... ≥ var(zp), where var denotes variance. The z indices are then the principal components. When doing principal components analysis, there is always the hope that the variances of most of the indices will be so low as to be negligible. In that case, most of the variation in the full data set can be adequately described by the few z variables whose variances are not negligible, and some degree of economy is achieved. For this reason PCA is also called a dimension reduction technique. Often each significant z variable has a dominant loading on a subset of the original x variables, so that z describes a specific quantitative or qualitative aspect of those attributes. Such newly formed z variables are then interpreted as **latent factors**.

Principal components analysis does not always work, in the sense that a large number of original variables are reduced to a small number of transformed variables. Indeed, if the original variables are uncorrelated, then the analysis achieves nothing. The best results are obtained when the original variables are very highly correlated, positively or negatively. If that is the case, then it is quite conceivable that for example 20 or more original variables can be adequately represented by two or three principal components. If this desirable state of affairs does occur, then the important principal components will be of some interest as measures of the underlying dimensions in the data. It will also be of value to know that there is a good deal of redundancy in the original variables, with most of them measuring similar things.
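As a concrete illustration of var(z1) ≥ var(z2), here is a minimal pure-Python sketch of PCA on two highly correlated variables (the data are invented for illustration); the eigenvalues of the 2×2 covariance matrix are the variances of the two principal components:

```python
# PCA on two correlated variables, using only the standard library.
# The eigenvalues of the 2x2 covariance matrix are the variances of the
# principal components z1 and z2; note var(z1) >= var(z2).
import math

x1 = [2.0, 4.0, 6.0, 8.0, 10.0]
x2 = [1.9, 4.2, 5.8, 8.1, 9.8]   # highly correlated with x1

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
# sample covariance matrix entries
s11 = sum((a - m1) ** 2 for a in x1) / (n - 1)
s22 = sum((b - m2) ** 2 for b in x2) / (n - 1)
s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)

# eigenvalues of [[s11, s12], [s12, s22]] via the quadratic formula
tr, det = s11 + s22, s11 * s22 - s12 ** 2
disc = math.sqrt(tr ** 2 - 4 * det)
var_z1, var_z2 = (tr + disc) / 2, (tr - disc) / 2

share = var_z1 / (var_z1 + var_z2)   # proportion of variance explained by z1
```

Because the two variables are strongly correlated, the first component captures nearly all of the variation, so the second can be dropped with little loss of information.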

**Where is it used?**

A multi-dimensional hyper-space is often difficult to visualize. The main objectives of unsupervised learning methods are to reduce dimensionality, scoring all observations based on a composite index and clustering similar observations together based on multivariate attributes. Summarizing multivariate attributes by two or three variables that can be displayed graphically with minimal loss of information is useful in knowledge discovery. Because it is hard to visualize a multi-dimensional space, PCA is mainly used to reduce the dimensionality of d multivariate attributes into two or three dimensions.

PCA summarizes the variation in correlated multivariate attributes to a set of non-correlated components, each of which is a particular linear combination of the original variables. The extracted non-correlated components are called Principal Components (PC) and are estimated from the eigenvectors of the covariance matrix of the original variables. Therefore, the objective of PCA is to achieve parsimony and reduce dimensionality by extracting the smallest number components that account for most of the variation in the original multivariate data and to summarize the data with little loss of information.

**A few use cases where PCA is used:**

Survey data: any kind of market survey data collected on a Likert scale (0-5/0-10 etc.) can be used to derive principal components that describe a specific sentiment of the customers/participants in the survey. The principal components with eigenvalue > 1 are the important ones to consider.

Market mix model: in developing a market mix model, usually 52-104 weeks of sales and marketing spend data, along with many brand image variables measured on a monthly/quarterly basis, are used to derive the contribution of marketing spend to revenue. A mix model is developed as part of the overall ROI calculation: realized sales/revenue/pipeline sales are modeled with the help of many spend-related attributes and their various derived adstock values. In such a scenario PCA is used to reduce the overall dimension of the data.

Brand image: to create a brand image from many brand variables, PCA is often used to calculate a brand value index.

NPS score calculation: in the calculation of NPS (Net Promoter Score) from customer survey data, PCA is often used to account for the overall effect of all the considered variables.

CSAT score calculation: similarly, PCA is used in CSAT score calculation.


Lesson Posted on 08/12/2016 IT Courses/Advanced Statistics IT Courses/Data Science Functional Training/Business Analysis Training

What is a Logistic Regression Model?

Ashish R.


Logistic regression is a form of regression used when the dependent variable is a dichotomy (yes or no) and the independent variables are of any type (continuous or categorical).

Logistic regression can be used to predict a dependent variable on the basis of continuous and/or categorical independent variables, and to determine the percentage of variance in the dependent variable explained by the independent variables. The impact of predictor (independent) variables is usually explained in terms of odds ratios. This is one of the most preferred linear classifier models, used to solve many problems in our client services industry in India.
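The odds-ratio interpretation can be made concrete with a minimal Python sketch; the coefficients below are made up for illustration, not fitted from any data:

```python
# The logistic model's odds-ratio interpretation, in pure Python.
import math

def logistic(z):
    """Inverse logit: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)

b0, b1 = -2.0, 0.8            # hypothetical intercept and slope
p_at_1 = logistic(b0 + b1 * 1)   # predicted probability at x = 1
p_at_2 = logistic(b0 + b1 * 2)   # predicted probability at x = 2

# A one-unit increase in x multiplies the odds by exp(b1):
odds_ratio = odds(p_at_2) / odds(p_at_1)
```

Since odds(logistic(z)) = e^z, the ratio reduces to e^(b1) regardless of the starting x, which is why coefficients are reported as odds ratios.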

**One important point: the choice of model for a practical use case depends only on the type of the dependent variable and, often, its underlying probability distribution. It does not depend on the types of the independent/predictor variables.**

**Where it is used:**

The logistic model is used in various situations. Among them, the following are very common:

- Customer attrition/churn model: to predict which customers are likely to attrite from a bank/financial institution/telecom service
- Next purchase propensity model: to predict whether a customer is likely to purchase if we target them with a promotion/campaign
- Cross-sell model: whether a customer is likely to buy a new product or service from across all possible products/services
- Upsell model: whether a customer is likely to buy more in the next quarter than their existing purchase pattern suggests, if we target them with the right promotional offer
- Customer conversion model: whether a prepaid customer of a telecom giant will convert to a postpaid plan if we target them suitably
- Insurance model: whether a customer is likely to be hospitalized in the coming quarters

This is one of the predictive models used across many industries: aviation, banking and finance, retail, pharma, CPG, telecom, shipping lines, online retail, e-commerce, FMCG etc.

**Aim of building a logit model:**

The basic aim of building such a model is to estimate the probability that a customer/patient responds to a defined event; the event is the outcome we are trying to predict. Based on the predicted probability, the right business steps can be taken to optimize the margin of the business. Studying the drivers responsible for an event is another, parallel objective of building such models. By building a logit model:

- The right targeting list can be generated to maximize the response rate
- The right set of customers can be targeted to cross-sell or upsell a product/service
- The right business decisions can be taken in advance to extract value from customers while they are active in the system



Lesson Posted on 18/11/2016 IT Courses/Data Science IT Courses/Advanced Statistics IT Courses/Data Analysis

Basics of K-means clustering: an unsupervised learning algorithm

Ashish R.

K-means is one of the simplest unsupervised learning algorithms for the well-known clustering problem. The procedure follows a simple and easy way to group a given data set of n objects into a certain number K of clusters; K stands for the number of clusters to form in your training sample. The idea behind the clustering is that elements that are very similar to each other, with respect to the considered parameters/attributes, should go to the same cluster. As a result, we expect the variability within a cluster to be as low as possible and the variability across clusters to be as high as possible.

Remember that when we say the elements within a cluster should be very similar, we mean similarity with respect to the variables used in running the algorithm. The elements within a cluster/segment might differ on other parameters that were not considered. For example, if we cluster 1,000 stores of a retail chain into multiple groups based on parameters like sales volume, size, number of SKUs available and number of labourers deployed, then within each cluster the stores might still vary with respect to parameters like the store manager's experience or the modes of payment accepted inside the store.

Now let us see where it is used, i.e., what kinds of problems are solved with this technique:

Telecom domain: segment the customers based on network usage data across various services (YouTube, Google, social media, Netflix, VPN for work etc.). The idea is to cluster the customers so that the right segment can be targeted with the right strategy for product/service upsell and cross-sell.

Retail banking: segment credit card applicants based on age, income, occupation, gender and other demographic attributes to determine the credit limit.

Insurance domain: segment the customer base based on age, lifestyle, income and demographic features to determine the insurance premium.
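The two alternating steps of the algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its members) can be sketched in pure Python on toy data; the function and data below are illustrative only, and real work would use a library such as R's kmeans or scikit-learn:

```python
# Minimal k-means sketch in pure Python (toy 1-D data, K = 2).
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # assignment step: index of the nearest centroid for each point
        labels = [min(range(len(centroids)),
                      key=lambda j: abs(p - centroids[j])) for p in points]
        # update step: each centroid moves to the mean of its members
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels, centroids

# two obvious groups: small values and large values
data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
labels, centers = kmeans(data, centroids=[0.0, 5.0])
```

On this data the algorithm converges immediately: the three small values form one cluster and the three large values the other, minimising within-cluster variability exactly as described above.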


Lesson Posted on 29/10/2016 IT Courses/Advanced Statistics IT Courses/Big Data Functional Training/Business Analysis Training

Approach for Mastering Data Science

Data Labs Training and Consulting Services


A few tips to master Data Science:

1) Do not start your learning with software like R/Python/SAS etc.

2) Start with the very basics, like 10th-class matrices and coordinate geometry.

3) Then understand a little more about:

a) Matrices: transpose/inverse/symmetric/idempotent matrices/SVD/eigenvalues and eigenvectors/orthogonal matrices etc.

b) Vectors: subspace/span/basis/linear combination/linear dependence/linear independence etc.

c) Coordinate geometry: equation of a straight line/perpendicular distance/parallel distance etc.

If you do not understand the above, you cannot become a data scientist, as these are the very basics.

4) Start with Statistics for Management by Levin and Rubin,

then get into the books I mentioned in my previous message.

Regards

DATA LABS


Lesson Posted on 28/10/2016 IT Courses/Data Science IT Courses/Data Analysis IT Courses/Data Modeling

REFERENCE BOOKS FOR DATA SCIENCE

Data Labs Training and Consulting Services


Dear All,

You can use the following books to master the DATA SCIENCE concepts:

1) A First Course in Probability – Sheldon Ross

2) Applied Regression Analysis – Draper and Smith

3) Applied Multivariate Statistical Analysis – Richard A. Johnson and Dean W. Wichern

4) The Elements of Statistical Learning – Trevor Hastie et al.

5) R Programming for Data Science – Roger D. Peng

For any help, call me. Happy to help you.

Regards

DATA LABS



Answered on 29/12/2014 IT Courses/Advanced Statistics

Satyanarayana Rao M.

Trainer

n/(2^(n-1)) = 3/2^2 = 3/4

