
Tuning Parameters Of Decision Tree Models

19/12/2017

Implementations of the decision tree algorithm usually provide a collection of parameters for tuning how the tree is built. The defaults in Rattle often produce a reasonably good tree. They are certainly a very good starting point, and may indeed be a satisfactory end point! However, tuning will be necessary where, for example, the target variable has very few observations of the particular class of interest. In that situation a tree built with the defaults can reach high overall accuracy by predicting the majority class almost everywhere, so splits that would isolate the rare class may never be made.

The following tuning parameters are worth knowing when developing tree-based classifiers. For details on all of the tuning parameters, type ?rpart.control in the RStudio console.

minsplit: the minimum number of observations that must exist in a node for a split to be attempted.
minbucket: the minimum number of observations in any terminal (leaf) node.

The two options minbucket and minsplit are closely related. If only one of them is specified, rpart sets either minsplit = minbucket*3 or minbucket = round(minsplit/3), as appropriate; if neither is specified, the defaults are minsplit = 20 and minbucket = round(20/3) = 7. Every node produced by a split has at least minbucket observations, and a node is considered for splitting only if it has at least minsplit observations and each of its children would have at least minbucket observations. Put simply, if we specify only minbucket, minsplit defaults to three times minbucket.
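A quick way to check these defaults is to inspect the list that rpart.control() returns. This is a minimal sketch assuming the rpart package is available (it ships with R as a recommended package); the values 10 and 60 are arbitrary illustrations.

```r
library(rpart)  # ships with R as a recommended package

# Neither value given: minsplit = 20 and minbucket = round(20 / 3) = 7
defaults <- rpart.control()

# Only minbucket given: minsplit is set to minbucket * 3 = 30
ctrl_bucket <- rpart.control(minbucket = 10)

# Only minsplit given: minbucket becomes round(60 / 3) = 20
ctrl_split <- rpart.control(minsplit = 60)

str(defaults[c("minsplit", "minbucket")])
str(ctrl_bucket[c("minsplit", "minbucket")])
str(ctrl_split[c("minsplit", "minbucket")])
```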

maxdepth: sets the maximum depth of any node of the final tree, with the root node counted as depth 0. Values greater than 30 will give nonsense results on 32-bit machines.
Priors: Sometimes the proportions of classes in a training set do not reflect their true proportions in the population. We can supply the population proportions to Rattle, and the resulting model will reflect them.
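The maxdepth limit can be checked on a fitted tree. This sketch assumes the kyphosis sample data that ships with rpart; maxdepth = 2 and cp = 0 are illustrative choices, with cp = 0 used so that cp is less likely to stop growth before the depth limit does. It relies on rpart's node numbering, in which node n has children 2n and 2n + 1.

```r
library(rpart)

# kyphosis ships with rpart; cp = 0 keeps cp from stopping growth early
fit_shallow <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     control = rpart.control(maxdepth = 2, cp = 0))

# Node n has children 2n and 2n + 1, so its depth is floor(log2(n))
node_ids <- as.numeric(rownames(fit_shallow$frame))
node_depth <- floor(log2(node_ids) + 1e-7)
max(node_depth)  # at most 2, the maxdepth we set
```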

The so-called priors can also be used to “boost” a particularly important class, by giving it a higher prior probability, although this might best be done through the Loss Matrix.

In Rattle the priors are expressed as a list of numbers that sum up to 1. The list must be of the same length as the number of unique classes in the target variable. An example for binary classification is 0.5, 0.5.

The default priors are set to be the class proportions as found in the training dataset.

Using rpart directly, we specify the priors through the prior element of the parms argument, for example parms = list(prior = c(0.5, 0.5)).
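A minimal sketch of both cases, assuming the kyphosis data from rpart (target Kyphosis with levels "absent" and "present"): the first fit keeps the default priors, which should match the training class proportions, and the second supplies equal priors through parms. Reading the stored priors back from the parms component of the fitted object relies on the internal layout of rpart objects, so treat that part as illustrative.

```r
library(rpart)

# Default: priors equal the class proportions in the training data
fit_default <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit_default$parms$prior
prop.table(table(kyphosis$Kyphosis))

# Equal priors for "absent" and "present"; one value per class, summing to 1
fit_equal <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   parms = list(prior = c(0.5, 0.5)))
fit_equal$parms$prior
```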

Complexity parameter (cp): This parameter controls whether a split is worth making, and hence the number of branches in the tree. The value should be below 1, and the smaller the value, the more branches in the final tree. In rpart the default is cp = 0.01. Some tools accept a value of "Auto" (or no value) and then select the "best" complexity parameter by cross-validation; in rpart itself, omitting cp simply uses the default.
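The sketch below, assuming the kyphosis data from rpart and two arbitrary cp values, confirms the default and compares tree sizes: a larger cp demands more improvement per split, so it should give a tree with no more nodes than a smaller cp. Arguments of rpart.control() can also be passed directly to rpart(), as done here.

```r
library(rpart)

rpart.control()$cp  # the default complexity parameter

# rpart() forwards extra arguments such as cp to rpart.control()
fit_small_cp <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, cp = 0.001)
fit_large_cp <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, cp = 0.1)

# Number of nodes in each tree; the larger cp gives no more nodes
c(small_cp = nrow(fit_small_cp$frame), large_cp = nrow(fit_large_cp$frame))
```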

The complexity parameter (cp) is used to control the size of the decision tree and to select the optimal tree size. In rpart, a split is not attempted unless it decreases the overall lack of fit by a factor of cp. Setting cp to zero therefore builds the largest tree that the other limits (minsplit, minbucket, maxdepth) allow, which can be very large. This is useful if we want to look at the CP values for various tree sizes; this table appears in the console (for example via printcp()). We then look for the number of splits where the sum of xerror (the cross-validation error, relative to the root node error) and xstd (the standard error of xerror) is smallest. This is usually early in the list.
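The selection procedure above can be sketched as follows, assuming the kyphosis data from rpart and an arbitrary random seed (cross-validation assigns observations to folds at random, so xerror and xstd change from run to run). The code grows a tree with cp = 0, prints the CP table, picks the row where xerror + xstd is smallest, and prunes back to that row's CP with prune().

```r
library(rpart)

set.seed(42)  # arbitrary seed: cross-validation folds are chosen at random
fit_full <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  control = rpart.control(cp = 0, xval = 10))
printcp(fit_full)  # table of CP, nsplit, rel error, xerror and xstd

# Row of the CP table where xerror + xstd is smallest
cp_tab <- fit_full$cptable
best_row <- which.min(cp_tab[, "xerror"] + cp_tab[, "xstd"])
best_cp <- cp_tab[best_row, "CP"]

# Prune the large tree back to the chosen complexity
fit_pruned <- prune(fit_full, cp = best_cp)
printcp(fit_pruned)
```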

In R, cp and the other tuning parameters are passed to rpart through rpart.control(), as follows:

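library(rpart)
# Tuning parameters are passed to rpart() through rpart.control()
control <- rpart.control(minsplit = 50)
# Example setting cp and minsplit together (the values are only illustrative)
control <- rpart.control(cp = 0.01, minsplit = 50)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, control = control)  # kyphosis: sample data shipped with rpart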
