Variations Of Random Forest In R

14/07/2017

Model fitting is one of the important steps in using analytics to generate insights. Typical projects involve a lot of data cleaning so that the model achieves high accuracy when it is applied. Competitions are all about data cleaning and models. Various models can be fitted to data under different conditions, and one of the most intuitive is the decision tree. Decision trees classify data into buckets through a sequence of “decisions” on the feature values. Most competitions start by benchmarking against results from an ensemble of trees, known as a random decision forest. Random forests, as they are usually called, combine many trees grown on bootstrap samples of the data and are among the best-known examples of ‘bagging’ techniques. R, a popular language for model fitting, has a variety of random forest packages available. Let’s discuss a few of them (this list is by no means exhaustive).
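
To make the bagging idea concrete, here is a minimal sketch in R. It is only an illustration under my own assumptions: it uses the built-in iris data, the rpart package for the individual trees, and arbitrary values for the seed, the test split and the number of trees. A real random forest additionally samples a subset of features at every split, which this sketch does not do.

```r
library(rpart)  # tree learner used for the individual trees (assumed installed)

set.seed(42)                           # arbitrary seed, for reproducibility only
test_idx <- sample(nrow(iris), 30)     # hold out 30 rows for testing
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

n_trees <- 25                          # illustrative ensemble size

# Bagging step 1: fit each tree on a bootstrap sample (rows drawn with replacement)
trees <- lapply(seq_len(n_trees), function(i) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  rpart(Species ~ ., data = boot, method = "class")
})

# Bagging step 2: collect every tree's predicted class for every test row
# (one row per test observation, one column per tree)
votes <- sapply(trees, function(tr) {
  as.character(predict(tr, newdata = test, type = "class"))
})

# Bagging step 3: majority vote across trees for each test row
bagged <- apply(votes, 1, function(v) names(which.max(table(v))))

accuracy <- mean(bagged == as.character(test$Species))
print(accuracy)
```

The packages discussed below do all of this internally (and much faster), so the sketch is only meant to show what “an ensemble of trees” amounts to.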

randomForest: The ‘classic’ package in R, which implements the basic random forest logic and is really robust. The package is very user friendly and lets the user tune parameters such as the number of trees (ntree), the number of features tried at each split (mtry) and the size of the trees (nodesize, maxnodes). It can optionally compute feature importance and proximity measures. Feature importance is based on the increase in error when the values of one feature are permuted in the OOB (out-of-bag) data while everything else is kept the same. The proximity measure, on the other hand, is a matrix whose (i, j) element is the fraction of trees in which observations i and j fall in the same terminal node. The package can be used for classification or regression problems and can be learnt with ease.
cforest: This function, from the party package, builds forests of conditional inference trees. It is computationally more expensive than randomForest but often more accurate. As with randomForest, predictions can be evaluated on OOB data. At the same time it is slower and can handle less data for the same memory. To form the final ensemble it aggregates the trees through observation weights rather than a simple average of their predictions. However, the main reason cforest tends to give more reliable predictions is that it produces unbiased trees. The simple algorithm in randomForest has the drawback that it is biased towards features with many possible cut points, so features that are continuous or have many categories can be preferred regardless of how informative they are. Whenever you have large computational resources at your disposal, do consider cforest for accuracy.
obliqueRF: “Oblique” forests are an underrated, advanced yet useful concept based on splitting nodes with hyperplanes (linear combinations of several features) instead of thresholds on single features. They can outperform randomForest, especially when the features are numerical and correlated, as with spectral data. Just like randomForest, oblique forests are governed by the subspace dimension (the number of features tried at each split) and the ensemble size (the number of trees). However, since they make oblique cuts rather than orthogonal ones, each recursive binary split is found by fitting a linear model such as ridge regression. I have seen a cool implementation of oblique random forests as the prize-winning code in a Kaggle competition! Hence oblique random forests sure pack a punch. obliqueRF does tend to end up with a higher bias and lower variance than randomForest.
ParallelForest: ParallelForest is an implementation of random forests that uses parallel computing. Its main function is grow.forest. It’s pretty handy when there are millions of rows in the training set. A data set which took days for the randomForest package to fit was handled by ParallelForest in under an hour. However, there are still doubts on whether the accuracy is the same for both packages under all conditions and whether classification can be implemented using parallel processing. (Another package, bigrf, also uses multi-threading and caching for very large data, but it was built to handle very large data rather than to speed up processing.)
randomUniformForest: This package grows unpruned trees and is useful for regression, classification and unsupervised learning. If cforest is slower but more accurate than randomForest, then randomUniformForest falls on the other end, being faster but slightly less accurate. Cut points are drawn from a uniform distribution, so the trees have lower correlation with each other. Averaging many such randomized trees cancels out much of their individual variance, at the cost of somewhat higher bias in each tree. randomUniformForest is useful in situations where the features themselves follow specified distributions.
randomForestSRC: Survival, Regression and Classification (SRC) are the three types of models for which this package provides a unified function, rfsrc. Additionally, there are multivariate and unsupervised extensions as well as parallel processing through OpenMP. I have come to use this package whenever there is doubt about the best approach to model fitting. Coupled with missing value imputation, the package provides a first-look model that is useful for further exploration and deep-dive analysis.
ranger: ranger comes to the rescue when you have high-dimensional data and want a fast yet memory-efficient implementation of random forests. The name comes from RANdom forest GEneRator. I have mainly used ranger to build models quickly and to find optimal parameter values through parameter tuning.
Rborist: Rborist is a high-performance implementation of the random forest algorithm. Compared to the original randomForest, this package organizes the algorithm so that model fitting is performed with less data movement within memory, which creates opportunities for scaling up performance. Hence, as the number of features increases, the processing time increases only about linearly (as opposed to the much steeper increase seen with randomForest). The package also supports missing value imputation. Hence, in projects where we ourselves generate a lot of features, this package becomes more suitable.
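
As a starting point for the list above, here is a minimal sketch of the classic randomForest package with feature importance and proximity switched on. The data set (built-in iris), the seed and the values of ntree and mtry are my own choices for illustration, not recommendations, and the sketch assumes the randomForest package is installed.

```r
library(randomForest)  # assumed installed: install.packages("randomForest")

set.seed(1)  # arbitrary seed, for reproducibility only

# Fit a classification forest; ntree and mtry are illustrative values
fit <- randomForest(Species ~ ., data = iris,
                    ntree = 200, mtry = 2,
                    importance = TRUE,   # compute permutation importance
                    proximity = TRUE)    # compute the proximity matrix

print(fit)  # shows the OOB error estimate and the confusion matrix

# OOB error rate after the last tree has been added
oob_error <- fit$err.rate[fit$ntree, "OOB"]

# Permutation importance: mean decrease in accuracy when a feature is permuted
imp <- importance(fit, type = 1)
print(imp)

# Proximity matrix: entry (i, j) is the fraction of trees in which
# observations i and j end up in the same terminal node
prox <- fit$proximity
print(dim(prox))
```

The same formula-and-data pattern carries over, with different argument names, to most of the other packages, so a sketch like this can serve as a baseline before trying the faster or less biased variants.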

Since the idea was first suggested in the 90’s, random forests have become a popular method of model fitting and are used in various forms. There are even more implementations, such as rotationForest (based on fitting trees over principal components of the features), xgboost (extreme gradient boosting, a clever tree-based technique that uses boosting rather than bagging), rFerns (random ferns, useful for comparing images) and regularized random forests. This article will be useful for those who have gone through decision tree and basic random forest concepts and are willing to learn the different variations in R.
