
R Programming Updates


Lesson Posted on 02/12/2019 IT Courses/R Programming

Basic R Syntax - R assignment operators

Sushree Sarita Sahoo

A seasoned analytics professional with 14+ years of extensive experience of Risk Management and Marketing...

The assignment operators that R supports are: <-, ->, =, <<- and ->>.
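As a rough illustration of how these five operators behave, here is a minimal sketch. The variable and function names (x, y, z, counter, increment, status) are made up for the example and are not part of the lesson.

```r
x <- 5          # leftward assignment, the form most R code uses
10 -> y         # rightward assignment
z = x + y       # '=' also assigns when used at the top level

counter <- 0
increment <- function() {
  counter <<- counter + 1   # '<<-' updates 'counter' in the enclosing environment
  local_only <- 99          # '<-' inside a function creates a local variable only
  invisible(NULL)
}
increment()
increment()

"done" ->> status           # rightward form of '<<-'

print(c(x = x, y = y, z = z, counter = counter))
print(status)
```

Inside a function, <- creates a variable that disappears when the function returns, while <<- reaches out to a variable in an enclosing environment, which is why counter changes but local_only never appears outside increment().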

Lesson Posted on 22/12/2017 IT Courses/Data Science IT Courses/Machine Learning IT Courses/R Programming

Decision Tree or Linear Model For Solving A Business Problem

Ashish R.

SAS certified analytics professionals, more than 11 years of industrial and 12 years of teaching experience....

When do we use linear models and when do we use tree-based classification models? This is a common question in data science job interviews. Here are some points to remember:

We can use any algorithm. The choice depends purely on the type of business problem we are solving, who the end user of the model is and how they are going to consume the model's output. Let's look at some key factors that will help you decide which model to use:

  1. If the relationship between the dependent and independent variables is well approximated by a linear model, linear regression will generally outperform a tree-based model. If the relationship is not linear, a tree model is the better choice, because a lot of complicated transformations might be required on the independent variables to make the relationship linear.
  2. If there is a high degree of non-linearity between the dependent and independent variables, a tree model will usually perform better than a linear regression model. How do you check the linearity? Simply create bivariate plots of the dependent variable against each independent variable and study them to determine what kind of relationship exists between Y and the chosen X variable.
  3. Decision tree models do not require much data cleaning (they are less affected by missing values and outliers). Hence they are easy and fast to develop, and easy to explain to customers as well.
  4. If your business problem demands the possible cause or path leading to the target variable, a tree is easy to explain; if you need to find the nature of the relationship between the predictor variables and the target variable, linear regression is the better choice.
  5. Decision tree models are even easier to interpret from a layman's point of view.
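Points 1 and 2 can be tried out on simulated data. The sketch below is only an illustration under stated assumptions: the data are made up (one straight-line relationship, one abrupt jump), the helper names rmse and compare_models are hypothetical, and the tree is fitted with the rpart package, which ships with standard R distributions but is not mentioned in the lesson.

```r
library(rpart)  # recursive partitioning trees (assumed to be installed)

set.seed(42)
n <- 500
x <- runif(n, 0, 10)

# Made-up data: a straight-line relationship and an abrupt jump at x = 5
linear_df <- data.frame(x = x, y = 2 * x + rnorm(n, sd = 0.5))
step_df   <- data.frame(x = x, y = ifelse(x > 5, 10, 0) + rnorm(n, sd = 0.5))

# Root mean squared error of a set of predictions
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Fit both model types on 400 rows and score them on the remaining 100
compare_models <- function(df) {
  train <- df[1:400, ]
  test  <- df[401:500, ]
  lin   <- lm(y ~ x, data = train)
  tree  <- rpart(y ~ x, data = train)
  c(linear = rmse(test$y, predict(lin, test)),
    tree   = rmse(test$y, predict(tree, test)))
}

res_linear <- compare_models(linear_df)
res_step   <- compare_models(step_df)
print(res_linear)
print(res_step)

# Bivariate plot to eyeball linearity before choosing a model:
# plot(step_df$x, step_df$y)
```

On the straight-line data the linear model should show the lower error, and on the jump data the tree should; real business data is rarely this clear-cut, so the bivariate plots remain the first check.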

Lesson Posted on 22/12/2017 IT Courses/Data Science IT Courses/Machine Learning IT Courses/R Programming

Basics Of R Programming 1

Ashish R.

SAS certified analytics professionals, more than 11 years of industrial and 12 years of teaching experience....

# To know the working directory which is assigned by default
getwd()
# set the working directory that holds the data files
setwd("C:/Mywork/MyLearning/MyStuddocs_UrbanPro/Data") # change the path to match your own folder

getwd()

# to see the list of files in your working directory- just assigned above
dir() ## Lists files in the working directory

# Creating a folder in C drive
dir.create("C:/Mywork/MyLearning/MyStuddocs_UrbanPro/Data/Nov26")


#install.packages("car")
#install.packages("Hmisc")
#install.packages("reshape")
#install.packages('pastecs')
#install.packages('gtools')
#install.packages('gmodels')
#install.packages('caret')
#install.packages('MASS')


##-----------------------------------------------------------
## Load required libraries
##-----------------------------------------------------------
# libraries must be loaded in every new R session
# before their functions can be used

library(foreign)
library(MASS) # for stepAIC()
library(Hmisc) # for describe()
library(boot)
library(pastecs) # for stat.desc()
library(gmodels)
library(gtools)
library(lattice)
library(ggplot2)
library(caret)
library(car)
library(reshape)

version # to check which version of R you are using

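# import world data set
# (the import statement is missing from the original post; load the world data
# into a data frame named world, for example with read.csv(), before running the lines below)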

dim(world) # check how many rows and columns

View(world) # to View the data frame

trans<-read.csv("TransactionMaster.csv")

View(trans)

cust<-read.csv("CustomerMaster.csv")

View(cust)

dim(cust)

str(cust) # to check the structure/meta data of the data frame

# carbon copy of the file

cust_copy<-cust[,]

#save as a R file

saveRDS(cust_copy,"C:/Mywork/MyLearning/MyStuddocs_UrbanPro/Data/customerdata")

# take a sample of 100 rows and all the columns and create a sample file
# 1:100 stands for 100 rows and after comma blank means all columns to pick up
cust_sample<-cust[1:100,]

dim(cust_sample)


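# take all the rows and specific columns from the source file "cust"
# (the column selection was lost from the original post; the general form is cust[, c(...)])

# take all rows and specific column numbers 1,8,9
samplefile <- cust[, c(1, 8, 9)]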

# do the frequency distribution of the City variable
table(cust$City)

# do a cross-table frequency distribution of the City and State variables
table(cust$State, cust$City)

 

table(world$deathCat, world$birthCat)


# calculate average value of energy_use_percapita variable from the world
mean(world$energy_use_percapita, na.rm = TRUE)

# calculate median value of gni_per_capita
median(world$gni_per_capita) # 50th percentile


# to check the type of the R objects
class(world)
class(cust)
class(trans)

is.vector(world)
is.factor(world)
is.data.frame(world)
is.matrix(cust)

length(world) # for a data frame this is the number of columns; length() is mainly used for vectors

head(trans) # display first 6 rows in console

head(trans, n = 2) # Display top 2 rows

tail(trans) # display last 6 rows of a data frame

tail(trans,n=1)

firstfewrows <- head(trans) # store the first rows in a new data frame

View(firstfewrows)


# to store the country names in lower case letters

world$country_name<-tolower(world$country_name)

# dropping the first column from a data frame and create a new one

world_1<-world[,-c(1)]

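# filter out the atlanta customers

atlantaCustomers <- cust[which(cust$City == "ATLANTA"), ]


# filter out atlanta or hollywood customers : | is the OR operator, & is the AND operator

atlantaHollyCustomers <- cust[which(cust$City == "ATLANTA" | cust$City == "HOLLYWOOD"), ]

## Selecting specific columns
## (the column list was lost from the original post; the general form is
## cust[which(cust$City == "ATLANTA"), c(...)])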


# filtering out data with multiple conditions

highSales_mod<-trans[which(trans$Sales_Amount >= 100 & trans$Sales_Amount <= 150 ),]


max(highSales_mod$Sales_Amount)

min(highSales_mod$Sales_Amount)

###------------------------------------------------------------
### Basic Date functions in R
###------------------------------------------------------------

Sys.Date() # Current date

today <- Sys.Date()

class(today)

Sys.time() # Current date and time with time zone
time<-Sys.time()

class(time)
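The CSV files used above are not available with this lesson, so here is a small self-contained sketch of the same row and column operations. The data frame cust_demo and all of its values are invented for illustration; only the City and State column names come from the lesson, and Sales_Amount is borrowed from the transaction file.

```r
# Invented stand-in for the customer data; values are made up
cust_demo <- data.frame(
  Customer_ID  = 1:6,
  City         = c("ATLANTA", "HOLLYWOOD", "ATLANTA", "MIAMI", "HOLLYWOOD", "ATLANTA"),
  State        = c("GA", "FL", "GA", "FL", "FL", "GA"),
  Sales_Amount = c(90, 120, 150, 200, 100, 60)
)

first_rows   <- cust_demo[1:3, ]                        # rows 1 to 3, all columns
two_cols     <- cust_demo[, c("Customer_ID", "City")]   # all rows, columns picked by name
no_first_col <- cust_demo[, -1]                         # drop the first column

# rows that meet one condition
atlanta <- cust_demo[cust_demo$City == "ATLANTA", ]

# rows that meet two conditions joined with & (AND)
mid_sales <- cust_demo[cust_demo$Sales_Amount >= 100 &
                         cust_demo$Sales_Amount <= 150, ]

city_counts <- table(cust_demo$City)                    # frequency distribution
print(city_counts)
print(mid_sales)
```

The pattern is always data[rows, columns]: leaving one side of the comma blank means "take everything" on that side.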

 


Lesson Posted on 22/12/2017 IT Courses/Data Science IT Courses/Machine Learning IT Courses/R Programming

Market Basket Analysis

Ashish R.

SAS certified analytics professionals, more than 11 years of industrial and 12 years of teaching experience....

Market Basket Analysis (MBA):

Market Basket Analysis (MBA), also known as affinity analysis, is a technique to identify items likely to be purchased together. The introduction of electronic point-of-sale systems has led to the collection of large amounts of data. Simple yet powerful, MBA is an inexpensive technique to identify cross-sell opportunities, mostly in CPG industries. A classic example is toothpaste and tuna: it seems that people who eat tuna are more prone to brush their teeth right after finishing their meal. So why is it important for retailers to get a good grasp of product affinities? This information is critical for planning promotions, because reducing the price of some items may cause a spike in sales of related high-affinity items without the need to promote those related items further.

Market Basket Analysis (MBA) is a data mining technique widely used in the consumer packaged goods (CPG) industry to identify which items are purchased together. The classic example of MBA is diapers and beer: "An apocryphal early illustrative example for this was when one super market chain discovered in its analysis that customers that bought diapers often bought beer as well, have put the diapers close to beer coolers, and their sales increased dramatically. Although this urban legend is only an example that professors use to illustrate the concept to students, the explanation of this imaginary phenomenon might be that fathers that are sent out to buy diapers often buy a beer as well, as a reward."

The example may or may not be true, but it illustrates the point of MBA.

The analysis can be applied in various ways:

  • Develop combo offers based on products sold together
  • Organize and place associated products/categories nearby inside a store
  • Determine the layout of the catalog of an ecommerce site
  • Control inventory based on product demands and what products sell together
  • Credit card transactions: items purchased by credit card give insight into other products the customer is likely to purchase.
  • Supermarket purchases: common combinations of products can be used to inform product placement on supermarket shelves.
  • Telecommunication product purchases: commonly associated options (call waiting, caller display, etc) help determine how to structure product bundles which maximize revenue
  • Banking services: the patterns of services used by retail customers are used to identify other services they may wish to purchase.
  • Insurance claims: unusual combinations of insurance claims can be a sign of fraud.
  • Medical patient histories: certain combinations of conditions can indicate increased risk of various complications.

Three measures are used a lot in market basket analysis, all based on the classical definition of probability:

i. Support, Confidence and Lift:

There are several measures used to understand various aspects of associated products. Let's understand the measures with the help of an example. In a store, there are 1000 transactions overall. Item A appears in 80 transactions and Item B occurs in 100 transactions. Items A and B appear in 20 transactions together.

a. Support: The simplest measure, Support is the ratio of the number of transactions in which an item (or a set of two or more items together) occurs to the total number of transactions. Support of A = P(A) = 80/1000 = 8% and Support of B = P(B) = 100/1000 = 10%. Support of A and B together = P(A,B) = 20/1000 = 2%.

Support of a product or product bundle indicates the popularity of the product or product bundle in the transaction set. The higher the support, the more popular the product or product bundle. This measure can help in identifying drivers of traffic to the store. Hence, if Barbie dolls have a higher support, they can be appropriately priced to entice traffic to a store.

b. Confidence is the conditional probability that a randomly selected transaction includes Item A given that it includes Item B. Confidence of A given B = P(A|B) = 20/100 = 20%.

i.e. it measures the propensity of product A to sell when it is bundled with B.

Confidence can be used for product placement strategy and increasing profitability. Place high-margin items with associated high-selling (driver) items. If Market Basket Analysis indicates that customers who bought high-selling Barbie dolls also bought high-margin candies, then candies should be placed near Barbie dolls.

c. Lift can be expressed as the ratio of the probability of Items A and B occurring together to the product of the two individual probabilities for Item A and Item B. Lift = P(A,B) / (P(A) * P(B)) = (20/1000) / ((80/1000) * (100/1000)) = 2.5.

Lift indicates the strength of an association rule over the random co-occurrence of Item A and Item B, given their individual support. Lift provides information about the change in probability of Item A in presence of Item B. Lift values greater than 1.0 indicate that transactions containing Item B tend to contain Item A more often than transactions that do not contain Item B.
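The three measures can be checked with a few lines of R using the counts from the example above (1000 transactions, 80 with Item A, 100 with Item B, 20 with both). The variable names are chosen only for this sketch.

```r
n_total <- 1000  # all transactions in the store
n_A     <- 80    # transactions containing Item A
n_B     <- 100   # transactions containing Item B
n_AB    <- 20    # transactions containing both A and B

support_A  <- n_A / n_total     # P(A)   = 0.08
support_B  <- n_B / n_total     # P(B)   = 0.10
support_AB <- n_AB / n_total    # P(A,B) = 0.02

# Confidence of A given B: share of B's transactions that also contain A
confidence_A_given_B <- n_AB / n_B    # P(A|B) = 0.20

# Lift: observed co-occurrence relative to what independence would predict
lift_AB <- support_AB / (support_A * support_B)    # 2.5

print(c(support_A = support_A, support_B = support_B, support_AB = support_AB,
        confidence = confidence_A_given_B, lift = lift_AB))
```

If A and B were independent, only 0.08 * 0.10 = 0.8% of transactions would contain both; the observed 2% is 2.5 times that, which is exactly what the lift reports.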

In order to gain better insights, differentiate Market Basket Analysis based on:

  • Weekend vs weekday sales.
  • Month beginning vs month-end sales.
  • Different seasons of the year.
  • Different stores.
  • Different customer profiles.

Based on the content and value of the basket, it is useful to classify the trip. Variables such as total basket value, number of items, number of category X vs. category Y items, help in developing rules to map each of the baskets to a previously defined classification. Understanding what kind of shopping trips a customer performs at a particular store at a particular time is critical for planning purposes. This data provides a unique window into what is happening at the store and enables advanced applications such as labor scheduling, product readiness and even temporary layout changes.

Not Just Retail:

Although Market Basket Analysis conjures up pictures of shopping carts and supermarket shoppers, there are many other areas in which it can be applied. These include:

For a financial services company:

  • Analysis of credit and debit card purchases.
  • Analysis of cheque payments made.
  • Analysis of services/products taken, e.g. a customer who has taken an executive credit card is also likely to take a personal loan of $5,000 or less.

For a telecom operator:

  • Analysis of telephone calling patterns.
  • Analysis of value-add services taken together. Rather than considering services taken together at a point in time, it could be services taken over a period of, let's say, six months.

A predictive market basket analysis can be used to identify sets of products/services purchased (or events) that generally occur in sequence — something of interest to direct marketers, criminologists and many others.

Advanced Market Basket Analysis provides an excellent way to get to know the customer and understand the different behaviors. This insight, in turn, can be leveraged to provide better assortment, design a better planogram and devise more promotions that can lead to more traffic and profits.


Lesson Posted on 14/07/2017 IT Courses/Data Science IT Courses/R Programming

Use Data Science To Find Credit Worthy Customers

Ranjit Mishra

I have Certificate Degree in Predictive Business Analytics from Northwestern University, USA. Have been...

The K-nearest neighbor classifier is one of the simplest to use and hence is widely used for classifying dynamic datasets. The walkthrough below shows how easy it is to classify credit-worthy vs credit-risk customers:

head(gc) # first six rows of the credit data frame gc
##   Default checkingstatus1 duration history purpose amount savings employ
## 1       0             A11        6     A34     A43   1169     A65    A75
## 2       1             A12       48     A32     A43   5951     A61    A73
## 3       0             A14       12     A34     A46   2096     A61    A74
## 4       0             A11       42     A32     A42   7882     A61    A74
## 5       1             A11       24     A33     A40   4870     A61    A73
## 6       0             A14       36     A32     A46   9055     A65    A73
##   installment status others residence property age otherplans housing
## 1           4    A93   A101         4     A121  67       A143    A152
## 2           2    A92   A101         2     A121  22       A143    A152
## 3           2    A93   A101         3     A121  49       A143    A152
## 4           2    A93   A103         4     A122  45       A143    A153
## 5           3    A93   A101         4     A124  53       A143    A153
## 6           2    A93   A101         4     A124  35       A143    A153
##   cards  job liable tele foreign
## 1     2 A173      1 A192    A201
## 2     1 A173      1 A191    A201
## 3     1 A172      2 A191    A201
## 4     1 A173      2 A191    A201
## 5     2 A173      2 A191    A201
## 6     1 A172      2 A192    A201
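## Taking back-up of the input file, in case the original data is required later

gc.bkup <- gc

## (the step that standardised duration, amount and installment is missing from the original post;
## the summary of the standardised columns is shown below)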
##      duration.V1          amount.V1         installment.V1   
##  Min.   :-1.401713   Min.   :-1.070329   Min.   :-1.7636311  
##  1st Qu.:-0.738298   1st Qu.:-0.675145   1st Qu.:-0.8697481  
##  Median :-0.240737   Median :-0.337176   Median : 0.0241348  
##  Mean   : 0.000000   Mean   : 0.000000   Mean   : 0.0000000  
##  3rd Qu.: 0.256825   3rd Qu.: 0.248338   3rd Qu.: 0.9180178  
##  Max.   : 4.237315   Max.   : 5.368103   Max.   : 0.9180178
## Let's predict on a test set of 100 observations. Rest to be used as train set.

set.seed(123)
## (the lines that drew the test sample and fitted knn.1, knn.5 and knn.20 are missing from the original post)
100 * sum(test.def == knn.1)/100  # For knn = 1
## [1] 68
100 * sum(test.def == knn.5)/100  # For knn = 5
## [1] 74
100 * sum(test.def == knn.20)/100 # For knn = 20
## [1] 81
## Looking at the above proportions, K = 1 correctly classifies 68% of the outcomes, K = 5 correctly classifies 74% and K = 20 does it for 81% of the outcomes.

## We should also look at the success rate against the value of increasing K.

table(knn.1 ,test.def)
##      test.def
## knn.1  0  1
##     0 54 11
##     1 21 14
## For K = 1, of the 65 customers predicted as good (0), 54 or 83% really are good: this is the success rate. Let's look at K = 5 now

table(knn.5 ,test.def)
##      test.def
## knn.5  0  1
##     0 62 13
##     1 13 12
## For K = 5, of the 75 customers predicted as good, 62 or 83% really are good. Let's look at K = 20 now

table(knn.20 ,test.def)
##       test.def
## knn.20  0  1
##      0 69 13
##      1  6 12
## For K = 20, of the 82 customers predicted as good, 69 or 84% really are good.

## Increasing K increases the overall classification accuracy (68%, 74%, 81%), but the success rate among customers classed as good barely moves (83%, 83%, 84%). It is worse to class a customer as good when it is bad, than it is to class a customer as bad when it is good.
## By that criterion K = 1 makes the fewest costly errors (11 bad customers classed as good), while K = 5 and K = 20 make 13 each; K = 20 gives the best overall accuracy.
## We can make a plot of the data with the training set in hollow shapes and the new ones filled in.
## Plot for K = 1 can be created as follows -

# training rows: hollow shapes, coloured by installment value
plot(train.gc[,c("amount","duration")],
     col=c(4,3,6,2)[gc.bkup[-test, "installment"]],
     pch=c(1,2)[as.numeric(train.def)],
     main="Predicted Default, by 1 Nearest Neighbors",cex.main=.95)

# test rows: filled shapes, coloured by the installment value of the test rows
points(test.gc[,c("amount","duration")],
       bg=c(4,3,6,2)[gc.bkup[test,"installment"]],
       pch=c(21,24)[as.numeric(knn.1)],cex=1.2,col=grey(.7))

legend("bottomright",pch=c(1,16,2,17),bg=c(1,1,1,1),
       legend=c("data 0","pred 0","data 1","pred 1"),
       title="default",bty="n",cex=.8)

legend("topleft",fill=c(4,3,6,2),legend=c(1,2,3,4),
       title="installment %", horiz=TRUE,bty="n",col=grey(.7),cex=.8)
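Since some of the steps above are incomplete, here is a self-contained sketch of the same workflow on simulated data. It is not the original code: the data frame credit is invented, only two predictors are used, and the classifier is knn() from the class package (assumed to be installed; it ships with standard R distributions). The accuracies it prints will not match the figures in the lesson.

```r
library(class)  # provides knn()

set.seed(123)
n <- 1000

# Invented stand-in for the credit data: two numeric predictors and a 0/1 default flag
credit <- data.frame(duration = rnorm(n, mean = 21, sd = 12),
                     amount   = rnorm(n, mean = 3200, sd = 2800))
credit$Default <- factor(rbinom(n, 1, plogis(-1 + 0.04 * (credit$duration - 21))))

# Standardise the predictors so that no single variable dominates the distance
x <- scale(credit[, c("duration", "amount")])

# Hold out 100 rows as the test set; the rest is the training set
test      <- sample(1:n, 100)
train.x   <- x[-test, ]
test.x    <- x[test, ]
train.def <- credit$Default[-test]
test.def  <- credit$Default[test]

# Fit for several values of K and record the percentage classified correctly
k_values <- c(1, 5, 20)
accuracy <- sapply(k_values, function(k) {
  pred <- knn(train.x, test.x, cl = train.def, k = k)
  100 * mean(pred == test.def)
})
names(accuracy) <- paste0("k=", k_values)
print(accuracy)

# Confusion table for one value of K (rows: predicted, columns: actual)
knn.5 <- knn(train.x, test.x, cl = train.def, k = 5)
print(table(knn.5, test.def))
```

Standardising before calling knn() matters because the method works on distances: without it, a variable measured in thousands (amount) would swamp one measured in tens (duration).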


Lesson Posted on 20/02/2017 IT Courses/Programming Languages/Python IT Courses/R Programming IT Courses/Hadoop

Python Programming or R- Programming

Jatin Miglani

Awarded with "Excellence Award Winner 2014" , "Excellence Award Winner 2015", "Excellence Award Winner...

Most of the students usually ask me this question before they join the classes, whether to go with Python or R. Here is my short analysis on this very common topic.

If you are interested in, or have a job requirement for, data analysis and visual presentation of data using open source languages, the search becomes narrow and the question is whether to learn Python or R.

Here is my first thought:

To start data analysis projects, both Python and R are easy to use and free, and neither needs heavy expertise to get going.

The installation, configuration and package management for both languages are simple and efficient. In general, for a newbie in data science development, it makes sense to be unsure whether to learn R or Python first.

In this article, I will highlight some of the differences between R and Python, and how they both have a place in the data science and statistics world.

  • Python is a general-purpose interpreted programming language that can be used for ordinary software development, web programming, data mining applications, statistical data analysis, data visualization and ETL programming. R was developed with the needs of statisticians in mind, so its main area of use is narrower.
  • R can be difficult to get into even if you have experience with another programming language: it wasn't constructed by computer scientists for computer scientists. Unlike Python, which is built to have a simple syntax, R has a trickier syntax and a somewhat steep learning curve.
  • R is mainly used when the data analysis task runs standalone or on individual servers, whereas Python also supports web programming, distributed data system management, blockchain-related development and many free packages designed for specific needs such as posting daily status updates to Facebook, data file version management or viewing Wikipedia content. It means you can often code complex requirements in just a few lines by importing the appropriate package.
  • R has some specific IDEs for program development; the most popular one is RStudio. Python is not bound to a particular IDE. Dozens of Python IDEs and tools are available, such as Anaconda (a distribution that bundles several of them), PyScripter, Wing IDE, Spyder, glueviz, PyCharm, PyDev, IDLE and Komodo Edit. Python has been used to write all, or parts of, popular software projects like dnf/yum, OpenStack, OpenShot, Blender, Calibre and even the original BitTorrent client; YouTube is another well-known example of Python's capability.
  • From a data science point of view, both R and Python are powerful. Even so, many people prefer Python to R because of its flexibility and simplicity.
  • R programming is basically meant for statistical programmers who deal with a high level of statistical data analysis. Python is a general-purpose programming language that can also be used for statistical data analysis with the help of some packages.
  • The programming construct of R is based on the vector concept: R treats its basic data as vectors, and even a single number is a vector of length one. Hence some hands-on practice with vectors is needed to understand how variables behave in R. Python is built around OOP concepts: every value is an object instantiated from a class.
  • R's programming constructs can look a little more complicated than Python's. Let's take an example. Finding the mean of every column of a data frame in R: sapply(nba, mean, na.rm=TRUE)
  • Finding a mean in Python: np.mean() from the NumPy package. The functions in Python are easy to use and simple, usually without many parameters to pass.
  • R programs can run slower than equivalent Python programs, depending on the task. In a sensitive production environment such as a telecom billing engine, that counts against R. Python applications tend to be lightweight, though they have the capacity to process gigabytes of data.
  • Python supports the Hadoop programming construct, i.e. you can write a Python program that uses Hadoop MapReduce for exploratory data analysis. R can also work with Hadoop through add-on packages such as RHadoop, though this is less common.
  • You can link any open source ETL tool such as Pentaho Kettle or CloverETL. If you are an expert in ETL concepts, you can use the pygrametl package of Python for building your own ETL tools in Python.
  • Conclusion: Python is versatile, simple, easier to learn and powerful because of its usefulness in a variety of contexts, some of which have nothing to do with data science. R is a specialized environment that looks to optimize for data analysis, but which is harder to learn.
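To make the vector concept and the mean example from the list above more concrete, here is a minimal R sketch. The nba data frame in the lesson is not available, so a tiny made-up one with the same name is used; its columns and values are assumptions.

```r
# Made-up stand-in for the 'nba' data frame mentioned above
nba <- data.frame(points  = c(10, 25, NA, 18),
                  assists = c(4, NA, 7, 3))

# Mean of a single vector: the closest equivalent of a plain mean call
single_mean <- mean(nba$points, na.rm = TRUE)

# Mean of every column: sapply() applies mean() to each column in turn
column_means <- sapply(nba, mean, na.rm = TRUE)

# colMeans() gives the same result for all-numeric data frames
same_means <- colMeans(nba, na.rm = TRUE)

# Vector concept: arithmetic works on the whole vector at once, no loop needed
doubled <- nba$points * 2

print(single_mean)
print(column_means)
print(doubled)
```

The na.rm = TRUE argument is what makes the R call look longer; it tells mean() to skip missing values, which R otherwise propagates as NA.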

Lesson Posted on 28/10/2016 IT Courses/R Programming

R programming: Factors

Indranil Rath

Thanks so much for your interest in my R programming for Statistics and Data Science workshop. I am an...

Admittedly, R has a steep learning curve. But most common errors faced by beginners can be resolved with a good understanding of the language basics. For example, categorical variables should be converted to the 'factor' class before any data analysis in R. Here's an intuitive explanation. Say we have an atomic vector with 1 = female and 0 = male:

> gender_vec = c(1, 0, 0, 1, 1)

Would an arithmetic operation on this vector make sense? No!

> gender_vec + 2           # meaningless!

[1] 3 2 2 3 3

We would like R to distinguish between such categorical variables and other numeric vectors. The 'factor' class in R allows for this distinction. Once we convert our vector to 'factor' class, R recognises that the vector represents a categorical variable and refuses to perform meaningless operations on it.

> as.factor(gender_vec)

[1] 1 0 0 1 1

Levels: 0 1
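A short sketch of what "refuses" looks like in practice: arithmetic on a factor returns NA with a warning, while operations that make sense for categories, such as counting, work as expected. The labels "male" and "female" follow the coding given above; the variable names gender_fac, result and gender_counts are made up for this example.

```r
gender_vec <- c(1, 0, 0, 1, 1)

# Convert to a factor and attach readable labels (0 = male, 1 = female)
gender_fac <- factor(gender_vec, levels = c(0, 1), labels = c("male", "female"))

# Arithmetic on a factor is not meaningful: R warns and returns NA for every element
result <- suppressWarnings(gender_fac + 2)
print(result)

# Counting the categories is meaningful and works
gender_counts <- table(gender_fac)
print(gender_counts)

print(levels(gender_fac))
```

Without suppressWarnings(), R prints a warning that '+' is not meaningful for factors, which is usually the first hint that a column has been stored as categorical.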

 

 

 

 


Answered on 10/10/2016 IT Courses/R Programming

What are the best institutes for learning R programming (classroom training) in Bangalore?

Dni Institute

If you are looking for personalised and scenario-based learning, please come and meet us at DnI Institute.

Answered on 06/09/2016 IT Courses/R Programming

Ranjit Mishra

Tutor

Depending on your programming skill and your awareness of the free study materials available on websites, I can suggest various ways to learn R. I would say you can learn R absolutely free on your own. Please get in touch if you need more info.