__Task: Calculating the Correlation Coefficient Using Python__


We know that the correlation coefficient is calculated using the formula:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$$


In the above formula, n is the total number of values present in each set of numbers (the sets have to be of equal length). The two sets of numbers are denoted by x and y (it doesn’t matter which one you denote as which).

The other terms are described as follows:


__Σxy:__ Sum of the products of the corresponding elements of the two sets of numbers, x and y

__Σx:__ Sum of the numbers in set x

__Σy:__ Sum of the numbers in set y

__Σx^2:__ Sum of the squares of the numbers in set x

__Σy^2:__ Sum of the squares of the numbers in set y

__(Σx)^2:__ Square of the sum of the numbers in set x

__(Σy)^2:__ Square of the sum of the numbers in set y
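The sum of the squares and the square of the sum are easy to mix up. The small sketch below computes each term separately on made-up numbers; the lists `x` and `y` and the variable names are illustrative only, not the data or names used in the program later on.

```python
# Illustrative data only; any two equal-length lists would do.
x = [1, 2, 3]
y = [2, 4, 7]

n = len(x)                                      # n: number of values in each set
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # Σxy: 1*2 + 2*4 + 3*7
sum_x = sum(x)                                  # Σx
sum_y = sum(y)                                  # Σy
sum_x_squares = sum(xi ** 2 for xi in x)        # Σx^2: square each number, then add
square_of_sum_x = sum_x ** 2                    # (Σx)^2: add the numbers, then square

print(sum_xy, sum_x, sum_y)             # 31 6 13
print(sum_x_squares, square_of_sum_x)   # 14 36
```

The last two values differ (14 versus 36), which is why the formula needs both of them.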

Let us now write a Python program that calculates the correlation coefficient for us. We will use the following two built-in functions in the program:

- sum(x): adds up the numbers in the list x.

- zip(x, y): pairs up the corresponding numbers in lists x and y, which you can then use in a loop to perform other operations.
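To see what these two built-in functions do before using them in the program, here is a short sketch. The lists `a` and `b` are made-up values chosen only for illustration.

```python
# Made-up example lists, used only to show sum() and zip().
a = [1, 2, 3]
b = [10, 20, 30]

# sum() adds up all the numbers in a list.
total = sum(a)
print(total)  # 6

# zip() hands back the corresponding numbers one pair at a time.
for ai, bi in zip(a, b):
    print(ai, bi)

# Wrapping zip() in list() shows all the pairs at once.
pairs = list(zip(a, b))
print(pairs)  # [(1, 10), (2, 20), (3, 30)]
```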

    # A program to calculate the correlation coefficient

    def find_corr_x_y(x, y):
        n = len(x)

        # Find the sum of the products
        prod = []
        for xi, yi in zip(x, y):
            prod.append(xi * yi)
        sum_prod_x_y = sum(prod)

        # Sum of each set, and the square of each sum
        sum_x = sum(x)
        sum_y = sum(y)
        squared_sum_x = sum_x ** 2
        squared_sum_y = sum_y ** 2

        # Sum of the squares of each set
        x_square = []
        for xi in x:
            x_square.append(xi ** 2)
        x_square_sum = sum(x_square)

        y_square = []
        for yi in y:
            y_square.append(yi ** 2)
        y_square_sum = sum(y_square)

        # Put the terms together using the formula
        numerator = n * sum_prod_x_y - sum_x * sum_y
        dterm1 = n * x_square_sum - squared_sum_x
        dterm2 = n * y_square_sum - squared_sum_y
        denm = (dterm1 * dterm2) ** 0.5
        corr = numerator / denm
        return corr

    crr = 0
    X1 = [5.1, 3.2, 3, 1.4, 3.8, 1.0, 2.8, -0.3, 6.9, 2.5, 6.2, 4.6]
    Y = [30, 29, 30, 35, 36, 36, 34, 48, 24, 27, 21, 30]

    # Only calculate the coefficient when both lists have the same length
    if len(X1) == len(Y):
        crr = find_corr_x_y(X1, Y)
        print("Pearson product-moment Correlation Coefficient = {0}".format(crr))
        if crr >= 0.8:
            print("Strong Positive Correlation")
        elif crr <= -0.8:
            print("Strong Negative Correlation")
    else:
        print("Sorry, the data set lengths are not equal")

The **find_corr_x_y() function** accepts two arguments, x and y, which are the two sets of numbers whose correlation we want to calculate. Inside this function, all the terms used for calculating the correlation coefficient are obtained. The program calls the function only when the two lists of numbers are equal in length; otherwise it prints a message saying that the lengths are not equal.
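The length check matters because zip() stops at the end of the shorter list, so any extra values would be ignored without warning. As one possible variation, the check could be placed inside the function itself. The sketch below assumes a hypothetical function name, `corr_checked`, and raises a ValueError on unequal lengths; it is not part of the program above.

```python
def corr_checked(x, y):
    """Hypothetical variant: same formula, with the length check inside."""
    if len(x) != len(y):
        raise ValueError("the data set lengths are not equal")
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # sum of the products
    sum_x, sum_y = sum(x), sum(y)                   # sum of each set
    sum_x2 = sum(xi ** 2 for xi in x)               # sum of the squares of x
    sum_y2 = sum(yi ** 2 for yi in y)               # sum of the squares of y
    numerator = n * sum_xy - sum_x * sum_y
    denominator = ((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)) ** 0.5
    return numerator / denominator

# zip() stops at the shorter list, so the 3 is silently dropped here.
print(list(zip([1, 2, 3], [4, 5])))  # [(1, 4), (2, 5)]

# With the check inside the function, unequal lists are reported instead.
try:
    corr_checked([1, 2, 3], [4, 5])
except ValueError as err:
    print("Error:", err)

# Perfectly linear made-up data gives a coefficient of 1.0.
print(corr_checked([1, 2, 3], [2, 4, 6]))  # 1.0
```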

__OUTPUT__

>>>

Pearson product-moment Correlation Coefficient = -0.823545657378

Strong Negative Correlation

>>>

Students, try writing this program on your computer and see how it runs with both equal and unequal lists of numbers.