**and the output labels of the training set y^{(i)}**, and thus form a function F(x^{(i)}, θ), which would help in predicting future values.

Note that x (lowercase) is used to denote a single training example as a whole, whereas X_{(i,j)} is used to point to the j^{th} feature of the i^{th} training example. Confusing? Let's simplify it!

As shown, to refer to the whole feature set of a single example, we use x^{(1)}. We can also point to the first feature of the third example in the training set as X_{(3,1)} or x_{1}^{(3)}.
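To make the indexing concrete, here is a tiny hypothetical dataset in Python (the numbers are made up purely for illustration):

```python
# Hypothetical toy dataset: 3 examples, 2 features each
X = [[5.0, 2.0],   # x^(1)
     [3.5, 1.0],   # x^(2)
     [6.0, 4.0]]   # x^(3)

x_3 = X[2]         # the whole third example, x^(3)
x_3_1 = X[2][0]    # first feature of the third example, X_(3,1)
print(x_3, x_3_1)  # [6.0, 4.0] 6.0
```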

For simplicity of understanding, let's assume there is only one feature in our dataset of 10 examples. Let's plot it on a graph.
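A quick way to visualize such a dataset (a sketch with made-up numbers; the Agg backend is chosen only so the script runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Hypothetical single-feature dataset of 10 examples
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 2.1, 2.9, 4.2, 5.1, 5.8, 7.3, 8.0, 8.9, 10.1]

plt.scatter(x, y)
plt.xlabel("feature x_1")
plt.ylabel("label y")
plt.savefig("dataset.png")
```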

So, let's say the equation of the line we are supposed to fit is h(x^{(i)}, θ) = θ_{0} + θ_{1}x_{1}.

We add an extra feature x_{0}, which is always equal to 1, so that we can rewrite it as h_{θ}(x^{(i)}) = θ_{0}x_{0} + θ_{1}x_{1}.

For n features, the hypothesis becomes h(x^{(i)}, θ) = θ_{0}x_{0} + θ_{1}x_{1} + θ_{2}x_{2} + θ_{3}x_{3} + … + θ_{n}x_{n}. In compact form, h_{θ}(x^{(i)}) = ∑_{i=0}^{n}(θ_{i} * x_{i}); the sum starts at i = 0 because x_{0} = 1 is included.

```python
def HypoThe(Theta, xi):
    # Hypothesis h_θ(x) = Σ θ_i * x_i for a single training example xi
    if len(Theta) == len(xi):
        total = 0
        for i in range(len(xi)):
            total += Theta[i] * xi[i]
        return total
    else:
        return False
```
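As a sanity check, the same computation is just a dot product. A minimal self-contained sketch using NumPy (`hypo_np` is an illustrative name, not from the lesson):

```python
import numpy as np

def hypo_np(theta, xi):
    # h_θ(x) computed as a dot product over equal-length vectors
    return float(np.dot(theta, xi))

print(hypo_np([1.0, 2.0], [1.0, 3.0]))  # 1*1 + 2*3 = 7.0
```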

J(θ) = ∑_{i=1}^{m}(h_{θ}(x^{(i)}) - y^{(i)})^{2} / (2*m)

We compute the prediction using the h_{θ}(x^{(i)}) function, subtract it from the actual label y^{(i)}, square it, and add it up for all training examples (1…m). We average it out using (2*m). Why did we use 2*m instead of m? We'll find out soon. Let's just say it's for simplicity, and also because we are dealing with a mean-squared error function which we are supposed to minimize.

```python
def RegCostFunc(Theta, X, Y):
    # Mean-squared cost J(θ) = Σ (h_θ(x^(i)) - y^(i))^2 / (2m)
    sum1 = 0
    for i in range(len(X)):
        sum1 += (HypoThe(Theta, X[i]) - Y[i]) ** 2
    return sum1 / (2 * len(X))
```
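To see the cost function in action, here is a self-contained vectorized sketch (`cost_np` is an illustrative name; the x_{0} = 1 column is assumed to be already prepended):

```python
import numpy as np

def cost_np(theta, X, Y):
    # J(θ) = Σ (h_θ(x^(i)) - y^(i))^2 / (2m), computed with matrix ops
    errs = np.asarray(X) @ np.asarray(theta) - np.asarray(Y)
    return float(errs @ errs) / (2 * len(Y))

# A perfect fit (the line y = x with θ = [0, 1]) gives zero cost
J = cost_np([0.0, 1.0], [[1.0, 1.0], [1.0, 2.0]], [1.0, 2.0])
print(J)  # 0.0
```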

θ_{j} = θ_{j} - α * (∂J(θ)/∂θ_{j})

Here (∂J(θ)/∂θ_{j}) means: take the partial derivative of J(θ) w.r.t. θ_{j} while the rest of the θ are held constant.

∂J(θ)/∂θ_{j} = ∂(∑_{i=1}^{m}(h_{θ}(x^{(i)}) - y^{(i)})^{2} / (2*m)) / ∂θ_{j}

∂J(θ)/∂θ_{j} = ∑_{i=1}^{m}(h_{θ}(x^{(i)}) - y^{(i)}) * x_{j}^{(i)} / m

The 2 produced by differentiating the square cancels the 2 in the denominator; that is why we averaged with (2*m) instead of m. So on every step, the minus sign in the update rule moves θ_{j} down the slope of the function J(θ).

α (the learning rate) decides how big a step we take toward the minimum of J(θ). If α is too large, gradient descent can overshoot the minimum and will never converge.

Note: do not update each θ_{j} one after another; compute all the gradient terms first, then update the whole θ simultaneously.

θ_{j} = θ_{j} - α * (∑_{i=1}^{m}(h_{θ}(x^{(i)}) - y^{(i)}) * x_{j}^{(i)} / m)
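As a numeric sanity check of this update rule, one simultaneous step on a single made-up example:

```python
# One gradient-descent step: example x = [1, 2] (x_0 = 1 prepended), label y = 5
theta = [0.0, 0.0]
x, y, alpha, m = [1.0, 2.0], 5.0, 0.1, 1

h = theta[0] * x[0] + theta[1] * x[1]           # prediction: 0.0
grad = [(h - y) * x[j] / m for j in range(2)]   # [-5.0, -10.0]
theta = [theta[j] - alpha * grad[j] for j in range(2)]
print(theta)  # [0.5, 1.0]
```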

```python
def GradTerm(X, Y, Theta, i):
    # Σ_j (h_θ(x^(j)) - y^(j)) * x_i^(j): unscaled gradient sum for parameter θ_i
    sum1 = 0
    for j in range(len(X)):
        sum1 += (HypoThe(Theta, X[j]) - Y[j]) * X[j][i]
    return sum1
```

```python
def GradDesc(Theta, alpha, Xfeature, Ylabels):
    # Build the new θ in a separate list so the whole vector updates simultaneously
    Theta_ = []
    for i in range(len(Theta)):
        Theta_.append(Theta[i] - alpha * GradTerm(Xfeature, Ylabels, Theta, i) / len(Xfeature))
    return Theta_
```

Before training, we add the extra feature x_{0} = 1 to every example in the data set.

```python
def LinearRegression(Xfeature, Ylabels, alpha, iterations):
    if len(Xfeature) != len(Ylabels):
        print("Missing Data")
        return False
    # Prepend the extra feature x_0 = 1 to every example
    for i in range(len(Xfeature)):
        Xfeature[i].insert(0, 1)
    Theta = [0] * len(Xfeature[0])
    for i in range(iterations):
        print("\nIteration Number", i)
        Theta = GradDesc(Theta, alpha, Xfeature, Ylabels)
        print(Theta)
    return Theta
```
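The whole training loop can be cross-checked against a compact, self-contained NumPy version (a sketch implementing the same update rule; `linreg_np` and the toy data are illustrative, not part of the lesson):

```python
import numpy as np

def linreg_np(X, y, alpha, iterations):
    # Prepend the x_0 = 1 column, then run batch gradient descent
    Xb = np.hstack([np.ones((len(X), 1)), np.asarray(X, dtype=float)])
    theta = np.zeros(Xb.shape[1])
    m = len(y)
    for _ in range(iterations):
        grad = Xb.T @ (Xb @ theta - y) / m   # all θ_j updated simultaneously
        theta = theta - alpha * grad
    return theta

theta = linreg_np([[1.0], [2.0], [3.0]], np.array([2.0, 4.0, 6.0]), 0.1, 2000)
print(theta)  # approaches [0.0, 2.0] for the line y = 2x
```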
