
In this section, I am going to explain how to use scikit-learn (sklearn), a machine learning package for Python, to perform linear regression on a set of data points.

Please read the previous section, Linear Regression Theory, for a better understanding.

I am not going to cover training/testing data splits or model evaluation here, although both are important.

We know that the equation of a line is given by:

    y = mx + b

where 'm' is the slope and 'b' is the intercept.

Our goal is to find the values of the slope (m) and the intercept (b) that best fit our data.

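scikit-learn's LinearRegression uses the Ordinary Least Squares (OLS) method to fit our data points: it chooses m and b so that the sum of the squared vertical distances between each point and the line is as small as possible.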
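
For a single input variable, the OLS solution has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point (mean of x, mean of y). The sketch below is only an illustration of that idea. The helper names ols_fit and sse and the toy points (which lie exactly on y = 2x + 1) are hypothetical, and scikit-learn does not use this code internally.

```python
# A hand-rolled ordinary least squares fit for ONE input variable.
# ols_fit, sse and the toy data are hypothetical, for illustration only.

def ols_fit(xs, ys):
    """Return (slope, intercept) that minimise the sum of squared errors."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the common 1/n factor cancels
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    # the OLS line always passes through the point of means
    b = y_mean - m * x_mean
    return m, b


def sse(xs, ys, m, b):
    """Sum of squared vertical distances between the points and y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # exactly y = 2x + 1
m, b = ols_fit(xs, ys)
print(m, b)                 # 2.0 1.0
print(sse(xs, ys, m, b))    # 0.0
```

Nudging either m or b away from the OLS values can only increase sse, which is exactly what "best fit" means here.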

import statement:

from sklearn import linear_model

I have height and weight data for some people. Let's use this data to do linear regression and try to predict the weight of other people.

# scikit-learn expects a 2D input, so each height is its own one-element list
height = [[4.0], [4.5], [5.0], [5.2], [5.4], [5.8], [6.1], [6.2], [6.4], [6.8]]
weight = [42, 44, 49, 55, 53, 58, 60, 64, 66, 69]

print("height weight")
for h, w in zip(height, weight):
    print(h[0], "->", w)

output:

height weight
4.0 -> 42
4.5 -> 44
5.0 -> 49
5.2 -> 55
5.4 -> 53
5.8 -> 58
6.1 -> 60
6.2 -> 64
6.4 -> 66
6.8 -> 69
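
scikit-learn wants the inputs X with shape (n_samples, n_features), while the target y can stay one-dimensional. If the heights arrive as a flat list instead, for example from a spreadsheet column, NumPy's reshape(-1, 1) produces the same single-column layout. A minimal sketch, assuming NumPy is installed; the name flat_heights is just for illustration:

```python
import numpy as np

# Heights as a flat list (hypothetical starting point)
flat_heights = [4.0, 4.5, 5.0, 5.2, 5.4, 5.8, 6.1, 6.2, 6.4, 6.8]

# reshape(-1, 1): one column, as many rows as needed -> shape (n_samples, 1)
X = np.array(flat_heights).reshape(-1, 1)
print(X.shape)   # (10, 1)

# The target stays one-dimensional: one weight per person
y = np.array([42, 44, 49, 55, 53, 58, 60, 64, 66, 69])
print(y.shape)   # (10,)
```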

import statement to plot graph using matplotlib:

import matplotlib.pyplot as plt

plotting the height and weight data:

plt.scatter(height, weight, color='black')
plt.xlabel("height")
plt.ylabel("weight")
plt.show()   # needed in a script; Jupyter displays the figure automatically

output:

[Figure: scatter plot of weight against height]

Creating a LinearRegression model and calling its fit method to learn from the data:

reg=linear_model.LinearRegression()
reg.fit(height,weight)

slope and intercept:

m=reg.coef_[0]
b=reg.intercept_
print("slope=",m, "intercept=",b)

output:

slope= 10.1936218679 intercept= -0.4726651480
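
reg.coef_ is an array holding one slope per input feature, which is why the code above takes coef_[0]; with a single feature there is exactly one entry. As an independent cross-check, NumPy's polyfit with deg=1 also fits a straight line by least squares and returns [slope, intercept], so it should reproduce the values above. A small sketch, assuming NumPy is installed:

```python
import numpy as np

height = [4.0, 4.5, 5.0, 5.2, 5.4, 5.8, 6.1, 6.2, 6.4, 6.8]
weight = [42, 44, 49, 55, 53, 58, 60, 64, 66, 69]

# polyfit returns coefficients highest power first: [slope, intercept] for deg=1
m_np, b_np = np.polyfit(height, weight, deg=1)
print(round(float(m_np), 4), round(float(b_np), 4))   # 10.1936 -0.4727
```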

using the values of the slope and intercept to construct the line that fits our data points:

plt.scatter(height, weight, color='black')
# y = m*x + b for every height (h is a one-element list, so use h[0])
predicted_values = [m * h[0] + b for h in height]
plt.plot(height, predicted_values, 'b')
plt.xlabel("height")
plt.ylabel("weight")
plt.show()
output:
[Figure: regression line fitted to the height and weight data]
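
Building the line by hand from m and b gives the same values that reg.predict() returns for the training heights. The quick check below repeats the fit so it runs on its own; it assumes scikit-learn and NumPy are installed, and compares with np.allclose because the two paths may differ by floating-point rounding.

```python
import numpy as np
from sklearn import linear_model

height = [[4.0], [4.5], [5.0], [5.2], [5.4], [5.8], [6.1], [6.2], [6.4], [6.8]]
weight = [42, 44, 49, 55, 53, 58, 60, 64, 66, 69]

reg = linear_model.LinearRegression()
reg.fit(height, weight)
m, b = reg.coef_[0], reg.intercept_

# The line computed by hand from the slope and intercept...
by_hand = np.array([m * h[0] + b for h in height])
# ...agrees with what predict() returns for the same inputs
by_predict = reg.predict(height)
print(np.allclose(by_hand, by_predict))   # True
```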

Now we can go ahead and predict the weight of people whose data we do not have:

reg.predict([[6.2]])   # input must be 2D: one row per person, one column per feature

output:

array([62.72779043])

reg.predict([[8.0]])

output:

array([81.07630979])
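
predict() also accepts several rows at once, and each result is just y = m*x + b evaluated at that height. The sketch below refits the model so it is self-contained (assuming scikit-learn is installed) and checks both predictions against the formula:

```python
from sklearn import linear_model

height = [[4.0], [4.5], [5.0], [5.2], [5.4], [5.8], [6.1], [6.2], [6.4], [6.8]]
weight = [42, 44, 49, 55, 53, 58, 60, 64, 66, 69]
reg = linear_model.LinearRegression().fit(height, weight)

# Several new heights at once: still a 2D input, one row per person
new_heights = [[6.2], [8.0]]
predictions = reg.predict(new_heights)
print(predictions.round(2))   # [62.73 81.08]

# The same numbers straight from y = m*x + b
m, b = reg.coef_[0], reg.intercept_
print([round(float(m * h[0] + b), 2) for h in new_heights])   # [62.73, 81.08]
```

Note that 8.0 lies outside the range of heights the model was trained on (4.0 to 6.8), so that prediction is an extrapolation of the fitted line.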