*What is Logistic Regression?*

* Logistic regression is used to predict an outcome variable that is categorical.*

*What is a Categorical variable?*

* A categorical variable is a variable that can take only specific and limited values.*

**Example:**

* Gender: Male/Female*

* Yes/No, 0/1, etc.*

* Let's consider a scenario:*

* I have data on some students. The data records the hours each student studied before the exam and whether they passed: yes/no (1/0).*

```
hoursStudied = [[1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [3.6], [4.2], [4.5], [5.4],
                [6.8], [6.9], [7.2], [7.4], [8.1], [8.2], [8.5], [9.4], [9.5], [10.2]]
passed = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          1, 0, 0, 1, 1, 1, 1, 1, 1, 1]

print("hoursStudied passed")
for row in zip(hoursStudied, passed):
    print(" ", row[0][0], " ----->", row[1])
```

```
hoursStudied passed
   1.0  -----> 0
   1.5  -----> 0
   2.0  -----> 0
   2.5  -----> 0
   3.0  -----> 0
   3.5  -----> 0
   3.6  -----> 0
   4.2  -----> 0
   4.5  -----> 0
   5.4  -----> 0
   6.8  -----> 1
   6.9  -----> 0
   7.2  -----> 0
   7.4  -----> 1
   8.1  -----> 1
   8.2  -----> 1
   8.5  -----> 1
   9.4  -----> 1
   9.5  -----> 1
   10.2  -----> 1
```

*Let's plot the data and see how it looks:*

```
import matplotlib.pyplot as plt
%matplotlib inline

plt.scatter(hoursStudied, passed, color='black')
plt.xlabel("hoursStudied")
plt.ylabel("passed")
```

**If we plot a normal linear regression over our data points, it looks like this:**

* We know that the output should be either 0 or 1.*

* We can see that this regression produces all sorts of values between 0 and 1. That's not the actual problem.*

* It also produces impossible values: negative values and values greater than 1, which have no meaning here.*

* So we need a better fit than this straight line. Logistic regression is what we should use here.*
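*The problem with the straight line can be checked numerically. The following is a minimal sketch, assuming an ordinary least-squares fit written by hand; the helper name `fit_line` and the flattened list `hours` are ours, not part of the original notebook. It fits a line to the student data and prints the predictions at the two ends of the range.*

```python
# Student data from the example above, flattened to plain lists.
hours = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 3.6, 4.2, 4.5, 5.4,
         6.8, 6.9, 7.2, 7.4, 8.1, 8.2, 8.5, 9.4, 9.5, 10.2]
passed = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          1, 0, 0, 1, 1, 1, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hours, passed)
predictions = [slope * x + intercept for x in hours]

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"prediction at {hours[0]} hours: {predictions[0]:.3f}")    # falls below 0
print(f"prediction at {hours[-1]} hours: {predictions[-1]:.3f}")  # rises above 1
```

*For this data the line predicts a negative value for the student who studied 1 hour and a value greater than 1 for the student who studied 10.2 hours, neither of which can be read as a pass/fail outcome or a probability.*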

*Logistic regression will fit our data points with a curve something like this:*

*The Logistic Function:*

* Most often, we would want to predict our outcomes as YES/NO (1/0).*

*For example:*

* Is your favorite football team going to win the match today? -- yes/no (1/0)*

* Does a student pass the exam? -- yes/no (1/0)*

**The logistic function is given by:**

* \(f(x)=\frac{L}{1+e^{-k(x-x_0)}}\)*

* where*

* L - Curve's maximum value*

* k - Steepness of the curve*

* \(x_0\) - x value of the curve's midpoint*

*The standard logistic function (\(k=1\), \(x_0=0\), \(L=1\)) is called the sigmoid function:*

* \(S(x)=\frac{1}{1+e^{-x}}\)*
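*The two formulas above can be written as a short sketch in Python. The function names `logistic` and `sigmoid` and the sample parameter values are only for illustration; the parameters L, k and x0 follow the definitions given above.*

```python
import math


def logistic(x, L=1.0, k=1.0, x0=0.0):
    """General logistic function: L / (1 + e^(-k * (x - x0)))."""
    return L / (1.0 + math.exp(-k * (x - x0)))


def sigmoid(x):
    """Standard logistic (sigmoid) function: L = 1, k = 1, x0 = 0."""
    return logistic(x)


print(sigmoid(0.0))                  # 0.5, the midpoint of the sigmoid
print(logistic(7.0, k=2.0, x0=7.0))  # 0.5, the midpoint moved to x0 = 7
print(logistic(0.0, L=4.0))          # 2.0, half of the maximum value L
```

*At the midpoint the exponent is zero, so the function always returns half of L, whatever the steepness k is.*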

The sigmoid curve

*The sigmoid function gives an 'S' shaped curve.*

*This curve has a finite limit of:*

* '0' as x approaches \(-\infty\)*

* '1' as x approaches \(+\infty\)*

*The output of sigmoid function when x=0 is 0.5*

*Thus, if the output is more than 0.5, we can classify the outcome as 1 (or YES), and if it is less than 0.5, we can classify it as 0 (or NO).*

*For example: If the output is 0.65, we can say in terms of probability as:*

* "There is a 65 percent chance that your favorite football team is going to win today."*

*Thus the output of the sigmoid function can be used not just to classify YES/NO; it can also be used to determine the probability of YES/NO.*
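*The thresholding rule can be sketched as below. The helper name `classify` is hypothetical, and sending an output of exactly 0.5 to class 0 is our assumption, since the text only covers values above and below 0.5.*

```python
import math


def sigmoid(x):
    """Standard sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def classify(probability, threshold=0.5):
    """Return 1 (YES) if the probability is above the threshold, else 0 (NO).

    Assumption: a probability exactly equal to the threshold maps to 0.
    """
    return 1 if probability > threshold else 0


p = 0.65
print(f"{p:.0%} chance of YES -> class {classify(p)}")

# Any positive input to the sigmoid lands above 0.5, any negative input below it.
print(classify(sigmoid(2.0)), classify(sigmoid(-2.0)))
```

*The same number therefore serves two purposes: it is read directly as a probability, and it is compared with 0.5 to obtain the class.*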

*Now we shall check how the logistic/sigmoid function works using Python.*

*Imports:*

* We need math to write the sigmoid function, numpy to define the values for the X-axis, and matplotlib to plot the curve.*

```
import math
import matplotlib.pyplot as plt
import numpy as np
```

*Next we shall define the sigmoid function as described by this equation:*

*\(f(x)=\frac{1}{1+e^{-x}}\)*

```
def sigmoid(x):
    a = []
    for item in x:
        # the sigmoid function
        a.append(1 / (1 + math.exp(-item)))
    return a
```
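*A side note on the definition above: `math.exp(-item)` raises an OverflowError when `item` is a large negative number (roughly below -709), because the exponential no longer fits in a float. This does not matter for the range used in this tutorial, but the following is a minimal sketch of one common workaround. The name `stable_sigmoid` is ours.*

```python
import math


def stable_sigmoid(x):
    """Sigmoid for a single number that avoids overflow in math.exp."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x use the equivalent form e^x / (1 + e^x),
    # so the argument passed to math.exp is never positive.
    e = math.exp(x)
    return e / (1.0 + e)


print(stable_sigmoid(0))      # 0.5
print(stable_sigmoid(-1000))  # 0.0 (the plain form would raise OverflowError)
print(stable_sigmoid(1000))   # 1.0
```

*Both branches compute the same mathematical function; only the form of the expression changes with the sign of x.*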

*Now we shall generate some values for x. This will have values from -10 up to (but not including) +10 in increments of 0.2 (-10.0, -9.8, ..., 0, 0.2, 0.4, ..., 9.8).*

```
x = np.arange(-10., 10., 0.2)
print(x)
```

Output:

```
[-10.  -9.8 -9.6 -9.4 -9.2 -9.  -8.8 -8.6 -8.4 -8.2
 -8.  -7.8 -7.6 -7.4 -7.2 -7.  -6.8 -6.6 -6.4 -6.2
 -6.  -5.8 -5.6 -5.4 -5.2 -5.  -4.8 -4.6 -4.4 -4.2
 -4.  -3.8 -3.6 -3.4 -3.2 -3.  -2.8 -2.6 -2.4 -2.2
 -2.  -1.8 -1.6 -1.4 -1.2 -1.  -0.8 -0.6 -0.4 -0.2
 -0.   0.2  0.4  0.6  0.8  1.   1.2  1.4  1.6  1.8
  2.   2.2  2.4  2.6  2.8  3.   3.2  3.4  3.6  3.8
  4.   4.2  4.4  4.6  4.8  5.   5.2  5.4  5.6  5.8
  6.   6.2  6.4  6.6  6.8  7.   7.2  7.4  7.6  7.8
  8.   8.2  8.4  8.6  8.8  9.   9.2  9.4  9.6  9.8]
```

*We shall pass the values of 'x' to our sigmoid function and store its output in the variable 'y'.*

```
y = sigmoid(x)
```

*We shall plot the 'x' values on the X-axis and the 'y' values on the Y-axis to see the sigmoid curve.*

```
plt.plot(x, y)
plt.show()
```

*We can observe that if 'x' is very negative, the output is almost '0', and if 'x' is very positive, it is almost '1'. When 'x' is '0', 'y' is 0.5.*
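*To connect the sigmoid curve back to the student data from the start, here is a rough sketch of how such a curve could be fitted. It assumes a model of the form sigmoid(w * hours + b) and uses plain gradient descent on the log-loss. This is only one possible approach chosen for illustration, not necessarily how the curve shown earlier was produced. The names `fit_logistic`, `w`, `b`, the learning rate and the number of steps are all our assumptions.*

```python
import math

# Student data from the start of this tutorial, flattened to plain lists.
hours = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 3.6, 4.2, 4.5, 5.4,
         6.8, 6.9, 7.2, 7.4, 8.1, 8.2, 8.5, 9.4, 9.5, 10.2]
passed = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          1, 0, 0, 1, 1, 1, 1, 1, 1, 1]


def sigmoid(z):
    """Standard sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.05, steps=20000):
    """Fit P(pass) = sigmoid(w * x + b) by gradient descent on the mean log-loss.

    Hypothetical helper: the step size and step count are illustrative,
    and the loop is not guaranteed to reach full convergence.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. w*x + b
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


w, b = fit_logistic(hours, passed)
for h in (2.0, 7.0, 9.0):
    print(f"{h} hours -> P(pass) = {sigmoid(w * h + b):.2f}")
```

*With this data the fitted curve gives a low pass probability for 2 hours of study and a high one for 9 hours, and every output stays between 0 and 1. In practice a library implementation with a more robust solver would normally be used instead of this hand-written loop.*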