An activation function is also referred to as a transfer function. Before explaining activation functions, let us quickly recap the McCulloch-Pitts model discussed in the last section.

After the summation function (blue circle), the output $$V_{k}$$ was passed on to the threshold function (red box). The idea behind doing this is to limit the output $$V_{k}$$ to only two values (i.e., either 0 or 1). The red box in this case can be called an activation function.
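To make that recap concrete, here is a minimal sketch of the summation-plus-threshold step in Python. The input values, weights, and threshold below are made-up numbers for demonstration, not values from the original model discussion:

```python
import numpy as np

# Made-up example inputs, weights, and threshold for illustration
x = np.array([0.5, -1.0, 2.0])   # input signals
w = np.array([0.4, 0.3, 0.6])    # synaptic weights
T = 1.0                          # threshold value

v_k = np.sum(w * x)              # summation function (the "blue circle")
y_k = 1 if v_k >= T else 0       # threshold function (the "red box")

print(v_k, y_k)                  # ~1.1 1
```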

### In simple words, an activation function is a function that limits the output signal to a finite value.

An activation function can be linear (representing straight lines or planes) or non-linear (representing curves). Most of the time, the activation functions used in neural networks are non-linear.

Some common types of Activation Functions:

1. Linear Transfer Function / Identity Function

The output of an identity function is equal to its input.

##### $$f(x) = x$$ for all values of $$x$$
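A minimal sketch in Python (NumPy is used here purely for convenience; the example inputs are illustrative):

```python
import numpy as np

def identity(x):
    """Linear transfer function: the output equals the input."""
    return np.asarray(x)

print(identity([-2.0, 0.0, 3.5]))  # [-2.   0.   3.5]
```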

2. Hard Limit Transfer Function / Binary Step Function (with threshold value T)

The output is 0 if the input is less than the threshold value 'T'; otherwise, the output is 1.

##### $$f(x)=\begin{cases}1 & \text{for } x \geq T \\ 0 & \text{otherwise}\end{cases}$$
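A minimal sketch in Python (the threshold and example inputs below are illustrative):

```python
import numpy as np

def binary_step(x, T=0.0):
    """Hard limit transfer function: 1 if x >= T, else 0."""
    return np.where(np.asarray(x) >= T, 1, 0)

print(binary_step([-1.0, 0.0, 2.0], T=0.5))  # [0 0 1]
```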

3. Sigmoid Function / Logistic Function

The sigmoid function takes input values (which can range from $$-\infty$$ to $$+\infty$$) and squashes them into the range 0 to 1.

##### $$f(x)=\frac{1}{1+e^{-x}}$$
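A minimal sketch in Python (the example inputs are illustrative):

```python
import numpy as np

def sigmoid(x):
    """Logistic function: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x)))

print(sigmoid([-10.0, 0.0, 10.0]))  # ~[0.0000454 0.5 0.9999546]
```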

4. Hyperbolic Tangent Function

The hyperbolic tangent function is also sigmoidal ('S'-shaped) in nature, but its value ranges between -1 and +1.

##### $$\tanh(x)=\frac{\sinh(x)}{\cosh(x)}=\frac{e^x-e^{-x}}{e^x+e^{-x}}$$
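A minimal sketch in Python, using NumPy's built-in implementation (the example inputs are illustrative):

```python
import numpy as np

def tanh(x):
    """Hyperbolic tangent: sigmoidal curve with outputs in (-1, 1)."""
    return np.tanh(np.asarray(x))

print(tanh([-2.0, 0.0, 2.0]))  # ~[-0.964  0.  0.964]
```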

5. ReLU

ReLU stands for Rectified Linear Unit. It matches the identity function for $$x$$ greater than 0. Training of neural networks is generally considered to be faster with ReLUs.

##### $$f(x)=\max(0,x)$$

When the input is less than 0, the output is 0; otherwise, the output is equal to the input.
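A minimal sketch in Python (the example inputs are illustrative):

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: passes positive inputs through, zeroes out the rest."""
    return np.maximum(0, np.asarray(x))

print(relu([-3.0, 0.0, 4.0]))  # [0. 0. 4.]
```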

6. Softmax Function

The softmax function transforms the inputs so that the output values sum to one (in other words, so they represent a probability distribution).

Confusing? Let me give an example input and output.

input values = [2, 4, 5]

output values = [0.035, 0.259, 0.705]

So if you observe, the inputs are converted as follows:

2 → 0.035

4 → 0.259

5 → 0.705

If you sum up all the output values, you get 1 (up to rounding): 0.035 + 0.259 + 0.705 ≈ 1.

The equation is given by:

##### $$f(x_j)=\frac{e^{x_j}}{\sum\limits_{k=1}^{K}e^{x_k}} \; \text{for } j=1,2,\dots,K$$
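To tie the formula back to the worked example above, here is a minimal NumPy sketch (the rounding in the printed output is only for readability):

```python
import numpy as np

def softmax(x):
    """Exponentiates each input and normalizes so the outputs sum to 1."""
    e = np.exp(np.asarray(x, dtype=float))
    return e / np.sum(e)

out = softmax([2, 4, 5])
print(np.round(out, 3))  # [0.035 0.259 0.705]
print(np.sum(out))       # 1.0 (up to floating-point rounding)
```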
