Activation Function is also referred to as Transfer Function. Before explaining the activation function, let us quickly recap the McCulloch-Pitts model discussed in the last section.

 

[Figure: McCulloch-Pitts Neuron Model]

     After the summation function (blue circle), the output \(V_{k}\) was passed on to the threshold function (red box). The idea behind doing this is to limit the output \(V_{k}\) to only two values (i.e., either 0 or 1). The red box in this case can be called an activation function.

     In simple words, the activation function is a function that limits the output signal to a finite value.

     The activation function can be a linear function (which represents straight lines or planes) or a non-linear function (which represents curves). Most of the time, the activation functions used in neural networks are non-linear.
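To make the recap concrete, here is a minimal Python sketch of the McCulloch-Pitts idea: a summation function followed by a threshold activation that limits the output to 0 or 1. The function names and the example threshold value are my own choices for illustration.

```python
def threshold_activation(v, T=0.5):
    """Threshold (hard limit) activation: 1 if v reaches the threshold T, else 0."""
    return 1 if v >= T else 0

def mcp_neuron(inputs, weights, T=0.5):
    """Summation function (weighted sum) followed by the threshold activation."""
    v = sum(x * w for x, w in zip(inputs, weights))
    return threshold_activation(v, T)

print(mcp_neuron([1, 0, 1], [0.4, 0.3, 0.3]))  # weighted sum = 0.7 >= 0.5, so prints 1
```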

Some common types of Activation Functions:

[Figure: Linear/Identity Function Graph]
1. Linear Transfer Function / Identity Function

     The output of an identity function is equal to its input.

\(f(x) = x\) for all values of \(x\) 
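As a minimal Python sketch (the function name is my own):

```python
def identity(x):
    """Linear/identity activation: the output equals the input."""
    return x

print(identity(3.5))   # 3.5
print(identity(-2))    # -2
```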
[Figure: Binary Step Function Graph]

2. Hard Limit Transfer Function / Binary Step Function (with threshold value T)

     The output is 0 if the input is less than the threshold value 'T'; otherwise, the output is 1.

     \(f(x)=\begin{cases}1 & \text{for } x \geq T \\ 0 & \text{otherwise}\end{cases}\)
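A minimal Python sketch of the binary step function, with the threshold T passed in as a parameter (the function name is my own):

```python
def binary_step(x, T=0.0):
    """Hard limit / binary step activation: 1 if x >= T, else 0."""
    return 1 if x >= T else 0

print(binary_step(0.7, T=0.5))   # 1
print(binary_step(0.3, T=0.5))   # 0
```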
[Figure: Sigmoid/Logistic Function Graph]

3. Sigmoid Function / Logistic Function 

     The sigmoid function takes input values (which can range from \(-\infty\) to \(+\infty\)) and squashes them into the range 0 to 1.

\(f(x)=\frac{1}{1+e^{-x}}\)
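A minimal Python sketch of the sigmoid function (the function name is my own):

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # 0.5
print(sigmoid(5))    # ~0.993
print(sigmoid(-5))   # ~0.007
```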
[Figure: Hyperbolic Tangent Function Graph]

4. Hyperbolic Tangent Function

     The hyperbolic tangent function is also sigmoidal ('S'-shaped curve) in nature, but its value ranges between -1 and +1.

\(\tanh(x)=\frac{\sinh(x)}{\cosh(x)}=\frac{e^x-e^{-x}}{e^x+e^{-x}}\)
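A minimal Python sketch of the hyperbolic tangent using the formula above (note that Python's standard library already provides math.tanh):

```python
import math

def tanh(x):
    """Hyperbolic tangent: S-shaped curve with output in the range (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

print(tanh(0))        # 0.0
print(tanh(2))        # ~0.964
print(math.tanh(2))   # same value from the standard library
```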

 

[Figure: ReLU Function Graph]

5. ReLU

    ReLU stands for Rectified Linear Unit. It is somewhat similar to the identity function for x greater than 0. Training of neural networks is generally considered to be faster with ReLUs.

\(f(x)=\max(0,x)\)

When the input is less than 0, the output is 0; otherwise, the output is equal to the input.
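A minimal Python sketch of ReLU (the function name is my own):

```python
def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return max(0, x)

print(relu(4.2))   # 4.2 (behaves like the identity function for positive inputs)
print(relu(-3))    # 0
```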

6. Softmax Function

     The softmax function converts its inputs into output values whose total sum equals one (in other words, it represents a probability distribution).

Confusing? Let me give an example with input and output values.

input values = [2, 4, 5]

output values will be ≈ [0.035, 0.259, 0.705]

So if you observe, the input 2 is converted to -> 0.035

4 --> 0.259

5 --> 0.705

If you sum up all the output values, the total is equal to 1, i.e., 0.035 + 0.259 + 0.705 ≈ 1 (the small difference is only due to rounding).

The equation is given by:

\(f(x_j)=\frac{e^{x_j}}{\sum\limits_{k=1}^{K}e^{x_k}} \quad \text{for } j=1,2,\dots,K\)
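A minimal Python sketch of softmax that reproduces the example above (the function name is my own):

```python
import math

def softmax(xs):
    """Softmax: exponentiate each input and normalise so the outputs sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

out = softmax([2, 4, 5])
print([round(v, 3) for v in out])  # [0.035, 0.259, 0.705]
print(sum(out))                    # ~1.0 (up to floating-point rounding)
```

In practice, implementations usually subtract the maximum input from every element before exponentiating to avoid numerical overflow; that step is omitted here for clarity.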