The backpropagation algorithm was described by David E. Rumelhart, Geoffrey E. Hinton & Ronald J. Williams in the famous paper "Learning representations by back-propagating errors". You can download the original paper at http://www.nature.com/nature/journal/v323/n6088/pdf/323533a0.pdf. Here I will give just a short summary of the main concepts. The algorithm can be decomposed into the following steps: the network is first initialized by setting all of its weights to small random numbers.
The algorithm then repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output of the network and the desired one. This difference is computed by means of an error function, commonly given as half the sum of the squares of the differences between all targets ti and actual node activations yi of the output layer:

E = ½ Σi (ti − yi)²

Because E is calculated through the composition of the node functions, it is a continuous and differentiable function of the weights in the network. The method therefore requires computing the gradient of the error function at each iteration step:

∇E = (∂E/∂w1, ∂E/∂w2, …, ∂E/∂wn)
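As a small illustration of the error function above, here is a minimal sketch (the function name is mine, not from the paper) that computes E for a list of targets and output activations:

```python
# Sum-of-squares error between target values and actual output activations.
def sum_squared_error(targets, outputs):
    # E = 1/2 * sum_i (t_i - y_i)^2; the 1/2 cancels neatly when differentiating.
    return 0.5 * sum((t - y) ** 2 for t, y in zip(targets, outputs))

print(sum_squared_error([1.0, 0.0], [0.75, 0.25]))  # 0.0625
```

The factor ½ is a common convenience; it does not change where the minimum lies, only the scale of the gradient.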
In a way very similar to the delta rule, the backpropagation algorithm updates each weight using an increment calculated as follows (where γ is the learning rate):

Δwkj = −γ ∂E/∂wkj
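To see this gradient-descent step in action, here is a hedged sketch for the simplest possible case, a single linear unit y = w·x with a squared error (function and variable names are illustrative):

```python
# One gradient-descent step for a single linear unit y = w * x,
# minimizing E = 1/2 * (t - y)^2. gamma is the learning rate from the text.
def update_weight(w, x, t, gamma):
    y = w * x                   # actual output of the unit
    grad = -(t - y) * x         # dE/dw for the squared error above
    return w - gamma * grad     # the update Δw = -γ ∂E/∂w

w = 0.0
for _ in range(100):
    w = update_weight(w, x=2.0, t=1.0, gamma=0.1)
print(round(w, 4))  # converges to 0.5, so that y = 0.5 * 2.0 = 1.0 = t
```

Each step moves the weight a small distance downhill on the error surface; repeating the step drives the output toward the target.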
If we denote the backpropagated error at the jth node by δj, we can then express the partial derivative of E with respect to wkj as:

∂E/∂wkj = −δj yk

where yk is the output of unit k.
Putting it all together, the error term δj is given by the following expressions:

δj = f′(netj) (tj − yj)     for an output node j

δj = f′(netj) Σl δl wjl     for a hidden node j

where f′(netj) is the derivative of the activation function at node j's net input, and the sum in the second expression runs over the nodes l in the layer above that node j feeds into.
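The two δ expressions above can be sketched directly in code. This assumes a sigmoid activation, for which f′(net) = y·(1 − y); the function names are mine:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Error term for an output node: delta_j = f'(net_j) * (t_j - y_j),
# with f'(net_j) = y_j * (1 - y_j) for a sigmoid activation.
def delta_output(t, y):
    return y * (1.0 - y) * (t - y)

# Error term for a hidden node: delta_j = f'(net_j) * sum_l delta_l * w_jl,
# where deltas_next / weights_next belong to the nodes the hidden node feeds.
def delta_hidden(y, deltas_next, weights_next):
    return y * (1.0 - y) * sum(d * w for d, w in zip(deltas_next, weights_next))

print(delta_output(1.0, 0.8))  # 0.8 * 0.2 * 0.2 = 0.032
```

Note how the hidden-node term reuses the δ values already computed for the layer above — this is the "backpropagating" part of the algorithm's name.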
For a weight connecting a node in layer k to a node in layer j, the change in weight at epoch n is given by

Δwkj(n) = α δj yk + η Δwkj(n−1)

where α is the learning rate, a real value in the interval (0,1], yk is the activation of the node in layer k, n refers to the training epoch (the iteration number of the training loop), and η is the momentum rate. Introducing the momentum rate η attenuates oscillations in the iteration process: once the weights start moving in a particular direction in weight space, they tend to continue moving in that direction. Despite the complexity of the previous formulas, the essence of the backpropagation algorithm lies in the last three. If you trust that they are correct, there is no need to know more.
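The pieces above can be assembled into a complete, if tiny, training loop. The sketch below trains a single sigmoid unit (two inputs plus a bias) on the logical OR function using the momentum update Δw(n) = α·δ·yk + η·Δw(n−1); the hyperparameter values and names are illustrative, not prescribed by the paper:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Training set for logical OR: ([input pair], target).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w = [0.0, 0.0, 0.0]        # two input weights plus one bias weight
dw_prev = [0.0, 0.0, 0.0]  # previous step's Δw, kept for the momentum term
alpha, eta = 0.5, 0.9      # learning rate α and momentum rate η (illustrative)

for epoch in range(2000):
    for x, t in data:
        xs = x + [1.0]  # append the constant bias input
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))
        delta = y * (1.0 - y) * (t - y)  # output-node error term
        for k in range(3):
            dw = alpha * delta * xs[k] + eta * dw_prev[k]  # momentum update
            w[k] += dw
            dw_prev[k] = dw

for x, t in data:
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x + [1.0])))
    print(x, "->", round(y, 3), "target", t)
```

With momentum, updates that consistently point the same way accumulate, so the weights move faster through flat regions of the error surface while oscillating components tend to cancel out.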
