Handout: Neural Networks

Notation

Forward propagation

For each layer $\ell = 1, 2, \ldots, L$:
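
(A standard sketch of this step; the symbols $\bm{a}^{[0]} = \bm{x}$ for the network input, $\bm{z}^{[\ell]}$ for the pre-activation, and $g^{[\ell]}$ for the layer-$\ell$ activation are assumed notation.)

$$
\bm{z}^{[\ell]} = \bm{W}^{[\ell]} \bm{a}^{[\ell-1]} + \bm{b}^{[\ell]}, \qquad
\bm{a}^{[\ell]} = g^{[\ell]}\!\left(\bm{z}^{[\ell]}\right),
$$

with $g^{[\ell]} = \mathrm{ReLU}$ for $\ell < L$, $g^{[L]} = \sigma$ (the sigmoid), and prediction $\hat{y} = a^{[L]}$.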

Backward propagation for binary classification

Assumptions:

  • We are doing binary classification, so the last layer of the network uses a sigmoid activation; every other layer uses ReLU. (Both activations, and the derivatives that backpropagation needs, are sketched in the code right after this list.)
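
For later reference in the implementation hints, here is a minimal Python/NumPy sketch of the two activations and their derivatives. The function names are illustrative, not from the handout; the derivative formulas are the standard ones.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation, used only at the output layer L."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    """ReLU activation, used at every hidden layer."""
    return np.maximum(0.0, z)

def relu_derivative(z):
    """ReLU'(z) = 1 where z > 0, else 0 (the value at z = 0 is a convention)."""
    return (z > 0).astype(z.dtype)
```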

Backpropagation Overview

Backpropagation Step 1: Gradient at output layer
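
For a single training example with label $y \in \{0, 1\}$ and sigmoid output $a^{[L]} = \sigma(z^{[L]})$ (both scalars here), and assuming the usual binary cross-entropy cost $J = -\left[\, y \log a^{[L]} + (1-y)\log\!\left(1-a^{[L]}\right) \right]$ (this choice of cost is an assumption), the chain rule gives the familiar compact result

$$
\frac{\partial J}{\partial z^{[L]}} = a^{[L]} - y, \qquad
\frac{\partial J}{\partial \bm{W}^{[L]}} = \left(a^{[L]} - y\right) \bm{a}^{[L-1]\top}, \qquad
\frac{\partial J}{\partial \bm{b}^{[L]}} = a^{[L]} - y.
$$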

Backpropagation Step 2: Calculations at each hidden layer

Note that this process proceeds backwards from the output layer to the input layer. We have already calculated $\frac{\partial J}{\partial \bm{W}^{[L]}}$ and $\frac{\partial J}{\partial \bm{b}^{[L]}}$ in Step 1, so now we need the partial derivatives with respect to the remaining $\bm{W}^{[\ell]}$'s and $\bm{b}^{[\ell]}$'s.
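
A standard sketch of the hidden-layer step, continuing the single-example notation assumed above (with $\odot$ for the elementwise product and $g^{[\ell]\prime}$ for the derivative of the layer-$\ell$ activation, here $\mathrm{ReLU}'(z) = \mathbf{1}[z > 0]$), is the recursion

$$
\frac{\partial J}{\partial \bm{z}^{[\ell]}} = \left(\bm{W}^{[\ell+1]\top} \frac{\partial J}{\partial \bm{z}^{[\ell+1]}}\right) \odot g^{[\ell]\prime}\!\left(\bm{z}^{[\ell]}\right), \qquad
\frac{\partial J}{\partial \bm{W}^{[\ell]}} = \frac{\partial J}{\partial \bm{z}^{[\ell]}}\, \bm{a}^{[\ell-1]\top}, \qquad
\frac{\partial J}{\partial \bm{b}^{[\ell]}} = \frac{\partial J}{\partial \bm{z}^{[\ell]}},
$$

applied for $\ell = L-1, L-2, \ldots, 1$.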

Backpropagation Step 3: Gradient descent

For each $\ell = 1, \dots, L$, update the parameters using gradient descent with learning rate $\alpha$ (a short implementation sketch follows this list):

  • $\bm{W}^{[\ell]} \gets \bm{W}^{[\ell]} - \displaystyle\alpha \frac{\partial J}{\partial \bm{W}^{[\ell]}}$
  • $\bm{b}^{[\ell]} \gets \bm{b}^{[\ell]} - \displaystyle\alpha \frac{\partial J}{\partial \bm{b}^{[\ell]}}$
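
A minimal Python sketch of this update step, assuming the parameters and their gradients are kept in dictionaries keyed by layer number (the names `params`, `grads`, and `num_layers` are illustrative, not from the handout):

```python
def gradient_descent_step(params, grads, alpha, num_layers):
    """Apply one gradient-descent update to every layer's W and b.

    params: {"W1": ..., "b1": ..., ..., "WL": ..., "bL": ...}
    grads:  the matching partial derivatives, {"dW1": ..., "db1": ..., ...}
    alpha:  learning rate
    """
    for layer in range(1, num_layers + 1):
        # W[l] <- W[l] - alpha * dJ/dW[l]
        params[f"W{layer}"] = params[f"W{layer}"] - alpha * grads[f"dW{layer}"]
        # b[l] <- b[l] - alpha * dJ/db[l]
        params[f"b{layer}"] = params[f"b{layer}"] - alpha * grads[f"db{layer}"]
    return params
```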

Implementation Hints