Backpropagation Programmer


In this section, we compare the two programming models on the problem of automatic differentiation, or backpropagation. Differentiation is of vital importance in deep learning because it's the mechanism by which we train our models. In any deep learning model, we define a.

I've been trying to learn how back-propagation works with neural networks, but have yet to find a good explanation from a less technical perspective.

How does back-propagation work? How does it learn from a provided training dataset? I will have to code this, but until then I need to gain a stronger understanding of it.


unleashed


3 Answers

Back-propagation follows a logic very similar to that of feed-forward; the difference is the direction of data flow. In the feed-forward step, you start from the inputs and propagate values forward, layer by layer, to compute the network's output.

In the back-propagation step, you cannot directly know the error of any neuron except those in the output layer. Calculating the errors of the output nodes is straightforward: take the difference between the neuron's output and the actual output for that instance in the training set. The neurons in the hidden layers must derive their errors from this, so you have to pass the error values back to them. From these values, the hidden neurons can update their errors and other parameters using the weighted sum of errors from the layer ahead.
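As a rough illustration of the two steps just described, here is a minimal NumPy sketch. The network shape (2 inputs, 3 hidden neurons, 1 output), the sigmoid activation, and all variable names are assumptions made for this example, not something the answer specifies.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical tiny network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))   # input-to-hidden weights
W2 = rng.normal(size=(3, 1))   # hidden-to-output weights

x = np.array([[0.5, -1.0]])    # one training instance
target = np.array([[1.0]])     # its known output

# Feed-forward: propagate values from the inputs to the output.
hidden = sigmoid(x @ W1)
output = sigmoid(hidden @ W2)

# Output layer: the error is directly observable.
output_error = target - output
output_delta = output_error * output * (1 - output)  # scaled by the sigmoid slope

# Hidden layer: its error is the weighted sum of the deltas from the layer ahead.
hidden_error = output_delta @ W2.T
hidden_delta = hidden_error * hidden * (1 - hidden)

print(output_error.shape, hidden_error.shape)
```

Each hidden neuron receives a share of the output error proportional to the weight that connects it to the output neuron.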

A step-by-step demo of feed-forward and back-propagation steps can be found here.

Edit

If you're a beginner to neural networks, you can start with the perceptron, then advance to neural networks, which are really multilayer perceptrons.


0605002


High-level description of the backpropagation algorithm

Backpropagation tries to do gradient descent on the error surface of the neural network, adjusting the weights with dynamic programming techniques to keep the computations tractable.

I will try to explain, in high-level terms, each of the concepts just mentioned.

Error surface

If you have a neural network with, say, N neurons in the output layer, that means your output is really an N-dimensional vector, and that vector lives in an N-dimensional space (or on an N-dimensional surface.) So does the 'correct' output that you're training against. So does the difference between your 'correct' answer and the actual output.

That difference, with suitable conditioning (especially some consideration of absolute values) is the error vector, living on the error surface.
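For a concrete picture, here is a small sketch for a hypothetical network with N = 3 outputs. The numbers are made up, and squaring the differences is only one possible form of the "conditioning" mentioned above.

```python
import numpy as np

output = np.array([0.7, 0.2, 0.1])   # what the network produced (made-up values)
target = np.array([1.0, 0.0, 0.0])   # the 'correct' output for this example

# The difference between the two N-dimensional vectors.
error_vector = target - output

# Squaring removes the signs and collapses the vector into one number to minimise.
squared_error = 0.5 * np.sum(error_vector ** 2)

print(error_vector, squared_error)
```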

Gradient descent

With that concept, you can think of training the neural network as the process of adjusting the weights of your neurons so that the error function is small, ideally zero. Conceptually, you do this with calculus. If you only had one output and one weight, this would be simple -- take a few derivatives, which would tell you which 'direction' to move, and make an adjustment in that direction.

But you don't have one neuron, you have N of them, and substantially more input weights.

The principle is the same, except instead of using calculus on lines looking for slopes that you can picture in your head, the equations become vector algebra expressions that you can't easily picture. The term gradient is the multi-dimensional analogue to slope on a line, and descent means you want to move down that error surface until the errors are small.
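A minimal sketch of the idea with just two weights, assuming a made-up bowl-shaped error function whose minimum is known in advance. The functions `error` and `gradient` and the learning rate are illustrative choices, not part of any real network.

```python
import numpy as np

def error(w):
    # Hypothetical bowl-shaped error surface with its minimum at w = (3, -1).
    return (w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2

def gradient(w):
    # Vector of partial derivatives of error() with respect to each weight.
    return np.array([2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)])

w = np.array([0.0, 0.0])
learning_rate = 0.1
for step in range(200):
    # The gradient points uphill, so step in the opposite direction.
    w = w - learning_rate * gradient(w)

print(w, error(w))
```

In a real network the gradient has one entry per weight and cannot be written down by inspection, which is where backpropagation comes in.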

Dynamic programming

There's another problem, though -- if you have more than one layer, you can't easily see how a change to a weight in some non-output layer affects the actual output.

Dynamic programming is a bookkeeping method to help track what's going on. At the very highest level, if you naively try to do all this vector calculus, you end up calculating some derivatives over and over again. The modern backpropagation algorithm avoids some of that, and it so happens that you update the output layer first, then the second to last layer, etc. Updates are propagating backwards from the output, hence the name.
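The bookkeeping can be sketched as follows, assuming sigmoid activations, a squared-error loss, and made-up layer sizes. The point to notice is that each layer's `delta` is computed once from the delta of the layer after it, rather than re-deriving the whole chain of derivatives for every weight.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]  # hypothetical layer widths: input, two hidden, output
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(1, sizes[0]))
target = np.array([[1.0, 0.0]])

# Forward pass: remember every layer's activation for later reuse.
activations = [x]
for W in weights:
    activations.append(sigmoid(activations[-1] @ W))

# Backward pass: one delta per layer, output layer first.
# Loss assumed here: 0.5 * sum((output - target) ** 2).
delta = (activations[-1] - target) * activations[-1] * (1 - activations[-1])
grads = [None] * len(weights)
for i in reversed(range(len(weights))):
    # Gradient for the weights feeding layer i + 1.
    grads[i] = activations[i].T @ delta
    if i > 0:
        # Reuse the stored delta to get the delta of the previous layer.
        delta = (delta @ weights[i].T) * activations[i] * (1 - activations[i])

print([g.shape for g in grads])
```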

So, if you're lucky enough to have been exposed to gradient descent or vector calculus before, then hopefully that clicked.

The full derivation of backpropagation can be condensed into about a page of tight symbolic math, but it's hard to get the sense of the algorithm without a high-level description. (It's downright intimidating, in my opinion.) If you haven't got a good handle on vector calculus, then, sorry, the above probably wasn't helpful. But to get backpropagation to actually work, it's not necessary to understand the full derivation.

I found the following paper (by Rojas) very helpful when I was trying to understand this material, even if it's a big PDF of one chapter of his book.


Novak


I'll try to explain without delving too much into code or math.

Basically, you compute the classification from the neural network, and compare to the known value. This gives you an error at the output node.

Now, from the output node, we have N incoming links from other nodes. We propagate the error to the last layer before the output node, then propagate it down to the next layer (when there is more than one uplink, you sum the errors), and then recursively propagate to the first layer.

To adjust the weights for training, for each node you basically do the following:

learningRate and alpha are parameters you can set to adjust how quickly it homes in on a solution vs. how (hopefully) accurately you solve it in the end.
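One common form of such a per-node update is sketched below. It assumes that alpha is a momentum coefficient, which the answer does not state explicitly; the function `update_weights` and the numbers are hypothetical, and `learning_rate` plays the role of learningRate.

```python
import numpy as np

def update_weights(weights, gradient, previous_delta, learning_rate=0.1, alpha=0.9):
    """Return (new_weights, delta) for one gradient-descent step with momentum."""
    # Step against the error gradient, plus a fraction of the previous step.
    delta = -learning_rate * gradient + alpha * previous_delta
    return weights + delta, delta

weights = np.array([0.5, -0.3])          # incoming weights of one node
previous_delta = np.zeros_like(weights)  # no earlier step yet
gradient = np.array([0.2, -0.4])         # hypothetical error gradient for this node

weights, previous_delta = update_weights(weights, gradient, previous_delta)
print(weights)
```

A larger learning rate moves faster but can overshoot; the momentum term smooths successive steps by carrying part of the last update forward.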

James


# backpropagation example for deep learning in python class.
# with sigmoid activation
# the notes for this class can be found at:
# https://deeplearningcourses.com/c/data-science-deep-learning-in-python
# https://www.udemy.com/data-science-deep-learning-in-python
from __future__ import print_function, division
from builtins import range
# Note: you may need to update your version of future
# sudo pip install -U future

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)

def forward(X, W1, b1, W2, b2):
    # hidden layer: sigmoid activation
    Z = 1 / (1 + np.exp(-X.dot(W1) - b1))
    # output layer: softmax over the K classes
    A = Z.dot(W2) + b2
    expA = np.exp(A)
    Y = expA / expA.sum(axis=1, keepdims=True)
    return Y, Z

# determine the classification rate
# num correct / num total
def classification_rate(Y, P):
    n_correct = 0
    n_total = 0
    for i in range(len(Y)):
        n_total += 1
        if Y[i] == P[i]:
            n_correct += 1
    return float(n_correct) / n_total

def derivative_w2(Z, T, Y):
    N, K = T.shape
    M = Z.shape[1]  # Z is (N, M)

    # # slow
    # ret1 = np.zeros((M, K))
    # for n in range(N):
    #     for m in range(M):
    #         for k in range(K):
    #             ret1[m,k] += (T[n,k] - Y[n,k])*Z[n,m]

    # # a bit faster - let's not loop over m
    # ret2 = np.zeros((M, K))
    # for n in range(N):
    #     for k in range(K):
    #         ret2[:,k] += (T[n,k] - Y[n,k])*Z[n,:]
    # assert(np.abs(ret1 - ret2).sum() < 0.00001)

    # # even faster - let's not loop over k either
    # ret3 = np.zeros((M, K))
    # for n in range(N): # slow way first
    #     ret3 += np.outer( Z[n], T[n] - Y[n] )
    # assert(np.abs(ret1 - ret3).sum() < 0.00001)

    # fastest - let's not loop over anything
    ret4 = Z.T.dot(T - Y)
    # assert(np.abs(ret1 - ret4).sum() < 0.00001)

    return ret4

def derivative_w1(X, Z, T, Y, W2):
    N, D = X.shape
    M, K = W2.shape

    # slow way first
    # ret1 = np.zeros((X.shape[1], M))
    # for n in range(N):
    #     for k in range(K):
    #         for m in range(M):
    #             for d in range(D):
    #                 ret1[d,m] += (T[n,k] - Y[n,k])*W2[m,k]*Z[n,m]*(1 - Z[n,m])*X[n,d]

    # fastest
    dZ = (T - Y).dot(W2.T) * Z * (1 - Z)
    ret2 = X.T.dot(dZ)

    # assert(np.abs(ret1 - ret2).sum() < 0.00001)

    return ret2

def derivative_b2(T, Y):
    return (T - Y).sum(axis=0)


def derivative_b1(T, Y, W2, Z):
    return ((T - Y).dot(W2.T) * Z * (1 - Z)).sum(axis=0)


def cost(T, Y):
    tot = T * np.log(Y)
    return tot.sum()

def main():
    # create the data
    Nclass = 500
    D = 2  # dimensionality of input
    M = 3  # hidden layer size
    K = 3  # number of classes

    X1 = np.random.randn(Nclass, D) + np.array([0, -2])
    X2 = np.random.randn(Nclass, D) + np.array([2, 2])
    X3 = np.random.randn(Nclass, D) + np.array([-2, 2])
    X = np.vstack([X1, X2, X3])

    Y = np.array([0]*Nclass + [1]*Nclass + [2]*Nclass)
    N = len(Y)
    # turn Y into an indicator matrix for training
    T = np.zeros((N, K))
    for i in range(N):
        T[i, Y[i]] = 1

    # let's see what it looks like
    plt.scatter(X[:,0], X[:,1], c=Y, s=100, alpha=0.5)
    plt.show()

    # randomly initialize weights
    W1 = np.random.randn(D, M)
    b1 = np.random.randn(M)
    W2 = np.random.randn(M, K)
    b2 = np.random.randn(K)

    learning_rate = 1e-3
    costs = []
    for epoch in range(1000):
        output, hidden = forward(X, W1, b1, W2, b2)
        if epoch % 100 == 0:
            c = cost(T, output)
            P = np.argmax(output, axis=1)
            r = classification_rate(Y, P)
            print('cost:', c, 'classification_rate:', r)
            costs.append(c)

        # this is gradient ASCENT, not DESCENT
        # be comfortable with both!
        # oldW2 = W2.copy()
        W2 += learning_rate * derivative_w2(hidden, T, output)
        b2 += learning_rate * derivative_b2(T, output)
        W1 += learning_rate * derivative_w1(X, hidden, T, output, W2)
        b1 += learning_rate * derivative_b1(T, output, W2, hidden)

    plt.plot(costs)
    plt.show()

if __name__ == '__main__':
    main()
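One way to gain confidence in a hand-derived gradient such as derivative_w2 is a finite-difference check: nudge each weight slightly and compare the measured change in the cost with the analytic value. The sketch below restates forward, cost, and derivative_w2 so that it runs on its own; the small random data set, the step size `eps`, and the variable names `analytic` and `numeric` are assumptions for illustration.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    Z = 1 / (1 + np.exp(-X.dot(W1) - b1))
    A = Z.dot(W2) + b2
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True), Z

def cost(T, Y):
    # log-likelihood (to be maximised), as in the script above
    return (T * np.log(Y)).sum()

def derivative_w2(Z, T, Y):
    return Z.T.dot(T - Y)

rng = np.random.default_rng(0)
N, D, M, K = 6, 2, 3, 3                       # tiny made-up problem
X = rng.normal(size=(N, D))
T = np.eye(K)[rng.integers(0, K, size=N)]     # one-hot targets
W1, b1 = rng.normal(size=(D, M)), rng.normal(size=M)
W2, b2 = rng.normal(size=(M, K)), rng.normal(size=K)

Y, Z = forward(X, W1, b1, W2, b2)
analytic = derivative_w2(Z, T, Y)

# Central differences: perturb one entry of W2 at a time.
eps = 1e-6
numeric = np.zeros_like(W2)
for m in range(M):
    for k in range(K):
        W2_plus, W2_minus = W2.copy(), W2.copy()
        W2_plus[m, k] += eps
        W2_minus[m, k] -= eps
        c_plus = cost(T, forward(X, W1, b1, W2_plus, b2)[0])
        c_minus = cost(T, forward(X, W1, b1, W2_minus, b2)[0])
        numeric[m, k] = (c_plus - c_minus) / (2 * eps)

print(np.abs(analytic - numeric).max())
```

If the two matrices disagree by much more than rounding error, the analytic derivative is likely wrong. The same check can be repeated for W1, b1, and b2.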
