
Linear classification


Presentation Transcript


  1. Linear classification

  2. Biological inspirations
  • Some numbers…
    • The human brain contains tens of billions of nerve cells (neurons); recent estimates put the count near 86 billion
    • Each neuron is connected to other neurons through on the order of 10,000 synapses
  • Properties of the brain
    • It can learn and reorganize itself from experience
    • It adapts to the environment
    • It is robust and fault tolerant

  3. Biological neuron (simplified model)
  • A neuron has
    • A branching input (the dendrites)
    • A branching output (the axon)
  • The information circulates from the dendrites to the axon via the cell body
  • The cell body sums up the inputs in some way and fires, i.e. generates a signal through the axon, if the result is greater than some threshold

  4. An Artificial Neuron
  • Definition: a non-linear, parameterized function with a restricted output range
  • The inputs x1 … xn are combined with the weights w1 … wn into a weighted sum, which is passed through an activation function:
    y = f(w1·x1 + w2·x2 + … + wn·xn)
  • A threshold parameter is usually not pictured (we’ll see why), but you can imagine one here.
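
  A minimal sketch of such a neuron in Python, using a step activation; the weights, inputs, and threshold below are made-up illustrative values, not values from the slides:

      def artificial_neuron(inputs, weights, threshold=0.0):
          # Non-linear, parameterized function: weighted sum -> step activation.
          activation = sum(w * x for w, x in zip(weights, inputs))
          return 1 if activation >= threshold else 0

      # Fires only when the weighted sum reaches the threshold of 0.5.
      print(artificial_neuron([1, 0], [0.6, 0.3], threshold=0.5))   # 1
      print(artificial_neuron([0, 1], [0.6, 0.3], threshold=0.5))   # 0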

  5. Same Idea using the Notation in the Book

  6. The Output of a Neuron
  • As described so far…
  • This simplest form of a neuron is also called a perceptron.

  7. The Output of a Neuron
  • Other possibilities, such as the sigmoid function for continuous output:
    g(a) = 1 / (1 + e^(−k·a))
  • a is the activation of the neuron
  • k is a parameter which controls the shape of the curve (usually k = 1)
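
  The same function as runnable Python; the symbol names a and k follow the reconstruction above, since the slide's original notation was lost in extraction:

      import math

      def sigmoid(a, k=1.0):
          # Squashes any real-valued activation into the open interval (0, 1).
          return 1.0 / (1.0 + math.exp(-k * a))

      print(sigmoid(0.0))   # 0.5, the midpoint of the curve
      print(sigmoid(4.0))   # ~0.982, saturating toward 1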

  8. Linear Regression using a Perceptron
  • Linear regression: find a linear function (straight line) that best predicts the continuous-valued output.

  9. Linear Regression As an Optimization Problem
  • Finding the optimal weights could be solved through:
    • Gradient descent (a small sketch follows below)
    • Simulated annealing
    • Genetic algorithms
    • … and now neural networks
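
  As one concrete instance, a sketch of gradient descent on a one-variable linear regression; the toy data, learning rate, and iteration count are all invented for illustration:

      # Fit y = w*x + b to toy data by gradient descent on the squared error.
      data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
      w, b, lr = 0.0, 0.0, 0.01

      for _ in range(2000):
          grad_w = grad_b = 0.0
          for x, y in data:
              err = (w * x + b) - y      # prediction error on one sample
              grad_w += 2 * err * x      # d(err^2)/dw
              grad_b += 2 * err          # d(err^2)/db
          w -= lr * grad_w               # step against the gradient
          b -= lr * grad_b

      print(w, b)                        # roughly w ≈ 2, b ≈ 0 for this data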

  10. Linear Regression using a Perceptron

  11. The Bias Term
  • So far we have defined the output of a perceptron as controlled by a threshold:
    x1w1 + x2w2 + x3w3 + … + xnwn ≥ t
  • But just like the weights, this threshold is a parameter that needs to be adjusted
  • Solution: make it another weight, attached to a constant input of 1:
    x1w1 + x2w2 + x3w3 + … + xnwn + (1)(−t) ≥ 0
  • The term (1)(−t) is the bias term.
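
  In code the trick is just an extra weight on a constant input. A sketch; the helper name and example values are mine, not the slides':

      def neuron_with_bias(inputs, weights):
          # The last weight plays the role of -t; its input is always 1.
          total = sum(w * x for w, x in zip(weights, inputs + [1.0]))
          return 1 if total >= 0 else 0

      # Behaves like a threshold of 0.5 on the two real inputs.
      print(neuron_with_bias([1, 0], [0.6, 0.3, -0.5]))   # 1
      print(neuron_with_bias([0, 1], [0.6, 0.3, -0.5]))   # 0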

  12. A Neuron with a Bias Term

  13. Another Example • Assign weights to perform the logical OR operation.
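
  One well-known weight assignment (not necessarily the one intended on the slide): give each input a weight of 1 and use a bias of −0.5, so the sum clears 0 exactly when at least one input is 1:

      def or_neuron(a, b):
          # w1 = w2 = 1.0, bias weight = -0.5 on a constant input of 1.
          total = 1.0 * a + 1.0 * b + 1.0 * (-0.5)
          return 1 if total >= 0 else 0

      for a in (0, 1):
          for b in (0, 1):
              print(a, b, "->", or_neuron(a, b))
      # Only (0, 0) stays below the threshold, matching logical OR.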

  14. Artificial Neural Network (ANN)
  • A mathematical model to solve engineering problems
  • A group of highly connected neurons realizing compositions of non-linear functions
  • Tasks
    • Classification
    • Discrimination
    • Estimation

  15. Feed Forward Neural Networks
  • The information is propagated from the inputs to the outputs
  • There are no cycles between outputs and inputs: the state of the system is not preserved from one iteration to another
  [Figure: inputs x1, x2, …, xn feeding a 1st hidden layer, a 2nd hidden layer, and the output layer]
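
  A compact forward pass for such a layered network; the layer sizes and weights below are arbitrary placeholders, not values from the presentation:

      import math

      def forward(x, layers):
          # layers: list of (weight_matrix, bias_vector) pairs, input to output.
          for weights, biases in layers:
              x = [1 / (1 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
                   for row, b in zip(weights, biases)]
          return x

      # Toy network: 2 inputs -> 2 hidden units -> 1 output.
      layers = [([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2]),
                ([[1.0, -1.0]], [0.0])]
      print(forward([1.0, 0.0], layers))   # a single sigmoid output in (0, 1)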

  16. ANN Structure
  • Finite number of inputs
  • Zero or more hidden layers
  • One or more outputs
  • All nodes at the hidden and output layers contain a bias term.

  17. Examples
  • Handwriting character recognition
  • Control of a virtual agent

  18. ALVINN: a neural-network-controlled AGV (1994)

  19. http://blog.davidsingleton.org/nnrccar

  20. Learning
  • The procedure that consists of estimating the weight parameters so that the whole network can perform a specific task
  • The (supervised) learning process:
    • Present the network with a number of inputs and their corresponding outputs
    • See how closely the actual outputs match the desired ones
    • Modify the parameters to better approximate the desired outputs

  21. Perceptron Learning Rule
  • Initialize the weights to some random values (or 0)
  • For each sample (x, t) in the training set:
    • Calculate the current output of the perceptron, o
    • Update the weights: wi ← wi + η (t − o) xi
  • Repeat until the error is smaller than some predefined threshold
  • η is the learning rate, usually a small constant such as 0.1
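
  A runnable sketch of this rule, trained here on the OR truth table from the earlier slide; the fixed epoch count stands in for the error-threshold stopping test:

      # Perceptron rule: w_i <- w_i + eta * (t - o) * x_i, bias folded in.
      samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
      weights = [0.0, 0.0, 0.0]            # [w1, w2, bias]
      eta = 0.1

      for _ in range(20):
          for x, t in samples:
              xb = x + [1]                 # constant bias input
              o = 1 if sum(w * v for w, v in zip(weights, xb)) >= 0 else 0
              for i, v in enumerate(xb):
                  weights[i] += eta * (t - o) * v

      print(weights)                       # a weight vector that separates OR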

  22. Linear Separability
  • Perceptrons can classify any input that is linearly separable.
  • For more complex problems we need a more complex model.

  23. Different Non-Linearly Separable Problems

  Structure      Types of Decision Regions
  Single-Layer   Half plane bounded by a hyperplane
  Two-Layer      Convex open or closed regions
  Three-Layer    Arbitrary (complexity limited by the number of nodes)

  [Figure: for each structure, example decision regions on the Exclusive-OR problem and on classes with meshed regions, plus the most general region shapes, drawn as interleaved A/B regions]

  24. Calculating the Weights
  • The weights are a vector of parameters for which we need to find a global optimum
  • Could be solved by:
    • Simulated annealing
    • Gradient descent
    • Genetic algorithms
  • http://www.youtube.com/watch?v=0Str0Rdkxxo
  • The perceptron learning rule is pretty much gradient descent.

  25. Learning the Weights in a Neural Network
  • The perceptron learning rule (gradient descent) worked before, but it required us to know the correct output of the node.
  • How do we know the correct output of a given hidden node?

  26. Backpropagation Algorithm
  • Gradient descent over the entire network weight vector
  • Easily generalized to arbitrary directed graphs
  • Will find a local, not necessarily global, error minimum
    • In practice it often works well (it can be invoked multiple times with different initial weights)

  27. Backpropagation Algorithm
  • Initialize the weights to some random values (or 0)
  • For each sample (x, t) in the training set:
    • Propagate the input forward and calculate the current output o of every node
    • For each output node k, compute the error term δk = ok (1 − ok) (tk − ok)
    • For each hidden node h, compute δh = oh (1 − oh) Σk whk δk
    • For all network weights, update wij ← wij + η δj xij
  • Repeat until the weights converge or the desired accuracy is achieved
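
  A self-contained sketch of these updates for a single hidden layer, trained on the XOR problem from slide 23. The network size, learning rate, seed, and epoch count are my choices, and a different seed may be needed if the descent lands in a local minimum (cf. slide 26):

      import math, random

      def sig(a):
          return 1.0 / (1.0 + math.exp(-a))

      random.seed(0)
      H = 3                                    # hidden units (arbitrary choice)
      w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]
      w_o = [random.uniform(-1, 1) for _ in range(H + 1)]
      eta = 0.5
      data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]   # XOR

      for _ in range(10000):
          for x, t in data:
              xb = x + [1.0]                   # bias input
              h = [sig(sum(w * v for w, v in zip(row, xb))) for row in w_h]
              hb = h + [1.0]
              o = sig(sum(w * v for w, v in zip(w_o, hb)))
              d_o = o * (1 - o) * (t - o)      # output error term
              d_h = [h[j] * (1 - h[j]) * w_o[j] * d_o for j in range(H)]
              for j in range(H + 1):
                  w_o[j] += eta * d_o * hb[j]            # output-layer update
              for j in range(H):
                  for i in range(3):
                      w_h[j][i] += eta * d_h[j] * xb[i]  # hidden-layer update

      for x, t in data:
          h = [sig(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_h]
          print(x, t, round(sig(sum(w * v for w, v in zip(w_o, h + [1.0]))), 3))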

  28. Intuition
  • General idea: hidden nodes are “responsible” for some of the error at the output nodes they connect to
  • The change in the hidden weights is proportional to the strength (magnitude) of the connection between the hidden node and the output node
  • This is the same as the perceptron learning rule, but for a sigmoid decision function instead of a step decision function (full derivation on p. 726)

  29. Intuition
  • General idea: hidden nodes are “responsible” for some of the error at the output nodes they connect to
  • The change in the hidden weights is proportional to the strength (magnitude) of the connection between the hidden node and the output node

  30. Intuition
  • When expanded, the update to the output nodes is almost the same as the perceptron rule
  • The slight difference is that the algorithm uses a sigmoid function instead of a step function
  • (Full derivation on p. 726)

  31. Questions
