What we will learn in this section:
A neuron is a specialized cell that can transmit and receive electrical and chemical signals in the nervous system.
Neurons have three main parts: a cell body, dendrites, and an axon.
Neurons communicate with each other through synapses, which are junctions where neurotransmitters are released.

By Yann LeCun et al. (1998)

An activation function is a function that calculates the output of a node in an artificial neural network, based on its inputs and the weights on individual inputs.
Activation functions are essential for neural networks to learn complex patterns in data, as they introduce non-linearity and enable the network to approximate any function.
There are four commonly used types of activation function: the threshold function, the sigmoid function, the rectifier (ReLU), and the hyperbolic tangent function.
The first one is the threshold function:

φ(x) = 1 if x ≥ 0
φ(x) = 0 if x < 0

where
x = the weighted sum of the inputs


By Xavier Glorot et al. (2011)
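
As a quick illustration (not part of the original text), here is a minimal Python/NumPy sketch of the threshold function alongside the sigmoid, rectifier, and hyperbolic tangent; the example input values are made up:

```python
import numpy as np

def threshold(x):
    # Threshold (step) function: output 1 if the weighted sum x >= 0, else 0
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    # Sigmoid: smooth curve with outputs in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def rectifier(x):
    # Rectifier (ReLU): max(0, x), as studied by Glorot et al. (2011)
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # example weighted sums (illustrative)
print(threshold(x))   # [0. 0. 1. 1. 1.]
print(sigmoid(x))
print(rectifier(x))
print(np.tanh(x))     # hyperbolic tangent, outputs in (-1, 1)
```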

How do neural networks learn? They receive input data and pass it through one or more layers of neurons, each applying a non-linear activation function.
They produce output data and compare it with the expected or desired output, using a loss function to measure the error or discrepancy.
They adjust the weights and biases of the connections between neurons, using a learning algorithm such as gradient descent and a technique called backpropagation, to minimize the loss function and reduce the error.
They repeat this process for many iterations or epochs, until the network converges to a satisfactory level of performance.
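
To make the forward pass concrete, here is a minimal sketch of input data flowing through one hidden layer with a sigmoid activation; the layer sizes and data are illustrative assumptions, not taken from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

X = rng.normal(size=(4, 3))          # 4 observations, 3 input features (made up)
W1 = rng.normal(size=(3, 5)) * 0.1   # weights: input layer -> hidden layer
b1 = np.zeros(5)
W2 = rng.normal(size=(5, 1)) * 0.1   # weights: hidden layer -> output layer
b2 = np.zeros(1)

hidden = sigmoid(X @ W1 + b1)        # non-linear activation in the hidden layer
y_hat = sigmoid(hidden @ W2 + b2)    # predicted output, compared with y via a loss function
print(y_hat.shape)                   # (4, 1)
```
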
The cost function measures the error between the predicted and the actual output:

C = ∑ ½ (ŷ − y)²

where
ŷ = the predicted value
y = the actual value
CrossValidated (2015)
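
A minimal sketch of this cost function in Python follows; the predicted and actual values are made up for illustration:

```python
import numpy as np

def cost(y_hat, y):
    # C = sum over observations of 1/2 * (y_hat - y)^2
    return np.sum(0.5 * (y_hat - y) ** 2)

y_hat = np.array([0.8, 0.2, 0.6])  # predicted values (made up)
y = np.array([1.0, 0.0, 1.0])      # actual values (made up)
print(cost(y_hat, y))              # 0.5 * (0.04 + 0.04 + 0.16) = 0.12
```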

Gradient Descent is an optimization algorithm that tries to find the minimum value of a function by iteratively moving in the direction of the steepest descent, which is the opposite of the gradient of the function.
Gradient descent is the simplest optimization algorithm: it computes the gradient of the loss function with respect to the model weights and updates them using the following formula:

w(t) = w(t-1) - a * dw(t)

where
w = weight vector
dw = gradient of the loss with respect to w
a = learning rate
t = iteration number
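
A minimal sketch of this update rule on a one-parameter cost with a known minimum; the cost function, starting weight, and learning rate are illustrative assumptions:

```python
# Gradient descent on the one-parameter cost C(w) = 0.5 * (w - 3)^2,
# whose minimum is at w = 3; its gradient is dC/dw = (w - 3).
w = 0.0   # initial weight (illustrative)
a = 0.1   # learning rate (illustrative)
for t in range(100):
    dw = w - 3.0      # gradient of the cost with respect to w
    w = w - a * dw    # update rule: w(t) = w(t-1) - a * dw(t)
print(w)              # approaches 3.0
```
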
Stochastic Gradient Descent is an iterative optimization algorithm commonly used in machine learning to find the optimal parameters (weights and biases) of a model that minimize a given loss function.
It’s a variant of gradient descent that approximates the true gradient of the loss function by randomly selecting a single data point (or a small batch of data points) at each iteration, rather than using the entire dataset.
The update rule for SGD is as follows:
w(t+1) = w(t) - η * ∇J(w(t), x(i), y(i))
where
w(t) represents the model parameters (weights and biases) at iteration t,
η is the learning rate (a hyperparameter that controls the step size), and
∇J(w(t), x(i), y(i)) is the gradient of the loss function J, evaluated at the current parameters w(t) using a randomly selected data point (x(i), y(i)).
Andrew Trask (2015)
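
The following sketch applies this per-sample update to a toy one-parameter linear model; the data, learning rate, and number of epochs are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data generated from y = 2 * x plus noise (made up); SGD should recover w close to 2.
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.1, size=100)

w = 0.0      # single model parameter (illustrative)
eta = 0.05   # learning rate (illustrative)
for epoch in range(5):
    for i in rng.permutation(len(x)):      # one randomly selected observation at a time
        y_hat = w * x[i]
        grad = (y_hat - y[i]) * x[i]       # gradient of 0.5 * (y_hat - y)^2 for this sample
        w = w - eta * grad                 # w(t+1) = w(t) - eta * gradient
print(w)                                   # close to 2.0
```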

Backpropagation, short for “backward propagation of errors”, is a fundamental algorithm in machine learning used for training artificial neural networks.
It calculates the gradient of the loss function with respect to the network’s weights, enabling the use of optimization algorithms like SGD to update the weights and minimize the loss.
Michael Nielsen (2015)
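
To show the chain rule that backpropagation relies on, here is a minimal sketch for a single sigmoid neuron with the cost ½(ŷ − y)²; the input, target, and initial weight are made-up values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid neuron: y_hat = sigmoid(w * x + b), cost C = 0.5 * (y_hat - y)^2.
x, y = 1.5, 1.0   # one observation (made-up values)
w, b = 0.2, 0.0   # initial weight and bias (made-up values)

# Forward pass
z = w * x + b
y_hat = sigmoid(z)

# Backward pass: chain rule, dC/dw = dC/dy_hat * dy_hat/dz * dz/dw
dC_dyhat = y_hat - y             # derivative of the cost w.r.t. the prediction
dyhat_dz = y_hat * (1 - y_hat)   # derivative of the sigmoid
dz_dw = x
dC_dw = dC_dyhat * dyhat_dz * dz_dw
dC_db = dC_dyhat * dyhat_dz

print(dC_dw, dC_db)              # gradients that SGD would use to update w and b
```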

STEP 1: Randomly initialize the weights to small numbers close to 0 (but not 0)
STEP 2: Input the first observation of your dataset in the input layer, each feature in one input node.
STEP 3: Forward-Propagation: from left to right, the neurons are activated, with the impact of each neuron’s activation determined by the weights. Propagate the activations until the predicted result ŷ is obtained.
STEP 4: Compare the predicted result to the actual result. Measure the generated error.
STEP 5: Backward-Propagation: from right to left, the error is back-propagated. Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.
STEP 6: Repeat Steps 2 to 5 and update the weights after each observation (stochastic/online learning). Or:
Repeat Steps 2 to 5 but update the weights only after a batch of observations (batch learning).
STEP 7: When the whole training set has passed through the ANN, that makes one epoch. Repeat for more epochs.
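
Putting the steps together, here is a minimal sketch of the full training loop on a tiny made-up dataset (XOR), using batch learning as in Step 6; the network size, learning rate, and number of epochs are illustrative assumptions rather than values from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

# Tiny made-up dataset: 4 observations, 2 features, XOR targets.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# STEP 1: randomly initialize the weights to small numbers close to 0
W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)
lr = 1.0  # learning rate (illustrative)

for epoch in range(10000):                 # STEP 7: repeat for many epochs
    # STEPS 2-3: forward-propagation, one feature per input node
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # STEP 4: compare the predicted result to the actual result
    error = y_hat - y                      # drives the cost C = sum 1/2 * (y_hat - y)^2

    # STEP 5: backward-propagation and weight updates
    d_out = error * y_hat * (1 - y_hat)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(axis=0)
    # STEP 6: here the whole (tiny) training set is used for each update, i.e. batch learning

print(np.round(y_hat, 2).ravel())          # predictions should approach [0, 1, 1, 0]
```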