What we will learn in this section:
A neuron is a specialized cell that can transmit and receive electrical and chemical signals in the nervous system.
Neurons have three main parts: a cell body, dendrites, and an axon.
Neurons communicate with each other through synapses, which are junctions where neurotransmitters are released.

By Yann LeCun et al. (1998)

An activation function is a function that calculates the output of a node in an artificial neural network, based on its inputs and the weights on individual inputs.
Activation functions are essential for neural networks to learn complex patterns in data, as they introduce non-linearity and enable the network to approximate any function.
There are four commonly used types of activation function: the threshold function, the sigmoid function, the rectifier (ReLU), and the hyperbolic tangent function.
The first one is the threshold function:

φ(x) = 1 if x ≥ 0
φ(x) = 0 if x < 0

where
x = the weighted sum of the inputs


By Xavier Glorot et al. (2011)
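
As a quick illustration (not part of the original text), here is a minimal Python/NumPy sketch of the threshold function alongside the sigmoid, rectifier, and hyperbolic tangent; the example input values are made up:

```python
import numpy as np

def threshold(x):
    # Threshold (step) function: output 1 if the weighted sum x >= 0, else 0
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    # Sigmoid: smooth curve with outputs in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def rectifier(x):
    # Rectifier (ReLU): max(0, x), as studied by Glorot et al. (2011)
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # example weighted sums (illustrative)
print(threshold(x))   # [0. 0. 1. 1. 1.]
print(sigmoid(x))
print(rectifier(x))
print(np.tanh(x))     # hyperbolic tangent, outputs in (-1, 1)
```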

How do neural networks learn? They receive input data and pass it through one or more layers of neurons, each applying a non-linear activation function.
They produce output data and compare it with the expected or desired output, using a loss function to measure the error or discrepancy.
They adjust the weights and biases of the connections between neurons, using a learning algorithm such as gradient descent and a technique called backpropagation, to minimize the loss function and reduce the error.
They repeat this process for many iterations or epochs, until the network converges to a satisfactory level of performance.
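
To make the forward pass concrete, here is a minimal sketch of input data flowing through one hidden layer with a sigmoid activation; the layer sizes and data are illustrative assumptions, not taken from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

X = rng.normal(size=(4, 3))          # 4 observations, 3 input features (made up)
W1 = rng.normal(size=(3, 5)) * 0.1   # weights: input layer -> hidden layer
b1 = np.zeros(5)
W2 = rng.normal(size=(5, 1)) * 0.1   # weights: hidden layer -> output layer
b2 = np.zeros(1)

hidden = sigmoid(X @ W1 + b1)        # non-linear activation in the hidden layer
y_hat = sigmoid(hidden @ W2 + b2)    # predicted output, compared with y via a loss function
print(y_hat.shape)                   # (4, 1)
```
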
The cost function measures the error between the predicted and the actual output:

C = ∑ ½ (ŷ − y)²

where
ŷ = the predicted value
y = the actual value
CrossValidated (2015)
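
A minimal sketch of this cost function in Python follows; the predicted and actual values are made up for illustration:

```python
import numpy as np

def cost(y_hat, y):
    # C = sum over observations of 1/2 * (y_hat - y)^2
    return np.sum(0.5 * (y_hat - y) ** 2)

y_hat = np.array([0.8, 0.2, 0.6])  # predicted values (made up)
y = np.array([1.0, 0.0, 1.0])      # actual values (made up)
print(cost(y_hat, y))              # 0.5 * (0.04 + 0.04 + 0.16) = 0.12
```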

Gradient Descent is an optimization algorithm that tries to find the minimum value of a function by iteratively moving in the direction of the steepest descent, which is the opposite of the gradient of the function.
Gradient descent is the simplest optimization algorithm: it computes the gradient of the loss function with respect to the model weights and updates them using the following formula:

w(t) = w(t-1) - a * dw(t)

where
w = weight vector
dw = gradient of the loss with respect to w
a = learning rate
t = iteration number
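
A minimal sketch of this update rule on a one-parameter cost with a known minimum; the cost function, starting weight, and learning rate are illustrative assumptions:

```python
# Gradient descent on the one-parameter cost C(w) = 0.5 * (w - 3)^2,
# whose minimum is at w = 3; its gradient is dC/dw = (w - 3).
w = 0.0   # initial weight (illustrative)
a = 0.1   # learning rate (illustrative)
for t in range(100):
    dw = w - 3.0      # gradient of the cost with respect to w
    w = w - a * dw    # update rule: w(t) = w(t-1) - a * dw(t)
print(w)              # approaches 3.0
```
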
Stochastic Gradient Descent is an iterative optimization algorithm commonly used in machine learning to find the optimal parameters (weights and biases) of a model that minimize a given loss function.
It’s a variant of gradient descent that approximates the true gradient of the loss function by randomly selecting a single data point (or a small batch of data points) at each iteration, rather than using the entire dataset.
The update rule for SGD is as follows:
w(t+1) = w(t) - η * ∇J(w(t), x(i), y(i))
where
w(t) represents the model parameters (weights and biases) at iteration t,
η is the learning rate (a hyperparameter that controls the step size), and
∇J(w(t), x(i), y(i)) is the gradient of the loss function J, evaluated at the current parameters w(t) using a randomly selected data point (x(i), y(i)).
Andrew Trask (2015)
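
The following sketch applies this per-sample update to a toy one-parameter linear model; the data, learning rate, and number of epochs are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data generated from y = 2 * x plus noise (made up); SGD should recover w close to 2.
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.1, size=100)

w = 0.0      # single model parameter (illustrative)
eta = 0.05   # learning rate (illustrative)
for epoch in range(5):
    for i in rng.permutation(len(x)):      # one randomly selected observation at a time
        y_hat = w * x[i]
        grad = (y_hat - y[i]) * x[i]       # gradient of 0.5 * (y_hat - y)^2 for this sample
        w = w - eta * grad                 # w(t+1) = w(t) - eta * gradient
print(w)                                   # close to 2.0
```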

Backpropagation, short for “backward propagation of errors”, is a fundamental algorithm in machine learning used for training artificial neural networks.
It calculates the gradient of the loss function with respect to the network’s weights, enabling the use of optimization algorithms like SGD to update the weights and minimize the loss.
Michael Nielsen (2015)
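
To show the chain rule that backpropagation relies on, here is a minimal sketch for a single sigmoid neuron with the cost ½(ŷ − y)²; the input, target, and initial weight are made-up values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid neuron: y_hat = sigmoid(w * x + b), cost C = 0.5 * (y_hat - y)^2.
x, y = 1.5, 1.0   # one observation (made-up values)
w, b = 0.2, 0.0   # initial weight and bias (made-up values)

# Forward pass
z = w * x + b
y_hat = sigmoid(z)

# Backward pass: chain rule, dC/dw = dC/dy_hat * dy_hat/dz * dz/dw
dC_dyhat = y_hat - y             # derivative of the cost w.r.t. the prediction
dyhat_dz = y_hat * (1 - y_hat)   # derivative of the sigmoid
dz_dw = x
dC_dw = dC_dyhat * dyhat_dz * dz_dw
dC_db = dC_dyhat * dyhat_dz

print(dC_dw, dC_db)              # gradients that SGD would use to update w and b
```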

STEP 1: Randomly initialize the weights to small numbers close to 0 (but not 0)
STEP 2: Input the first observation of your dataset in the input layer, each feature in one input node.
STEP 3: Forward-Propagation: from left to right, the neurons are activated, with the impact of each neuron’s activation determined by the weights. Propagate the activations until the predicted result ŷ is obtained.
STEP 4: Compare the predicted result to the actual result. Measure the generated error.
STEP 5: Backward-Propagation: from right to left, the error is back-propagated. Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.
STEP 6: Repeat Steps 2 to 5 and update the weights after each observation (stochastic/online learning). Or:
Repeat Steps 2 to 5 but update the weights only after a batch of observations (batch learning).
STEP 7: When the whole training set has passed through the ANN, that makes one epoch. Repeat for more epochs.
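
Putting the steps together, here is a minimal sketch of the full training loop on a tiny made-up dataset (XOR), using batch learning as in Step 6; the network size, learning rate, and number of epochs are illustrative assumptions rather than values from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

# Tiny made-up dataset: 4 observations, 2 features, XOR targets.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# STEP 1: randomly initialize the weights to small numbers close to 0
W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)
lr = 1.0  # learning rate (illustrative)

for epoch in range(10000):                 # STEP 7: repeat for many epochs
    # STEPS 2-3: forward-propagation, one feature per input node
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # STEP 4: compare the predicted result to the actual result
    error = y_hat - y                      # drives the cost C = sum 1/2 * (y_hat - y)^2

    # STEP 5: backward-propagation and weight updates
    d_out = error * y_hat * (1 - y_hat)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(axis=0)
    # STEP 6: here the whole (tiny) training set is used for each update, i.e. batch learning

print(np.round(y_hat, 2).ravel())          # predictions should approach [0, 1, 1, 0]
```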