Kien Duong

September 17, 2024

Weights in a Neural Network

In a neural network, weights control the strength of the connections between neurons, and they are the primary parameters the network adjusts during training to minimize error and learn patterns in the data. Because backpropagation works by updating these weights, they are one of the core components that make a neural network effective.

 

1. Introduction

In a neural network, each neuron in one layer is connected to the neurons in the next layer via weights. A weight determines how much influence an input (or the output from the previous layer) has on the neuron in the current layer. For each input or signal going into a neuron, there’s a corresponding weight.

For example, in a simple neuron, the output is calculated as:

\[  z = w_1 \cdot x_1 + w_2 \cdot x_2 + \dots + w_n \cdot x_n + b \]

  • \(x_1, x_2, \dots, x_n\) are the input values.
  • \(w_1, w_2, \dots, w_n\) are the weights.
  • \(b\) is the bias term.
  • \(z\) is the weighted sum of the inputs, which is passed through an activation function to produce the neuron’s output.
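To make this concrete, here is a minimal sketch in Python. The input values, weights, bias, and the choice of a sigmoid activation are illustrative assumptions, not values from this post:

```python
import math

def neuron_output(x, w, b):
    """Compute z = w1*x1 + ... + wn*xn + b, then apply a sigmoid activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted sum of the inputs plus bias
    a = 1.0 / (1.0 + math.exp(-z))                 # sigmoid turns z into the neuron's output
    return z, a

# Example values (chosen arbitrarily for illustration)
x = [0.5, -1.0, 2.0]     # inputs
w = [0.8, 0.2, -0.4]     # weights
b = 0.1                  # bias

z, a = neuron_output(x, w, b)
print(z, a)   # z = 0.5*0.8 + (-1.0)*0.2 + 2.0*(-0.4) + 0.1 = -0.5, a = sigmoid(-0.5) ≈ 0.378
```

Changing a weight \(w_i\) scales how strongly the corresponding input \(x_i\) pushes \(z\) up or down, which is exactly what training adjusts.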

The primary role of weights is to determine how much each input contributes to the neuron’s final output. In a network with multiple layers, the weights between layers define how information flows through the network, and they are responsible for the network’s ability to recognize and generalize patterns in the data.

Weights capture the relationships between the inputs and the outputs. During training, the backpropagation algorithm adjusts the weights so that the neural network can improve its predictions by minimizing the error.

 

2. Weights in Backpropagation

Backpropagation is the algorithm used to update the weights in a neural network. During backpropagation, the error (or loss) is propagated backward through the network. The goal is to adjust the weights so that the error is minimized. The weights are updated by computing the gradient of the loss function with respect to each weight. The gradient tells us how much the loss would change if we changed that specific weight.

\[ \frac{\partial L}{\partial w_i} \]

  • \(L\) is the loss function.
  • \(w_i\) is the weight.

The weights are updated by subtracting the gradient multiplied by the learning rate \(\eta\) (a small constant that controls how large the weight updates are):

\[ w_i^{\text{new}} = w_i^{\text{old}} - \eta \cdot \frac{\partial L}{\partial w_i} \]

You can read more about this update rule in the Gradient Descent post.

\[ \Rightarrow \Delta w_i = -\eta \cdot \frac{\partial L}{\partial w_i} \]

The negative sign ensures that you move in the direction that decreases the loss.

  • If \( \frac{\partial L}{\partial w_i} > 0 \), then \( -\eta \frac{\partial L}{\partial w_i} < 0 \), meaning \(w_i\) will decrease.
  • If \( \frac{\partial L}{\partial w_i} < 0 \), then \( -\eta \frac{\partial L}{\partial w_i} > 0 \), meaning \(w_i\) will increase.
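A minimal sketch of this update step in Python; the learning rate and gradient values below are made up purely to illustrate the two cases above:

```python
def update_weight(w_old, grad, eta=0.1):
    """One gradient descent step: w_new = w_old - eta * dL/dw."""
    return w_old - eta * grad

# Positive gradient -> the weight decreases
print(update_weight(0.5, grad=2.0))    # 0.5 - 0.1 * 2.0    = 0.3
# Negative gradient -> the weight increases
print(update_weight(0.5, grad=-2.0))   # 0.5 - 0.1 * (-2.0) = 0.7
```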

To compute \(\frac{\partial L}{\partial w_i}\), we use the chain rule, because the loss \(L\) depends on the output of the neuron, which in turn depends on the input and weight.

\[ \frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w_i} ~~~~~ (2.1) \]

  • \(\frac{\partial L}{\partial a}\) is the derivative of the loss function \(L\) with respect to the output of the neuron \(a\).
  • \(\frac{\partial a}{\partial z}\) is the derivative of the neuron’s output \(a\) with respect to its pre-activation value \(z\) (based on the activation function).
  • \(\frac{\partial z}{\partial w_i}\) is the derivative of the pre-activation value \(z\) with respect to the weight \(w_i\). Since \( z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b \), the derivative of \(z\) with respect to \(w_i\) is simply \(x_i\), the input associated with that weight.
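As a concrete illustration, assuming a sigmoid activation \(a = \sigma(z)\) and a squared-error loss \(L = \tfrac{1}{2}(a - y)^2\) (these are assumptions for the example, not choices made above), the three factors become:

\[ \frac{\partial L}{\partial a} = a - y, \qquad \frac{\partial a}{\partial z} = \sigma(z)\bigl(1 - \sigma(z)\bigr) = a(1 - a), \qquad \frac{\partial z}{\partial w_i} = x_i \]

so that \( \frac{\partial L}{\partial w_i} = (a - y) \cdot a(1 - a) \cdot x_i \).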

The product \(\frac{\partial L}{\partial a} \frac{\partial a}{\partial z}\) is commonly referred to as the error signal \(\delta\) for the neuron. It combines how sensitive the loss is to the neuron’s output with how that output depends on the pre-activation input, so its value depends on the neuron’s activation function:

\[ \delta = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \]

Equation (2.1) can then be rewritten as:

\[ \frac{\partial L}{\partial w_i} = \delta \cdot x_i \]

\[ \Rightarrow \Delta w_i = -\eta \cdot \delta \cdot x_i \]
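Putting the pieces together, here is a minimal single-neuron training step in Python that follows the \(\Delta w_i = -\eta \cdot \delta \cdot x_i\) rule derived above. It assumes a sigmoid activation and a squared-error loss \(L = \tfrac{1}{2}(a - y)^2\); the data values are arbitrary:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, w, b, y, eta=0.5):
    """One forward + backward pass for a single sigmoid neuron with squared-error loss."""
    # Forward pass: pre-activation z, then output a
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    a = sigmoid(z)

    # Error signal: delta = dL/da * da/dz = (a - y) * a * (1 - a)
    delta = (a - y) * a * (1 - a)

    # Gradients: dL/dw_i = delta * x_i, dL/db = delta; then the gradient descent update
    w_new = [wi - eta * delta * xi for wi, xi in zip(w, x)]
    b_new = b - eta * delta
    return w_new, b_new, 0.5 * (a - y) ** 2

# Illustrative data: the loss should shrink over repeated steps
x, y = [0.5, -1.0, 2.0], 1.0
w, b = [0.8, 0.2, -0.4], 0.1
for _ in range(5):
    w, b, loss = train_step(x, w, b, y)
    print(round(loss, 4))
```

Running the loop prints a decreasing loss, which is the behaviour the update rule is designed to produce on a single example.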
