Cross Entropy
Cross-entropy is a commonly used loss function in machine learning, particularly in classification tasks, to measure the difference between two probability distributions. It is used to quantify the error between the actual distribution of the labels and the predicted distribution by a model.
\[ H(p, q) = – \sum_{i=1}^{n} p(x_i) \log(q(x_i)) \]
- \(p(x_i)\): the true probability of class \(i\) (usually 0 or 1 in classification tasks)
- \(q(x_i)\): the predicted probability for class \(i\)
- \(n\): the number of classes
1. Basic sample
Suppose you’re predicting the outcome of a football match, and there are three possible outcomes:
- Team A wins
- Team B wins
- Draw
Let’s assume the true result is that Team A wins:
- Team A wins: 1
- Team B wins: 0
- Draw: 0
Now, suppose your model predicts the following probabilities for the outcomes:
- Team A wins: 0.6
- Team B wins: 0.3
- Draw: 0.1
The cross-entropy would be calculated as follows:
\[ H(p, q) = -(1 \cdot \log(0.6) + 0 \cdot \log(0.3) + 0 \cdot \log(0.1)) \]
\[ \Rightarrow H(p, q) = -\log(0.6) \]
\[ \Rightarrow H(p, q) \approx -(-0.2218) = 0.2218 \]
So the cross-entropy loss for this prediction would be approximately 0.2218
2. Logarithm Function
- At \(x = 1\) => \(\log(1) = 0\)
- As \(x\) decreases (closer to 0) => \(\log(x)\) becomes more negative
Based on the above properties, we use the logarithm in the cross-entropy formula is to penalize incorrect predictions more strongly.
- Logarithms turn small probabilities into large negative values, heavily penalizing wrong predictions.
- If the predicted probability of the true class is high (close to 1), the logarithm yields a value close to 0, meaning low loss and rewarding accurate predictions.
- The logarithmic function provides smooth, differentiable output, which helps optimization algorithms (like gradient descent) efficiently update model weights during training.