Iris Again
Gini Impurity
$G = 1 - \sum_{k=1}^{n} p_{k}^{2}$
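- $p_{k}$ is the fraction of samples in the node that belong to class $k$, out of $n$ classes
- The higher $G$, the more impure the node; $G = 0$ means the node is pure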
- A split never increases the sample-weighted average Gini impurity of the child nodes relative to their parent, but an individual child node can be more impure than its parent.
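The following is a minimal sketch of the Gini computation on a node, with an example where one child is more impure than its parent. The helper `gini` and the label lists `parent`, `left`, and `right` are hypothetical and chosen only for illustration; they are not taken from the Iris data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity G = 1 - sum_k p_k^2 for a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    # p_k is the fraction of samples carrying label k.
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# Hypothetical parent node: 8 samples of class "a" and 2 of class "b".
parent = ["a"] * 8 + ["b"] * 2

# Hypothetical split of the parent into two children.
left = ["a"] * 7
right = ["a"] * 1 + ["b"] * 2

# Sample-weighted average impurity of the two children.
weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)

print(f"parent:   {gini(parent):.3f}")  # 0.320
print(f"left:     {gini(left):.3f}")    # 0.000
print(f"right:    {gini(right):.3f}")   # 0.444
print(f"weighted: {weighted:.3f}")      # 0.133
```

Here the right child is more impure than the parent (0.444 versus 0.320), while the weighted average over both children (0.133) is still lower than the parent's impurity.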
Gini Impurity and Convexity
$g: (p_{1}, \ldots, p_{n}) \mapsto 1 - \sum_{k=1}^{n} p_{k}^{2}$
- The domain of $g$ is the probability simplex: $p_{k} \ge 0$ for all $k$ and $\sum_{k=1}^{n} p_{k} = 1$
- $g$ is a concave function
- $p_{k} = 1$ for some $k \implies$ $g(p) = 0$, i.e. pure
- $g(\frac{1}{n},\ldots,\frac{1}{n}) = 1 - \frac{1}{n}$ is the maximal value, attained at the uniform distribution
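The properties listed above can be checked numerically. This is only a sketch: the vectors `pure`, `uniform`, `p`, and `q` are hypothetical examples, and the midpoint inequality is tested for a single pair of points, so it illustrates concavity rather than proving it.

```python
def g(p):
    """Gini impurity of a probability vector p (entries >= 0, summing to 1)."""
    return 1.0 - sum(x * x for x in p)


n = 4
pure = [1.0, 0.0, 0.0, 0.0]   # all mass on one class
uniform = [1.0 / n] * n       # all classes equally likely

print(g(pure))      # 0.0
print(g(uniform))   # 0.75, which equals 1 - 1/n for n = 4

# Midpoint concavity check: g((p + q) / 2) >= (g(p) + g(q)) / 2.
p = [0.7, 0.1, 0.1, 0.1]
q = [0.1, 0.2, 0.3, 0.4]
mid = [(a + b) / 2 for a, b in zip(p, q)]

print(g(mid) >= 0.5 * (g(p) + g(q)))  # True
```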
Entropy
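Entropy and Gini impurity are similar impurity measures: both are zero for a pure node and largest when all classes are equally frequent. As splitting criteria for Decision Trees they are largely interchangeable and usually produce similar trees.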
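To make the comparison concrete, the sketch below tabulates both measures for a two-class node. It assumes the usual Shannon entropy with base-2 logarithm, $H = -\sum_{k=1}^{n} p_{k} \log_2 p_{k}$, which is not spelled out in the text above; the function names `g` and `entropy` are illustrative.

```python
import math


def g(p):
    """Gini impurity of a probability vector p."""
    return 1.0 - sum(x * x for x in p)


def entropy(p):
    """Shannon entropy (base 2) of a probability vector p."""
    # Terms with p_k = 0 are skipped, following the convention 0 * log(0) = 0.
    return sum(-x * math.log2(x) for x in p if x > 0)


# Two-class node: p1 is the fraction of the first class.
for p1 in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    p = [p1, 1.0 - p1]
    print(f"p1={p1:.2f}  gini={g(p):.3f}  entropy={entropy(p):.3f}")
```

Both columns are zero at `p1 = 0` and `p1 = 1`, peak at `p1 = 0.5`, and rise and fall together in between, which is why the two criteria tend to rank candidate splits similarly.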