# Decision Trees

Mar. 30, 2023

• Classification
• Regression

## Multi-output Problems

• Multi-output classification
• Multi-output regression

## Gini Impurity

$G = 1 - \sum_{k=1}^{n} p_{k}^{2}$

• The higher $G$, the more impure the node
• Child nodes generally have lower Gini impurity than their parent; an individual child can be more impure, but the weighted average of the children's impurities never exceeds the parent's (a consequence of the concavity discussed next)
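
As a small illustration (the helper function and sample counts are my own, not from the slides), $G$ can be computed directly from the class counts at a node:

```python
import numpy as np

def gini(counts):
    """Gini impurity G = 1 - sum_k p_k^2 from class counts at a node."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([10, 0]))  # 0.0 -> pure node
print(gini([5, 5]))   # 0.5 -> maximally impure for two classes
```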

## Gini Impurity and Concavity

$g: (p_{1}, \ldots, p_{n}) \mapsto 1 - \sum_{k=1}^{n} p_{k}^{2}$

• The domain of $g$ is the probability simplex: $p_k \ge 0 \;\forall\, k$ and $\sum_{k=1}^{n} p_{k} = 1$
• $g$ is a concave function
• $p_{k} = 1$ for some $k \implies$ $g(p) = 0$, i.e. pure
• $g(\frac{1}{n},\ldots,\frac{1}{n}) = 1 - \frac{1}{n}$ gives the maximal value
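
One way to verify the last two bullets (a short derivation sketch): the Hessian of $g$ is constant, $\nabla^2 g(p) = -2I_n \prec 0$, so $g$ is concave. Maximizing $g$ over the simplex with a Lagrange multiplier $\lambda$ for the constraint $\sum_{k} p_k = 1$ gives $-2p_k = \lambda$ for every $k$, so all $p_k$ equal $\frac{1}{n}$, where $g = 1 - n \cdot \frac{1}{n^2} = 1 - \frac{1}{n}$.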

## Entropy

Entropy, $H = -\sum_{k=1}^{n} p_{k} \log_2 p_{k}$, is another impurity measure. For decision trees it is largely interchangeable with Gini impurity: both vanish on pure nodes, both peak at the uniform distribution, and in practice they usually lead to similar trees, with Gini being slightly faster to compute (no logarithm).
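
A minimal scikit-learn sketch of that interchangeability (the dataset and cross-validation setup are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fit the same tree with each impurity measure; the cross-validated
# accuracies are typically very close.
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=42)
    print(criterion, cross_val_score(tree, X, y, cv=5).mean())
```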

## Pros and Cons

Pros:

1. No need for feature scaling
2. Well-written libraries, e.g. scikit-learn, can handle multi-output problems as easily as single-output ones (see the sketch after this list)

Cons:

1. Sensitive to small variations in the data: rotating the feature space or removing a few training instances can produce a very different tree
2. Decision boundaries are always perpendicular to some feature axis $\implies$ no good for extrapolation
3. Hard to express concepts such as XOR, parity, or multiplexer problems
4. Trees become biased if some classes dominate; it is recommended to balance the dataset before fitting
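
The sketch referenced in the pros list: scikit-learn trees accept a 2-D target and fit all outputs jointly (the synthetic data and tree depth are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))

# Two targets predicted jointly: y has shape (m, 2).
y = np.column_stack([np.sin(3 * X[:, 0]), np.cos(3 * X[:, 0])])

reg = DecisionTreeRegressor(max_depth=5).fit(X, y)
print(reg.predict([[0.5]]))  # one row with two outputs
```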

## Traits

1. Nonparametric model: the number of parameters is not fixed before training, so the tree structure adapts to the data
2. Regularization is needed to avoid overfitting, as with many other ML algorithms (see the sketch below)
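
A minimal sketch of point 2 (dataset, split, and hyperparameter values are illustrative): an unconstrained tree grows until its leaves are pure and tends to overfit noisy data, while `max_depth` and `min_samples_leaf` rein it in:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unregularized tree: grows until every leaf is pure.
free = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Regularized tree: capped depth and minimum leaf size.
reg = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5,
                             random_state=0).fit(X_train, y_train)

print("unregularized:", free.score(X_test, y_test))
print("regularized:  ", reg.score(X_test, y_test))
```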

## Complexity

• Inference: $O(\lg(m))$
• Training: $O(n\,m\,\lg(m))$
• Here $m$ is the number of training instances and $n$ the number of features, assuming a roughly balanced tree
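
As a quick sanity check on the inference bound (the numbers are illustrative): with $m = 10^6$ training instances, a roughly balanced tree has depth $\log_2(10^6) \approx 20$, so a single prediction traverses only about 20 nodes regardless of training-set size.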

## Topics Not Covered in Detail

1. Pruning
2. Random Forest
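
Both are available in scikit-learn; a brief sketch (the dataset, `ccp_alpha` value, and forest size are illustrative choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Cost-complexity pruning: larger ccp_alpha prunes more aggressively.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
print("pruned tree:  ", cross_val_score(pruned, X, y, cv=5).mean())

# Random Forest: an ensemble of decision trees on bootstrap samples.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```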

Thank you