Decision Trees

Wu Cheng-Fu

Mar. 30, 2023

Usual Tasks

  • Classification
  • Regression

Classification
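A minimal classification sketch, assuming scikit-learn is available (the two-moons toy data and parameter values below are only illustrative):

    from sklearn.datasets import make_moons
    from sklearn.tree import DecisionTreeClassifier

    # Toy data: two interleaving half-moons, a common 2-D benchmark.
    X, y = make_moons(n_samples=200, noise=0.25, random_state=42)

    clf = DecisionTreeClassifier(max_depth=3, random_state=42)
    clf.fit(X, y)
    print(clf.predict(X[:5]))        # predicted class labels
    print(clf.predict_proba(X[:5]))  # class probabilities from leaf frequencies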

Regression
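A matching regression sketch, again with scikit-learn (the noisy sine data is made up); note that the prediction is piecewise constant, one value per leaf:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
    y = np.sin(X).ravel() + rng.normal(0, 0.1, size=80)

    reg = DecisionTreeRegressor(max_depth=3, random_state=0)
    reg.fit(X, y)
    print(reg.predict([[1.0], [2.5], [4.0]]))  # one constant leaf value each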

Less Usual Tasks

  • Multi-output classification
  • Multi-output regression

Multi-Output Regression
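A sketch of the multi-output case with scikit-learn: the same DecisionTreeRegressor accepts a target with one column per output (the toy data here is made up):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 2 * np.pi, size=(100, 1)), axis=0)
    # Two outputs predicted jointly: (sin x, cos x) plus noise.
    Y = np.column_stack([np.sin(X).ravel(), np.cos(X).ravel()])
    Y += rng.normal(0, 0.1, size=Y.shape)

    reg = DecisionTreeRegressor(max_depth=4, random_state=0)
    reg.fit(X, Y)
    print(reg.predict([[1.0]]))  # shape (1, 2): one value per output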

Classification

Iris Again
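A sketch of fitting a tree on the Iris dataset with scikit-learn and printing the learned rules (depth limited to 2 just for readability):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    clf = DecisionTreeClassifier(max_depth=2, random_state=42)
    clf.fit(iris.data, iris.target)

    # Text view of the splits: each internal node compares one feature to a threshold.
    print(export_text(clf, feature_names=iris.feature_names))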

Gini Impurity

$G = 1 - \sum_{k=1}^{n} p_{k}^{2}$

  • The higher $G$, the more impure
  • An individual child node can have a higher Gini impurity than its parent, but the weighted average of the children's impurities never exceeds the parent's
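A direct translation of the formula into Python, with the class proportions $p_{k}$ taken from label counts (purely illustrative):

    from collections import Counter

    def gini(labels):
        """Gini impurity G = 1 - sum_k p_k^2 for a list of class labels."""
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    print(gini(["a"] * 10))             # 0.0 -> pure node
    print(gini(["a"] * 5 + ["b"] * 5))  # 0.5 -> maximally impure for 2 classes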

Gini Impurity and Convexity

$g: (p_{1}, \ldots, p_{n}) \mapsto 1 - \sum_{k=1}^{n} p_{k}^{2}$

  • The domain of $g$ is the probability simplex $\{(p_{1},\ldots,p_{n}) : p_{k} \ge 0 \;\forall\, k,\ \sum_{k=1}^{n} p_{k} = 1\}$
  • $g$ is a concave function
  • $p_{k} = 1$ for some $k \implies$ $g(p) = 0$, i.e. pure
  • $g(\frac{1}{n},\ldots,\frac{1}{n}) = 1 - \frac{1}{n}$ gives the maximal value
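As a concrete check for $n = 2$, write $p_{1} = p$ and $p_{2} = 1 - p$:

$g(p, 1-p) = 1 - p^{2} - (1-p)^{2} = 2p(1-p)$

This is concave in $p$, vanishes at $p \in \{0, 1\}$ (pure nodes), and attains its maximum $\frac{1}{2} = 1 - \frac{1}{2}$ at $p = \frac{1}{2}$, consistent with the statements above.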

Entropy
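For a node with class proportions $p_{1}, \ldots, p_{n}$, the entropy is

$H = -\sum_{k=1}^{n} p_{k} \log_{2} p_{k}$

(with the convention $0 \log_{2} 0 = 0$); like Gini impurity, it is $0$ for a pure node and maximal for the uniform distribution.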

Entropy and Gini impurity measure the same notion of node impurity and are, for practical purposes, interchangeable when training decision trees: they usually produce very similar trees, with Gini being slightly cheaper to compute.

Regression

Advantages

  1. No need for feature scaling
  2. Well-written libraries, e.g. scikit-learn, can handle multi-output problems as easily as single-output ones

Disadvantages

  1. Sensitive to small variations in the data, e.g. rotating the feature space or removing a few instances can change the tree considerably
  2. Decision boundaries are always perpendicular to some feature axis and predictions are piecewise constant $\implies$ poor at extrapolation (see the sketch after this list)
  3. Some concepts are hard for trees to learn: try training one on XOR, parity, or multiplexer problems
  4. Biased trees if some classes dominate; it is recommended to balance the dataset before fitting the decision tree
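A small sketch of the extrapolation issue from point 2, assuming scikit-learn (the linear toy data is made up): outside the training range, a regression tree keeps predicting the value of its outermost leaf.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(100, 1))
    y = 3.0 * X.ravel()  # simple linear trend on [0, 1]

    reg = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
    # Inside the training range the fit is fine; beyond it, the prediction
    # stays frozen at the outermost leaf value instead of following the trend.
    print(reg.predict([[0.5], [2.0], [10.0]]))  # the last two are identical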

Traits

  1. Nonparametric model
  2. Regularization (e.g. limiting the tree depth or the minimum leaf size) is needed, as with many other ML algorithms; see the sketch below
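A sketch of the usual regularization knobs in scikit-learn, as mentioned in point 2 (the synthetic dataset and parameter values are only illustrative):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    plain = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    regularized = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5,
                                         random_state=0).fit(X_tr, y_tr)

    # The unconstrained tree typically reaches ~100% training accuracy
    # but generalizes worse than the regularized one.
    for name, model in [("plain", plain), ("regularized", regularized)]:
        print(name, model.score(X_tr, y_tr), model.score(X_te, y_te))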

Complexity

  • Inference: $O(\lg(m))$, where $m$ is the number of training instances
  • Training: $O(n\,m\lg(m))$, where $n$ is the number of features
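As a rough worked example: with $m = 10^{6}$ training instances, a reasonably balanced tree has depth about $\lg(10^{6}) \approx 20$, so a prediction costs only about 20 threshold comparisons, independent of $n$.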

Topics Covered Less in the Preceding Slides

  1. Pruning
  2. Random Forest
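Both are available in scikit-learn; a minimal sketch (parameter values are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Cost-complexity pruning: larger ccp_alpha prunes more of the fitted tree.
    pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)

    # Random Forest: an ensemble of trees fit on bootstrap samples,
    # with a random feature subset considered at each split.
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    print(pruned_tree.get_depth(), forest.score(X, y))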

References

Q&A

END

Thank you