Decision Trees

Wu Cheng-Fu

Mar. 30, 2023

Usual Tasks

  • Classification
  • Regression

Classification
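A minimal classification sketch, assuming scikit-learn is available (the two-moons toy data and parameter values below are only illustrative):

    from sklearn.datasets import make_moons
    from sklearn.tree import DecisionTreeClassifier

    # Toy data: two interleaving half-moons, a common 2-D benchmark.
    X, y = make_moons(n_samples=200, noise=0.25, random_state=42)

    clf = DecisionTreeClassifier(max_depth=3, random_state=42)
    clf.fit(X, y)
    print(clf.predict(X[:5]))        # predicted class labels
    print(clf.predict_proba(X[:5]))  # class probabilities from leaf frequencies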

Regression
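A matching regression sketch, again with scikit-learn (the noisy sine data is made up); note that the prediction is piecewise constant, one value per leaf:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
    y = np.sin(X).ravel() + rng.normal(0, 0.1, size=80)

    reg = DecisionTreeRegressor(max_depth=3, random_state=0)
    reg.fit(X, y)
    print(reg.predict([[1.0], [2.5], [4.0]]))  # one constant leaf value each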

Less Usual Tasks

  • Multi-output classification
  • Multi-output regression

Multi-Output Regression
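A sketch of the multi-output case with scikit-learn: the same DecisionTreeRegressor accepts a target with one column per output (the toy data here is made up):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 2 * np.pi, size=(100, 1)), axis=0)
    # Two outputs predicted jointly: (sin x, cos x) plus noise.
    Y = np.column_stack([np.sin(X).ravel(), np.cos(X).ravel()])
    Y += rng.normal(0, 0.1, size=Y.shape)

    reg = DecisionTreeRegressor(max_depth=4, random_state=0)
    reg.fit(X, Y)
    print(reg.predict([[1.0]]))  # shape (1, 2): one value per output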

Classification

Iris Again
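A sketch of fitting a tree on the Iris dataset with scikit-learn and printing the learned rules (depth limited to 2 just for readability):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    clf = DecisionTreeClassifier(max_depth=2, random_state=42)
    clf.fit(iris.data, iris.target)

    # Text view of the splits: each internal node compares one feature to a threshold.
    print(export_text(clf, feature_names=iris.feature_names))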

Gini Impurity

$G = 1 - \sum_{k=1}^{n} p_{k}^{2}$

  • The higher $G$, the more impure
  • An individual child node can have a higher Gini impurity than its parent, but the weighted average of the children's impurities never exceeds the parent's
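A direct translation of the formula into Python, with the class proportions $p_{k}$ taken from label counts (purely illustrative):

    from collections import Counter

    def gini(labels):
        """Gini impurity G = 1 - sum_k p_k^2 for a list of class labels."""
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    print(gini(["a"] * 10))             # 0.0 -> pure node
    print(gini(["a"] * 5 + ["b"] * 5))  # 0.5 -> maximally impure for 2 classes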

Gini Impurity and Convexity

$g: (p_{1}, \ldots, p_{n}) \mapsto 1 - \sum_{k=1}^{n} p_{k}^{2}$

  • The domain of $g$ is the probability simplex $\{(p_{1},\ldots,p_{n}) : p_{k} \ge 0 \;\forall\, k,\ \sum_{k=1}^{n} p_{k} = 1\}$
  • $g$ is a concave function
  • $p_{k} = 1$ for some $k \implies$ $g(p) = 0$, i.e. pure
  • $g(\frac{1}{n},\ldots,\frac{1}{n}) = 1 - \frac{1}{n}$ gives the maximal value
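As a concrete check for $n = 2$, write $p_{1} = p$ and $p_{2} = 1 - p$:

$g(p, 1-p) = 1 - p^{2} - (1-p)^{2} = 2p(1-p)$

This is concave in $p$, vanishes at $p \in \{0, 1\}$ (pure nodes), and attains its maximum $\frac{1}{2} = 1 - \frac{1}{2}$ at $p = \frac{1}{2}$, consistent with the statements above.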

Entropy
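For a node with class proportions $p_{1}, \ldots, p_{n}$, the entropy is

$H = -\sum_{k=1}^{n} p_{k} \log_{2} p_{k}$

(with the convention $0 \log_{2} 0 = 0$); like Gini impurity, it is $0$ for a pure node and maximal for the uniform distribution.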

Entropy and Gini impurity measure the same notion of node impurity and are, for practical purposes, interchangeable when training decision trees: they usually produce very similar trees, with Gini being slightly cheaper to compute.

Regression

Advantages

  1. No need for feature scaling
  2. Well-written libraries, e.g. scikit-learn, can handle multi-output problems as easily as single-output ones

Disadvantages

  1. Sensitive to small variations in the data, e.g. rotating the feature space or removing a few instances can change the tree considerably
  2. Decision boundaries are always perpendicular to some feature axis and predictions are piecewise constant $\implies$ poor at extrapolation (see the sketch after this list)
  3. Some concepts are hard for trees to learn: try training one on XOR, parity, or multiplexer problems
  4. Biased trees if some classes dominate; it is recommended to balance the dataset before fitting the decision tree
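A small sketch of the extrapolation issue from point 2, assuming scikit-learn (the linear toy data is made up): outside the training range, a regression tree keeps predicting the value of its outermost leaf.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(100, 1))
    y = 3.0 * X.ravel()  # simple linear trend on [0, 1]

    reg = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
    # Inside the training range the fit is fine; beyond it, the prediction
    # stays frozen at the outermost leaf value instead of following the trend.
    print(reg.predict([[0.5], [2.0], [10.0]]))  # the last two are identical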

Traits

  1. Nonparametric model
  2. Regularization (e.g. limiting the tree depth or the minimum leaf size) is needed, as with many other ML algorithms; see the sketch below
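A sketch of the usual regularization knobs in scikit-learn, as mentioned in point 2 (the synthetic dataset and parameter values are only illustrative):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    plain = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    regularized = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5,
                                         random_state=0).fit(X_tr, y_tr)

    # The unconstrained tree typically reaches ~100% training accuracy
    # but generalizes worse than the regularized one.
    for name, model in [("plain", plain), ("regularized", regularized)]:
        print(name, model.score(X_tr, y_tr), model.score(X_te, y_te))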

Complexity

  • Inference: $O(\lg(m))$, where $m$ is the number of training instances
  • Training: $O(n\,m\lg(m))$, where $n$ is the number of features
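As a rough worked example: with $m = 10^{6}$ training instances, a reasonably balanced tree has depth about $\lg(10^{6}) \approx 20$, so a prediction costs only about 20 threshold comparisons, independent of $n$.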

Topics Covered Less in the Preceding Slides

  1. Pruning
  2. Random Forest
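Both are available in scikit-learn; a minimal sketch (parameter values are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Cost-complexity pruning: larger ccp_alpha prunes more of the fitted tree.
    pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)

    # Random Forest: an ensemble of trees fit on bootstrap samples,
    # with a random feature subset considered at each split.
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    print(pruned_tree.get_depth(), forest.score(X, y))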

References

Q&A

END

Thank you