Decision Tree
Most decision tree learning algorithms follow this template, differing only in their choice of heuristics:
DecisionTreeLearn(D): input a dataset D, output a decision tree hypothesis
• Create a root node
• If termination conditions are met, return a single-node tree whose leaf prediction is based on D
• Else:
  • Greedily find a feature A (assigned to the root) to split on, according to the split criterion
  • For each possible value v_i of A:
    • Let D_i be the subset of D containing the data with value v_i for feature A
    • Create the subtree DecisionTreeLearn(D_i) as a child of the root
TODO understand this template better
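As a concrete illustration of the template, here is a minimal Python sketch. The dataset representation (a list of (feature-dict, label) pairs), the pluggable split_score function, and the min_examples stopping rule are assumptions made for this sketch, not part of the notes above.

```python
from collections import Counter

def majority_label(examples):
    """Most common label among the examples; used for leaf predictions."""
    labels = [label for _, label in examples]
    return Counter(labels).most_common(1)[0][0]

def decision_tree_learn(examples, features, split_score, min_examples=1):
    """Generic decision tree learner following the template above.

    examples    : list of (feature_dict, label) pairs
    features    : feature names still available for splitting
    split_score : function(examples, feature) -> score; higher is better
                  (e.g. information gain for ID3)
    Returns a nested dict for an internal node, or a bare label for a leaf.
    """
    labels = {label for _, label in examples}
    # Termination: pure node, no features left, or too few examples.
    if len(labels) == 1 or not features or len(examples) <= min_examples:
        return majority_label(examples)

    # Greedily pick the feature with the best split score.
    best = max(features, key=lambda f: split_score(examples, f))

    # Partition the data by each observed value v_i of the chosen feature.
    by_value = {}
    for x, y in examples:
        by_value.setdefault(x[best], []).append((x, y))

    tree = {"feature": best, "children": {}}
    remaining = [f for f in features if f != best]
    for value, subset in by_value.items():
        # Recurse on D_i, the subset with value v_i for the chosen feature.
        tree["children"][value] = decision_tree_learn(
            subset, remaining, split_score, min_examples)
    return tree
```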
Split criteria
One split criterion is information gain, which is used by the Iterative Dichotomiser 3 (ID3) algorithm. TODO: discuss information entropy and information gain to show the algorithm
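As a placeholder for that TODO, these are the standard definitions ID3 relies on, where p_c is the fraction of examples in D with class c and D_v is the subset of D in which feature A takes value v:

```latex
H(D) = -\sum_{c} p_c \log_2 p_c
\qquad
\mathrm{Gain}(D, A) = H(D) - \sum_{v \in \mathrm{values}(A)} \frac{|D_v|}{|D|}\, H(D_v)
```

ID3 splits on the feature A that maximizes Gain(D, A), i.e. the feature whose split most reduces the entropy of the labels.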
To Address Overfitting
• More regularization (constrain H)
  • Do not split leaves past a fixed depth
  • Do not split leaves with fewer than k labels
  • Do not split leaves where the maximal information gain is less than some threshold
• Pruning (removing leaves)
  • Evaluate each split using a validation set: compare the validation error with and without that split (replacing it with the most common label at that point)
  • Use a statistical test to examine whether the split is "informative" (leads to different enough subtrees)
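For reference, these regularization knobs roughly correspond to hyperparameters of scikit-learn's DecisionTreeClassifier. This is a sketch assuming the sklearn API, with illustrative values; the exact threshold semantics differ slightly from the bullets above.

```python
from sklearn.tree import DecisionTreeClassifier

# Each argument mirrors one of the constraints above (values are illustrative):
clf = DecisionTreeClassifier(
    criterion="entropy",         # split using entropy / information gain
    max_depth=5,                 # do not split leaves past a fixed depth
    min_samples_leaf=10,         # do not create leaves with fewer than k examples
    min_impurity_decrease=1e-3,  # do not split when the impurity decrease is below a threshold
    ccp_alpha=0.01,              # cost-complexity pruning after training
)
```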
