Machine Learning - Decision Trees
In this notebook, we will study decision trees and their different types, and we will discuss related topics such as pruning and indexing in machine learning.
Prerequisites
Information gain: In decision tree learning, information gain measures how much splitting on an attribute reduces the entropy of the target. The information gain ratio is the ratio of information gain to the intrinsic information of the split; it was proposed to reduce the bias towards multi-valued attributes by taking the number and size of branches into account when choosing an attribute. Check this tutorial for more information on this topic.
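As a quick illustration, here is a minimal sketch of these quantities in Python. The entropy, information_gain, and gain_ratio helpers and the toy data are our own for demonstration, not part of any particular library.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    """Entropy of the parent node minus the weighted entropy of its children."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

def gain_ratio(parent, children):
    """Information gain divided by the intrinsic information (split entropy)."""
    n = len(parent)
    sizes = np.array([len(c) / n for c in children])
    intrinsic = -np.sum(sizes * np.log2(sizes))
    return information_gain(parent, children) / intrinsic

# Toy example: a binary target split into two branches by some attribute.
parent = np.array([1, 1, 1, 0, 0, 0, 0, 0])
children = [np.array([1, 1, 1, 0]), np.array([0, 0, 0, 0])]
print(information_gain(parent, children))  # about 0.55 bits
print(gain_ratio(parent, children))        # gain normalized by the split entropy
```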
Decision tree learning is a type of supervised machine learning where the data is continuously split according to a certain parameter.
Generally there are two types of decision trees: classification trees, whose leaves predict a discrete class label, and regression trees, whose leaves predict a continuous value.
You can check out this Wikipedia page for more thorough details.
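As a small, self-contained example, here is one way to fit a classification tree with scikit-learn; the dataset and hyperparameters are arbitrary choices for demonstration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labelled dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a classification tree: the data is recursively split on feature thresholds.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```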
Decision trees are prone to overfitting, especially when a tree is particularly deep. The deeper the tree, the more specific the conditions along each path become, so each leaf ends up supported by a smaller sample of training instances, and conclusions drawn from such small samples can be unsound. In decision trees, pruning is a process applied to control or limit the depth (size) of the tree. By default, decision tree hyperparameters are set so that the tree grows to its full depth; these fully-grown trees tend to overfit.
Pruning reduces the size of decision trees by removing parts of the tree that provide little power to classify instances. Check this Wikipedia link for a full explanation.
Pre-pruning: As the name suggests, pre-pruning (or early stopping) stops growing the tree before it has completely fit the training set, for example by limiting its depth or requiring a minimum number of samples per leaf.
Post-pruning: Post-pruning a decision tree means that we first grow the complete tree and then prune it back with the aim of improving accuracy on unseen instances.
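A sketch contrasting the two approaches with scikit-learn might look like the following; the dataset, the hyperparameter values, and the choice of a single ccp_alpha are arbitrary illustrations, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning (early stopping): cap the depth / leaf size while growing the tree.
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: grow the full tree, then prune it back with cost-complexity pruning.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
ccp_alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # pick one alpha purely for illustration
post_pruned = DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=0)
post_pruned.fit(X_train, y_train)

for name, model in [("pre-pruned", pre_pruned), ("post-pruned", post_pruned)]:
    print(name, "depth:", model.get_depth(), "test accuracy:", model.score(X_test, y_test))
```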
Algorithms
Some techniques, often called ensemble methods, construct more than one decision tree:
Ensemble methods: In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
Boosted trees: Incrementally building an ensemble by training each new tree to emphasize the training instances previously mis-modeled. These can be used for regression-type and classification-type problems.
Rotation forest: in which every decision tree is trained by first applying principal component analysis (PCA) on a random subset of the input features.
Bootstrap aggregated (bagged) decision trees: an early ensemble method that builds multiple decision trees by repeatedly resampling the training data with replacement and having the trees vote for a consensus prediction.
Bagging, also known as bootstrap aggregation, is an ensemble learning method commonly used to reduce variance when learning from a noisy dataset.
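A rough sketch of bagging decision trees with scikit-learn is shown below. It assumes a recent scikit-learn release where the base learner is passed as estimator (older releases use base_estimator); the dataset and number of estimators are arbitrary choices for demonstration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# A single (high-variance) tree versus a bagged ensemble of 100 trees,
# each trained on a bootstrap resample of the training data.
single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(
    estimator=DecisionTreeClassifier(random_state=0),  # base learner
    n_estimators=100,
    bootstrap=True,  # sample the training data with replacement
    random_state=0,
)

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```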
The term 'boosting' refers to a family of algorithms which convert weak learners into strong learners.
This video is A Short Introduction to Boosting. You can read this Medium article for a better perspective on the topic.
There are many boosting algorithms. The original ones were proposed by Robert Schapire and Yoav Freund.
Check this paper for more information on this topic.
But these early algorithms were not adaptive and could not take full advantage of the weak learners. Later, AdaBoost was developed, which is an adaptive boosting algorithm.
AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm. It can be used in conjunction with many other types of learning algorithms to improve performance. This video explains the algorithm clearly.
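A minimal AdaBoost sketch with scikit-learn follows. It again assumes a recent scikit-learn release (the base learner is passed as estimator), and the dataset, depth-1 base trees, and other hyperparameters are arbitrary choices for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# AdaBoost: fit shallow trees ("stumps") sequentially; each round reweights the
# training samples so previously misclassified instances get more emphasis.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
ada = AdaBoostClassifier(estimator=stump, n_estimators=200, learning_rate=0.5, random_state=0)

print("AdaBoost CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())
```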
This link explains indexing in machine learning. You can also use this tutorial to see how it works and what it does. Also check this paper out for more information.
Check these links out for more information on topics covered in this notebook.