Machine Learning - Decision Trees
In this notebook, we will study decision trees and their different types, and we will discuss related topics such as pruning and indexing in machine learning.
Prerequisites
Information gain: In decision tree learning, information gain measures how much splitting on an attribute reduces the entropy of the target. The information gain ratio is the ratio of information gain to the intrinsic information of the split; it was proposed to reduce the bias towards multi-valued attributes by taking the number and size of branches into account when choosing an attribute. Check this tutorial for more information on this topic.
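As a quick illustration, here is a minimal sketch of these quantities in Python. The entropy, information_gain, and gain_ratio helpers and the toy data are our own for demonstration, not part of any particular library.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    """Entropy of the parent node minus the weighted entropy of its children."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

def gain_ratio(parent, children):
    """Information gain divided by the intrinsic information (split entropy)."""
    n = len(parent)
    sizes = np.array([len(c) / n for c in children])
    intrinsic = -np.sum(sizes * np.log2(sizes))
    return information_gain(parent, children) / intrinsic

# Toy example: a binary target split into two branches by some attribute.
parent = np.array([1, 1, 1, 0, 0, 0, 0, 0])
children = [np.array([1, 1, 1, 0]), np.array([0, 0, 0, 0])]
print(information_gain(parent, children))  # about 0.55 bits
print(gain_ratio(parent, children))        # gain normalized by the split entropy
```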
Decision tree learning is a type of supervised machine learning where the data is continuously split according to a certain parameter.
Generally there are two types of decision trees: classification trees, whose leaves predict a discrete class label, and regression trees, whose leaves predict a continuous value.
You can check out this Wikipedia page for more thorough details.
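As a small, self-contained example, here is one way to fit a classification tree with scikit-learn; the dataset and hyperparameters are arbitrary choices for demonstration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labelled dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a classification tree: the data is recursively split on feature thresholds.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```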
Decision trees are prone to overfitting, especially when a tree is particularly deep. The deeper the tree, the more specific the conditions along each path become, so each leaf ends up supported by a smaller sample of training instances, and conclusions drawn from such small samples can be unsound. In decision trees, pruning is a process applied to control or limit the depth (size) of the tree. By default, decision tree hyperparameters are set so that the tree grows to its full depth; these fully-grown trees tend to overfit.
Pruning reduces the size of decision trees by removing parts of the tree that provide little power to classify instances. Check this Wikipedia link for a full explanation.
Pre-pruning: As the name suggests, pre-pruning (or early stopping) stops growing the tree before it has completely fit the training set, for example by limiting its depth or requiring a minimum number of samples per leaf.
Post-pruning: Post-pruning a decision tree means that we first grow the complete tree and then prune it back with the aim of improving accuracy on unseen instances.
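A sketch contrasting the two approaches with scikit-learn might look like the following; the dataset, the hyperparameter values, and the choice of a single ccp_alpha are arbitrary illustrations, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning (early stopping): cap the depth / leaf size while growing the tree.
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: grow the full tree, then prune it back with cost-complexity pruning.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
ccp_alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # pick one alpha purely for illustration
post_pruned = DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=0)
post_pruned.fit(X_train, y_train)

for name, model in [("pre-pruned", pre_pruned), ("post-pruned", post_pruned)]:
    print(name, "depth:", model.get_depth(), "test accuracy:", model.score(X_test, y_test))
```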
Algorithms
Some techniques, often called ensemble methods, construct more than one decision tree:
Ensemble methods: In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
Boosted trees: Incrementally building an ensemble by training each new tree to emphasize the training instances previously mis-modeled. These can be used for regression-type and classification-type problems.
Rotation forest: in which every decision tree is trained by first applying principal component analysis (PCA) on a random subset of the input features.
Bootstrap aggregated (bagged) decision trees: an early ensemble method that builds multiple decision trees by repeatedly resampling the training data with replacement and having the trees vote for a consensus prediction.
Bagging, also known as bootstrap aggregation, is an ensemble learning method commonly used to reduce variance when learning from a noisy dataset.
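A rough sketch of bagging decision trees with scikit-learn is shown below. It assumes a recent scikit-learn release where the base learner is passed as estimator (older releases use base_estimator); the dataset and number of estimators are arbitrary choices for demonstration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# A single (high-variance) tree versus a bagged ensemble of 100 trees,
# each trained on a bootstrap resample of the training data.
single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(
    estimator=DecisionTreeClassifier(random_state=0),  # base learner
    n_estimators=100,
    bootstrap=True,  # sample the training data with replacement
    random_state=0,
)

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```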
The term 'boosting' refers to a family of algorithms which convert weak learners into strong learners.
This video is A Short Introduction to Boosting. You can read this Medium article for a better perspective on the topic.
There are many boosting algorithms. The original ones were proposed by Robert Schapire and Yoav Freund.
Check this paper for more information on this topic.
But these early algorithms were not adaptive and could not take full advantage of the weak learners. Later, AdaBoost was developed, which is an adaptive boosting algorithm.
AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm. It can be used in conjunction with many other types of learning algorithms to improve performance. This video explains the algorithm clearly.
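A minimal AdaBoost sketch with scikit-learn follows. It again assumes a recent scikit-learn release (the base learner is passed as estimator), and the dataset, depth-1 base trees, and other hyperparameters are arbitrary choices for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# AdaBoost: fit shallow trees ("stumps") sequentially; each round reweights the
# training samples so previously misclassified instances get more emphasis.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
ada = AdaBoostClassifier(estimator=stump, n_estimators=200, learning_rate=0.5, random_state=0)

print("AdaBoost CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())
```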
This link explains indexing in machine learning. You can also use this tutorial to see how it works and what it does. Also check this paper out for more information.
Check these links out for more information on topics covered in this notebook.