Gradient Boosting
Gradient boosting is an ensemble technique that gives you a collection of predictors (decision trees). It is an iterative boosting technique in which a weak learner is fit at the first iteration, and each subsequent learner is fit to the residuals of the previous ones. For mean squared error, the negative gradient of the loss with respect to the prediction is (y - yhat), which is nothing but the residual of the current prediction; so the model keeps learning the residuals iteratively and gives a good estimate after n steps when a low (or decaying) learning rate is used.
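In LaTeX notation, the squared-error loss and its negative gradient with respect to the current prediction are (the 1/2 factor is only there so the constant cancels):

L(y, \hat{y}) = \tfrac{1}{2}\,(y - \hat{y})^2, \qquad -\frac{\partial L}{\partial \hat{y}} = y - \hat{y}

so fitting the next tree to y - yhat is a step along the negative gradient of this loss.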
Steps:
· Learn a regression predictor
· Compute the error residual
· Learn to predict the residual
· Combine the predictors
Each learner estimates the gradient of the loss function, and the ensemble takes a sequence of steps that reduce the error; a minimal sketch of this loop follows.
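For concreteness, here is a minimal sketch of that loop for squared error, assuming scikit-learn and NumPy arrays; the function names fit_gradient_boosting / predict_gradient_boosting and the default values are just illustrative, not from the original post.

# Minimal gradient boosting for squared error (X, y are NumPy arrays).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    # Start from a constant prediction: the mean minimizes squared error.
    base = np.mean(y)
    prediction = np.full(len(y), base, dtype=float)
    trees = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is just the residual y - yhat.
        residual = y - prediction
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)                                 # learn to predict the residual
        prediction = prediction + learning_rate * tree.predict(X)  # take a small step
        trees.append(tree)
    return base, trees

def predict_gradient_boosting(X, base, trees, learning_rate=0.1):
    # Final prediction = base value + sum of all shrunken tree outputs.
    return base + learning_rate * sum(tree.predict(X) for tree in trees)

With a small learning rate and enough trees, the summed predictions steadily track the targets, which is the "sequence of steps" described above.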
What does Gradient Boosting do?
· Learns a sequence of predictors
· The sum of the predictions is more accurate (see the check below)
· The predictive function becomes increasingly complex
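The second point can be checked directly with the sketch above: the validation error keeps dropping as more residual predictors are summed in. This snippet relies on fit_gradient_boosting from that sketch and uses synthetic data purely for illustration.

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data just for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

base, trees = fit_gradient_boosting(X_train, y_train, n_trees=200, learning_rate=0.1)

running = np.full(len(y_val), base, dtype=float)
for tree in trees:
    running += 0.1 * tree.predict(X_val)   # same learning_rate used at fit time
    # The validation MSE of the running sum usually keeps falling as trees are added,
    # until the increasingly complex function starts to overfit.
print("final validation MSE:", np.mean((y_val - running) ** 2))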
Parameters (typical ranges; a code example follows the list):
· Number of trees / iterations (100 – 1000)
· Learning rate / shrinkage (0.001 – 0.1)
· Minimum number of samples in a terminal (leaf) node (3 – 6)
· Depth of the tree (3 – 5)
· Subsampling (0.4 – 0.8)
· Gamma (0 – 1)
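As a rough illustration (not from the original post), this is how most of these settings map onto scikit-learn's GradientBoostingRegressor; gamma is an XGBoost parameter (the minimum loss reduction required to make a further split) and has no direct scikit-learn equivalent.

# Assumes scikit-learn; the specific values are just mid-range picks from the list above.
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(
    n_estimators=500,      # number of trees / iterations (100 - 1000)
    learning_rate=0.05,    # learning rate / shrinkage (0.001 - 0.1)
    min_samples_leaf=4,    # min samples in a terminal (leaf) node (3 - 6)
    max_depth=3,           # depth of each tree (3 - 5)
    subsample=0.6,         # row subsampling fraction per tree (0.4 - 0.8)
)
# model.fit(X_train, y_train); model.predict(X_test)
# In XGBoost, gamma (0 - 1) would additionally set the minimum loss reduction
# required to make a split.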
Advantages:
· Usually gives better predictions than Random Forests and individual decision trees
· They can handle categorical features easily, which is not the case with Logistic Regression
· They do not require the features to be linear, or even to interact linearly
· They can handle high-dimensional feature spaces and large numbers of samples
Disadvantages:
· More prone to overfitting
· The parameters (shrinkage, number of trees, depth) have to be tuned very carefully, i.e. they are harder to get right; a tuning sketch follows
· They need more computation time
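Because the model is sensitive to these settings, a common approach is a small cross-validated grid search. This is a minimal sketch assuming scikit-learn; the grid values are only illustrative.

# Searches over the parameters named above using cross-validation.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300, 1000],
    "learning_rate": [0.001, 0.01, 0.1],
    "max_depth": [3, 4, 5],
}
search = GridSearchCV(
    GradientBoostingRegressor(subsample=0.6, min_samples_leaf=4),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
# search.fit(X_train, y_train)
# print(search.best_params_)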