Gradient Boosting & XGBoost - Python Data Science Path

Sequential Learning at Its Best! 🚀

Gradient Boosting builds powerful predictive models by sequentially combining weak learners, with each new model correcting the errors of the previous ones. XGBoost, the extreme gradient boosting implementation, has dominated machine learning competitions with its speed and performance. Master these algorithms that consistently deliver state-of-the-art results across diverse problems.

Gradient Boosting Framework

graph TD A[Gradient Boosting] --> B[Base Learners] A --> C[Loss Functions] A --> D[Boosting Process] B --> E[Decision Trees] B --> F[Linear Models] C --> G[Regression Loss] C --> H[Classification Loss] C --> I[Custom Loss] D --> J[Sequential Training] D --> K[Residual Fitting] D --> L[Gradient Descent] J --> M[Tree 1] M --> N[Tree 2] N --> O[Tree n] O --> P[Weighted Sum] P --> Q[Final Prediction] style A fill:#f9f,stroke:#333,stroke-width:2px style D fill:#bbf,stroke:#333,stroke-width:2px style Q fill:#9f9,stroke:#333,stroke-width:2px

Understanding Gradient Boosting

Core Concepts and Implementation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score
from sklearn.datasets import make_classification, make_regression, load_breast_cancer
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Generate datasets
np.random.seed(42)

# Classification data
X_class, y_class = make_classification(
    n_samples=1000, n_features=20, n_informative=15,
    n_redundant=5, n_clusters_per_class=2,
    random_state=42
)

# Regression data
X_reg, y_reg = make_regression(
    n_samples=1000, n_features=20, n_informative=15,
    noise=0.1, random_state=42
)

class GradientBoostingAnalyzer:
    """Comprehensive Gradient Boosting Analysis"""
    
    def __init__(self):
        self.models = {}
        self.histories = {}
        self.predictions = {}
        
    def manual_gradient_boosting(self, X, y, n_estimators=10, learning_rate=0.1, max_depth=3):
        """
        Simplified manual implementation of gradient boosting
        """
        from sklearn.tree import DecisionTreeRegressor
        
        print("Manual Gradient Boosting Implementation")
        print("="*50)
        
        # Initialize predictions with mean
        F = np.full(len(y), np.mean(y))
        
        # Store trees and training progress
        trees = []
        train_scores = []
        
        for i in range(n_estimators):
            # Calculate residuals (negative gradient)
            residuals = y - F
            
            # Fit a tree to residuals
            tree = DecisionTreeRegressor(max_depth=max_depth, random_state=i)
            tree.fit(X, residuals)
            
            # Make predictions
            predictions = tree.predict(X)
            
            # Update F
            F = F + learning_rate * predictions
            
            # Store tree
            trees.append(tree)
            
            # Calculate loss
            mse = mean_squared_error(y, F)
            train_scores.append(mse)
            
            if i % 20 == 0:
                print(f"Iteration {i}: MSE = {mse:.4f}")
        
        return trees, train_scores, F
    
    def visualize_boosting_process(self, X, y, n_estimators=50):
        """Visualize how boosting improves predictions iteratively"""
        
        # Use 1D data for visualization
        X_1d = X[:, 0].reshape(-1, 1)
        
        # Train gradient boosting with different numbers of estimators
        n_estimators_list = [1, 5, 10, 20, 50]
        
        fig, axes = plt.subplots(2, 3, figsize=(15, 10))
        axes = axes.flatten()
        
        for idx, n_est in enumerate(n_estimators_list):
            gb = GradientBoostingRegressor(
                n_estimators=n_est,
                learning_rate=0.1,
                max_depth=3,
                random_state=42
            )
            gb.fit(X_1d, y)
            
            # Sort for plotting
            sort_idx = X_1d.flatten().argsort()
            X_sorted = X_1d[sort_idx]
            y_sorted = y[sort_idx]
            y_pred_sorted = gb.predict(X_sorted)
            
            # Plot
            axes[idx].scatter(X_sorted, y_sorted, alpha=0.5, s=10, label='Data')
            axes[idx].plot(X_sorted, y_pred_sorted, 'r-', linewidth=2, 
                          label=f'GB ({n_est} trees)')
            axes[idx].set_xlabel('Feature')
            axes[idx].set_ylabel('Target')
            axes[idx].set_title(f'n_estimators = {n_est}')
            axes[idx].legend()
            axes[idx].grid(True, alpha=0.3)
            
            # Add R² score
            r2 = r2_score(y, gb.predict(X_1d))
            axes[idx].text(0.05, 0.95, f'R² = {r2:.3f}', 
                          transform=axes[idx].transAxes,
                          fontsize=11, verticalalignment='top',
                          bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
        
        # Remove extra subplot
        fig.delaxes(axes[5])
        
        plt.suptitle('Gradient Boosting: Effect of Number of Trees', fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
    
    def compare_boosting_algorithms(self, X, y, task='classification'):
        """Compare different boosting algorithms"""
        
        from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor
        from sklearn.ensemble import HistGradientBoostingClassifier, HistGradientBoostingRegressor
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=42
        )
        
        if task == 'classification':
            models = {
                'Gradient Boosting': GradientBoostingClassifier(n_estimators=100, random_state=42),
                'AdaBoost': AdaBoostClassifier(n_estimators=100, random_state=42),
                'Hist Gradient Boosting': HistGradientBoostingClassifier(max_iter=100, random_state=42)
            }
            metric = 'accuracy'
        else:
            models = {
                'Gradient Boosting': GradientBoostingRegressor(n_estimators=100, random_state=42),
                'AdaBoost': AdaBoostRegressor(n_estimators=100, random_state=42),
                'Hist Gradient Boosting': HistGradientBoostingRegressor(max_iter=100, random_state=42)
            }
            metric = 'r2'
        
        results = {}
        
        for name, model in models.items():
            # Train
            model.fit(X_train, y_train)
            
            # Predict
            y_pred_train = model.predict(X_train)
            y_pred_test = model.predict(X_test)
            
            # Evaluate
            if task == 'classification':
                train_score = accuracy_score(y_train, y_pred_train)
                test_score = accuracy_score(y_test, y_pred_test)
            else:
                train_score = r2_score(y_train, y_pred_train)
                test_score = r2_score(y_test, y_pred_test)
            
            results[name] = {
                'train_score': train_score,
                'test_score': test_score,
                'model': model
            }
            
            self.models[name] = model
        
        return results
    
    def learning_curves_analysis(self, X, y, gb_model=None):
        """Analyze learning curves and convergence"""
        
        if gb_model is None:
            gb_model = GradientBoostingClassifier(
                n_estimators=200,
                learning_rate=0.1,
                max_depth=3,
                subsample=0.8,
                random_state=42
            )
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=42
        )
        
        # Fit model
        gb_model.fit(X_train, y_train)
        
        # Get staged predictions for learning curves
        train_scores = []
        test_scores = []
        
        for pred_train, pred_test in zip(
            gb_model.staged_predict(X_train),
            gb_model.staged_predict(X_test)
        ):
            if hasattr(gb_model, 'classes_'):  # Classification
                train_scores.append(accuracy_score(y_train, pred_train))
                test_scores.append(accuracy_score(y_test, pred_test))
            else:  # Regression
                train_scores.append(r2_score(y_train, pred_train))
                test_scores.append(r2_score(y_test, pred_test))
        
        # Visualization
        fig, axes = plt.subplots(1, 2, figsize=(12, 5))
        
        # Learning curves
        axes[0].plot(train_scores, label='Training Score', linewidth=2)
        axes[0].plot(test_scores, label='Validation Score', linewidth=2)
        axes[0].set_xlabel('Boosting Iterations')
        axes[0].set_ylabel('Score')
        axes[0].set_title('Learning Curves')
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)
        
        # Find best iteration
        best_iter = np.argmax(test_scores)
        axes[0].axvline(x=best_iter, color='red', linestyle='--', 
                       label=f'Best iteration: {best_iter}')
        axes[0].legend()
        
        # Loss reduction
        if hasattr(gb_model, 'train_score_'):
            axes[1].plot(gb_model.train_score_, label='Training Loss', linewidth=2)
            axes[1].set_xlabel('Boosting Iterations')
            axes[1].set_ylabel('Loss')
            axes[1].set_title('Training Loss Reduction')
            axes[1].legend()
            axes[1].grid(True, alpha=0.3)
        
        plt.suptitle('Gradient Boosting Convergence Analysis', fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
        
        return train_scores, test_scores, best_iter
    
    def feature_importance_analysis(self, model, feature_names=None):
        """Analyze and visualize feature importance"""
        
        if feature_names is None:
            feature_names = [f'Feature_{i}' for i in range(len(model.feature_importances_))]
        
        # Get feature importances
        importances = model.feature_importances_
        indices = np.argsort(importances)[::-1]
        
        # Select top features
        top_n = min(20, len(importances))
        top_indices = indices[:top_n]
        top_features = [feature_names[i] for i in top_indices]
        top_importances = importances[top_indices]
        
        # Create DataFrame
        importance_df = pd.DataFrame({
            'feature': feature_names,
            'importance': importances
        }).sort_values('importance', ascending=False)
        
        # Visualization
        fig, axes = plt.subplots(1, 2, figsize=(12, 5))
        
        # Bar plot
        axes[0].barh(range(top_n), top_importances)
        axes[0].set_yticks(range(top_n))
        axes[0].set_yticklabels(top_features)
        axes[0].set_xlabel('Feature Importance')
        axes[0].set_title(f'Top {top_n} Features')
        axes[0].grid(True, alpha=0.3)
        
        # Cumulative importance
        cumsum = np.cumsum(importance_df['importance'].values)
        axes[1].plot(cumsum / cumsum[-1], linewidth=2)
        axes[1].axhline(y=0.8, color='r', linestyle='--', label='80% importance')
        axes[1].axhline(y=0.95, color='g', linestyle='--', label='95% importance')
        axes[1].set_xlabel('Number of Features')
        axes[1].set_ylabel('Cumulative Importance')
        axes[1].set_title('Cumulative Feature Importance')
        axes[1].legend()
        axes[1].grid(True, alpha=0.3)
        
        # Find number of features for thresholds
        n_80 = np.argmax(cumsum / cumsum[-1] >= 0.8) + 1
        n_95 = np.argmax(cumsum / cumsum[-1] >= 0.95) + 1
        axes[1].text(n_80, 0.8, f'  {n_80} features', verticalalignment='center')
        axes[1].text(n_95, 0.95, f'  {n_95} features', verticalalignment='center')
        
        plt.suptitle('Feature Importance Analysis', fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
        
        return importance_df

# Initialize analyzer
analyzer = GradientBoostingAnalyzer()

print("="*60)
print("GRADIENT BOOSTING FUNDAMENTALS")
print("="*60)

# Manual implementation
print("\nManual Gradient Boosting Implementation:")
trees, train_scores, predictions = analyzer.manual_gradient_boosting(
    X_reg[:100], y_reg[:100], n_estimators=50
)

# Plot training progress
plt.figure(figsize=(10, 5))
plt.plot(train_scores, linewidth=2)
plt.xlabel('Iteration')
plt.ylabel('MSE')
plt.title('Gradient Boosting Training Progress')
plt.grid(True, alpha=0.3)
plt.show()

# Visualize boosting process
print("\nVisualizing boosting process...")
analyzer.visualize_boosting_process(X_reg[:200], y_reg[:200])

# Compare boosting algorithms
print("\n" + "="*60)
print("COMPARING BOOSTING ALGORITHMS")
print("="*60)

results_class = analyzer.compare_boosting_algorithms(X_class, y_class, task='classification')
print("\nClassification Results:")
for name, metrics in results_class.items():
    print(f"{name:25} - Train: {metrics['train_score']:.4f}, Test: {metrics['test_score']:.4f}")

# Learning curves
print("\nAnalyzing learning curves...")
train_scores, test_scores, best_iter = analyzer.learning_curves_analysis(X_class, y_class)
print(f"Best iteration: {best_iter}")

# Feature importance
print("\nAnalyzing feature importance...")
gb_model = analyzer.models['Gradient Boosting']
importance_df = analyzer.feature_importance_analysis(gb_model)

XGBoost: Extreme Gradient Boosting

Advanced Implementation with XGBoost

import xgboost as xgb
from xgboost import XGBClassifier, XGBRegressor

class XGBoostAnalyzer:
    """Comprehensive XGBoost Analysis"""
    
    def __init__(self):
        self.models = {}
        self.cv_results = {}
        
    def xgboost_basics(self, X, y, task='classification'):
        """Basic XGBoost implementation"""
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=42
        )
        
        if task == 'classification':
            # XGBoost Classifier
            xgb_model = XGBClassifier(
                n_estimators=100,
                max_depth=6,
                learning_rate=0.3,
                objective='binary:logistic',
                subsample=0.8,
                colsample_bytree=0.8,
                random_state=42
            )
        else:
            # XGBoost Regressor
            xgb_model = XGBRegressor(
                n_estimators=100,
                max_depth=6,
                learning_rate=0.3,
                objective='reg:squarederror',
                subsample=0.8,
                colsample_bytree=0.8,
                random_state=42
            )
        
        # Train with evaluation
        eval_set = [(X_train, y_train), (X_test, y_test)]
        xgb_model.fit(
            X_train, y_train,
            eval_set=eval_set,
            eval_metric='logloss' if task == 'classification' else 'rmse',
            early_stopping_rounds=10,
            verbose=False
        )
        
        # Store model
        self.models['xgboost'] = xgb_model
        
        # Get predictions
        y_pred = xgb_model.predict(X_test)
        
        # Evaluate
        if task == 'classification':
            score = accuracy_score(y_test, y_pred)
            print(f"XGBoost Accuracy: {score:.4f}")
        else:
            score = r2_score(y_test, y_pred)
            print(f"XGBoost R² Score: {score:.4f}")
        
        return xgb_model, score
    
    def plot_training_history(self, xgb_model):
        """Plot XGBoost training history"""
        
        results = xgb_model.evals_result()
        
        plt.figure(figsize=(10, 6))
        
        # Plot training and validation loss
        epochs = len(results['validation_0']['logloss'] if 'logloss' in results['validation_0'] 
                     else results['validation_0']['rmse'])
        x_axis = range(0, epochs)
        
        for i, (dataset, metrics) in enumerate(results.items()):
            metric_name = list(metrics.keys())[0]
            plt.plot(x_axis, metrics[metric_name], 
                    label=f'{dataset} {metric_name}', linewidth=2)
        
        plt.xlabel('Boosting Round')
        plt.ylabel('Loss')
        plt.title('XGBoost Training History')
        plt.legend()
        plt.grid(True, alpha=0.3)
        
        # Mark best iteration
        best_iteration = xgb_model.best_iteration
        plt.axvline(x=best_iteration, color='red', linestyle='--', 
                   label=f'Best iteration: {best_iteration}')
        plt.legend()
        plt.show()
    
    def xgboost_cv(self, X, y, task='classification'):
        """Cross-validation with XGBoost"""
        
        # Create DMatrix
        if task == 'classification':
            dtrain = xgb.DMatrix(X, label=y)
            params = {
                'objective': 'binary:logistic',
                'max_depth': 6,
                'learning_rate': 0.3,
                'subsample': 0.8,
                'colsample_bytree': 0.8,
                'seed': 42,
                'eval_metric': 'logloss'
            }
        else:
            dtrain = xgb.DMatrix(X, label=y)
            params = {
                'objective': 'reg:squarederror',
                'max_depth': 6,
                'learning_rate': 0.3,
                'subsample': 0.8,
                'colsample_bytree': 0.8,
                'seed': 42,
                'eval_metric': 'rmse'
            }
        
        # Perform cross-validation
        cv_results = xgb.cv(
            params,
            dtrain,
            num_boost_round=100,
            nfold=5,
            early_stopping_rounds=10,
            seed=42,
            verbose_eval=False
        )
        
        self.cv_results = cv_results
        
        # Plot CV results
        fig, axes = plt.subplots(1, 2, figsize=(12, 5))
        
        # Training and validation scores
        metric = list(cv_results.columns)[0].split('-')[1]
        
        axes[0].plot(cv_results[f'train-{metric}-mean'], label='Train', linewidth=2)
        axes[0].fill_between(range(len(cv_results)),
                            cv_results[f'train-{metric}-mean'] - cv_results[f'train-{metric}-std'],
                            cv_results[f'train-{metric}-mean'] + cv_results[f'train-{metric}-std'],
                            alpha=0.3)
        
        axes[0].plot(cv_results[f'test-{metric}-mean'], label='Validation', linewidth=2)
        axes[0].fill_between(range(len(cv_results)),
                            cv_results[f'test-{metric}-mean'] - cv_results[f'test-{metric}-std'],
                            cv_results[f'test-{metric}-mean'] + cv_results[f'test-{metric}-std'],
                            alpha=0.3)
        
        axes[0].set_xlabel('Boosting Round')
        axes[0].set_ylabel(metric.upper())
        axes[0].set_title('Cross-Validation Scores')
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)
        
        # Standard deviation
        axes[1].plot(cv_results[f'train-{metric}-std'], label='Train Std', linewidth=2)
        axes[1].plot(cv_results[f'test-{metric}-std'], label='Validation Std', linewidth=2)
        axes[1].set_xlabel('Boosting Round')
        axes[1].set_ylabel('Standard Deviation')
        axes[1].set_title('Score Variability')
        axes[1].legend()
        axes[1].grid(True, alpha=0.3)
        
        plt.suptitle('XGBoost Cross-Validation Analysis', fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
        
        # Best score
        best_score = cv_results[f'test-{metric}-mean'].min()
        best_round = cv_results[f'test-{metric}-mean'].argmin()
        print(f"Best CV Score: {best_score:.4f} at round {best_round}")
        
        return cv_results
    
    def xgboost_feature_importance(self, model):
        """Visualize XGBoost feature importance"""
        
        # Get different types of importance
        importance_types = ['weight', 'gain', 'cover']
        
        fig, axes = plt.subplots(1, 3, figsize=(15, 5))
        
        for idx, importance_type in enumerate(importance_types):
            # Get importance scores
            importance = model.get_booster().get_score(importance_type=importance_type)
            
            if importance:
                # Sort and select top features
                importance_sorted = sorted(importance.items(), key=lambda x: x[1], reverse=True)
                top_n = min(15, len(importance_sorted))
                features = [x[0] for x in importance_sorted[:top_n]]
                scores = [x[1] for x in importance_sorted[:top_n]]
                
                # Plot
                axes[idx].barh(range(top_n), scores)
                axes[idx].set_yticks(range(top_n))
                axes[idx].set_yticklabels(features)
                axes[idx].set_xlabel(f'{importance_type.capitalize()} Score')
                axes[idx].set_title(f'Feature Importance: {importance_type.capitalize()}')
                axes[idx].grid(True, alpha=0.3)
        
        plt.suptitle('XGBoost Feature Importance Analysis', fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
    
    def xgboost_tree_visualization(self, model, tree_index=0):
        """Visualize individual trees in XGBoost"""
        
        # Plot tree
        fig, ax = plt.subplots(figsize=(20, 10))
        xgb.plot_tree(model, num_trees=tree_index, ax=ax)
        ax.set_title(f'XGBoost Tree {tree_index}', fontsize=16)
        plt.show()
        
        # Plot importance
        fig, ax = plt.subplots(figsize=(10, 8))
        xgb.plot_importance(model, ax=ax, max_num_features=20)
        ax.set_title('XGBoost Feature Importance', fontsize=14)
        plt.tight_layout()
        plt.show()

# XGBoost Analysis
print("\n" + "="*60)
print("XGBOOST IMPLEMENTATION")
print("="*60)

xgb_analyzer = XGBoostAnalyzer()

# Basic XGBoost
print("\nTraining XGBoost model...")
# Binary classification for XGBoost
y_binary = (y_class > y_class.mean()).astype(int)
xgb_model, score = xgb_analyzer.xgboost_basics(X_class, y_binary, task='classification')

# Plot training history
print("\nPlotting training history...")
xgb_analyzer.plot_training_history(xgb_model)

# Cross-validation
print("\nPerforming cross-validation...")
cv_results = xgb_analyzer.xgboost_cv(X_class, y_binary, task='classification')

# Feature importance
print("\nAnalyzing feature importance...")
xgb_analyzer.xgboost_feature_importance(xgb_model)

# Tree visualization (first tree only)
print("\nVisualizing first tree...")
# Note: Tree visualization works best with smaller trees
xgb_small = XGBClassifier(n_estimators=3, max_depth=3, random_state=42)
xgb_small.fit(X_class[:100], y_binary[:100])
xgb_analyzer.xgboost_tree_visualization(xgb_small, tree_index=0)

Hyperparameter Tuning

Optimizing Gradient Boosting Models

class GBMTuner:
    """Hyperparameter tuning for Gradient Boosting Models"""
    
    def __init__(self):
        self.best_params = {}
        self.search_results = {}
        
    def parameter_importance_analysis(self, X, y, model_type='xgboost'):
        """Analyze importance of different parameters"""
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=42
        )
        
        # Parameters to analyze
        param_ranges = {
            'n_estimators': [50, 100, 200, 500],
            'max_depth': [3, 5, 7, 10],
            'learning_rate': [0.01, 0.05, 0.1, 0.3],
            'subsample': [0.6, 0.7, 0.8, 0.9, 1.0],
            'colsample_bytree': [0.6, 0.7, 0.8, 0.9, 1.0]
        }
        
        # Base parameters
        base_params = {
            'n_estimators': 100,
            'max_depth': 6,
            'learning_rate': 0.1,
            'subsample': 0.8,
            'colsample_bytree': 0.8,
            'random_state': 42
        }
        
        results = {}
        
        for param_name, param_values in param_ranges.items():
            scores = []
            
            for value in param_values:
                # Update parameter
                params = base_params.copy()
                params[param_name] = value
                
                # Create and train model
                if model_type == 'xgboost':
                    model = XGBClassifier(**params)
                else:
                    model = GradientBoostingClassifier(**params)
                
                model.fit(X_train, y_train)
                score = model.score(X_test, y_test)
                scores.append(score)
            
            results[param_name] = {
                'values': param_values,
                'scores': scores,
                'impact': max(scores) - min(scores)
            }
        
        # Visualization
        fig, axes = plt.subplots(2, 3, figsize=(15, 10))
        axes = axes.flatten()
        
        for idx, (param_name, data) in enumerate(results.items()):
            if idx < len(axes):
                axes[idx].plot(data['values'], data['scores'], 'o-', linewidth=2, markersize=8)
                axes[idx].set_xlabel(param_name)
                axes[idx].set_ylabel('Test Accuracy')
                axes[idx].set_title(f'{param_name}\nImpact: {data["impact"]:.3f}')
                axes[idx].grid(True, alpha=0.3)
                
                # Mark best value
                best_idx = np.argmax(data['scores'])
                axes[idx].axvline(x=data['values'][best_idx], color='red', 
                                 linestyle='--', alpha=0.5)
        
        # Remove extra subplot
        if len(results) < len(axes):
            fig.delaxes(axes[-1])
        
        plt.suptitle('Hyperparameter Sensitivity Analysis', fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
        
        return results
    
    def grid_search_optimization(self, X, y, model_type='xgboost'):
        """Comprehensive grid search for optimal parameters"""
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42
        )
        
        if model_type == 'xgboost':
            model = XGBClassifier(random_state=42)
            param_grid = {
                'n_estimators': [100, 200],
                'max_depth': [3, 5, 7],
                'learning_rate': [0.05, 0.1, 0.2],
                'subsample': [0.7, 0.8, 0.9],
                'colsample_bytree': [0.7, 0.8, 0.9]
            }
        else:
            model = GradientBoostingClassifier(random_state=42)
            param_grid = {
                'n_estimators': [100, 200],
                'max_depth': [3, 5, 7],
                'learning_rate': [0.05, 0.1, 0.2],
                'subsample': [0.7, 0.8, 0.9],
                'max_features': ['sqrt', 'log2', None]
            }
        
        # Grid search
        print(f"Performing grid search for {model_type}...")
        grid_search = GridSearchCV(
            model, param_grid,
            cv=5, scoring='accuracy',
            n_jobs=-1, verbose=1
        )
        
        grid_search.fit(X_train, y_train)
        
        # Store results
        self.best_params[model_type] = grid_search.best_params_
        self.search_results[model_type] = pd.DataFrame(grid_search.cv_results_)
        
        # Test performance
        y_pred = grid_search.best_estimator_.predict(X_test)
        test_score = accuracy_score(y_test, y_pred)
        
        print(f"\nBest Parameters: {grid_search.best_params_}")
        print(f"Best CV Score: {grid_search.best_score_:.4f}")
        print(f"Test Score: {test_score:.4f}")
        
        return grid_search.best_estimator_
    
    def early_stopping_analysis(self, X, y):
        """Analyze effect of early stopping"""
        
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=42
        )
        
        # Train with different early stopping rounds
        early_stopping_rounds = [None, 5, 10, 20, 50]
        
        results = []
        
        for early_stop in early_stopping_rounds:
            xgb_model = XGBClassifier(
                n_estimators=500,
                learning_rate=0.1,
                random_state=42
            )
            
            if early_stop:
                eval_set = [(X_test, y_test)]
                xgb_model.fit(
                    X_train, y_train,
                    eval_set=eval_set,
                    early_stopping_rounds=early_stop,
                    verbose=False
                )
                n_trees = xgb_model.best_ntree_limit
            else:
                xgb_model.fit(X_train, y_train)
                n_trees = 500
            
            # Evaluate
            y_pred = xgb_model.predict(X_test)
            score = accuracy_score(y_test, y_pred)
            
            results.append({
                'early_stopping': early_stop if early_stop else 'None',
                'n_trees': n_trees,
                'test_score': score
            })
        
        results_df = pd.DataFrame(results)
        
        # Visualization
        fig, axes = plt.subplots(1, 2, figsize=(12, 5))
        
        # Number of trees
        axes[0].bar(range(len(results_df)), results_df['n_trees'])
        axes[0].set_xticks(range(len(results_df)))
        axes[0].set_xticklabels(results_df['early_stopping'])
        axes[0].set_xlabel('Early Stopping Rounds')
        axes[0].set_ylabel('Number of Trees Used')
        axes[0].set_title('Trees vs Early Stopping')
        axes[0].grid(True, alpha=0.3)
        
        # Test scores
        axes[1].bar(range(len(results_df)), results_df['test_score'])
        axes[1].set_xticks(range(len(results_df)))
        axes[1].set_xticklabels(results_df['early_stopping'])
        axes[1].set_xlabel('Early Stopping Rounds')
        axes[1].set_ylabel('Test Accuracy')
        axes[1].set_title('Performance vs Early Stopping')
        axes[1].grid(True, alpha=0.3)
        
        # Add value labels
        for i, (trees, score) in enumerate(zip(results_df['n_trees'], results_df['test_score'])):
            axes[0].text(i, trees, f'{trees}', ha='center', va='bottom')
            axes[1].text(i, score, f'{score:.3f}', ha='center', va='bottom')
        
        plt.suptitle('Early Stopping Analysis', fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
        
        return results_df

# Hyperparameter tuning
print("\n" + "="*60)
print("HYPERPARAMETER OPTIMIZATION")
print("="*60)

tuner = GBMTuner()

# Parameter importance
print("\nAnalyzing parameter importance...")
param_results = tuner.parameter_importance_analysis(X_class, y_binary, model_type='xgboost')

# Grid search
print("\nPerforming grid search...")
best_model = tuner.grid_search_optimization(X_class[:500], y_binary[:500], model_type='xgboost')

# Early stopping analysis
print("\nAnalyzing early stopping...")
early_stop_results = tuner.early_stopping_analysis(X_class, y_binary)

Advanced Techniques

LightGBM and CatBoost

# Note: Install with: pip install lightgbm catboost

try:
    import lightgbm as lgb
    from lightgbm import LGBMClassifier, LGBMRegressor
    import catboost as cb
    from catboost import CatBoostClassifier, CatBoostRegressor
    
    def compare_advanced_gbm(X, y):
        """Compare XGBoost, LightGBM, and CatBoost"""
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=42
        )
        
        models = {
            'XGBoost': XGBClassifier(
                n_estimators=100, learning_rate=0.1,
                random_state=42, verbosity=0
            ),
            'LightGBM': LGBMClassifier(
                n_estimators=100, learning_rate=0.1,
                random_state=42, verbosity=-1
            ),
            'CatBoost': CatBoostClassifier(
                iterations=100, learning_rate=0.1,
                random_seed=42, verbose=False
            )
        }
        
        results = {}
        
        for name, model in models.items():
            # Train
            start_time = pd.Timestamp.now()
            model.fit(X_train, y_train)
            train_time = (pd.Timestamp.now() - start_time).total_seconds()
            
            # Predict
            start_time = pd.Timestamp.now()
            y_pred = model.predict(X_test)
            pred_time = (pd.Timestamp.now() - start_time).total_seconds()
            
            # Evaluate
            accuracy = accuracy_score(y_test, y_pred)
            
            results[name] = {
                'accuracy': accuracy,
                'train_time': train_time,
                'pred_time': pred_time
            }
        
        # Create comparison DataFrame
        comparison_df = pd.DataFrame(results).T
        
        # Visualization
        fig, axes = plt.subplots(1, 3, figsize=(15, 5))
        
        # Accuracy
        axes[0].bar(comparison_df.index, comparison_df['accuracy'])
        axes[0].set_ylabel('Accuracy')
        axes[0].set_title('Model Accuracy')
        axes[0].grid(True, alpha=0.3)
        
        # Training time
        axes[1].bar(comparison_df.index, comparison_df['train_time'])
        axes[1].set_ylabel('Time (seconds)')
        axes[1].set_title('Training Time')
        axes[1].grid(True, alpha=0.3)
        
        # Prediction time
        axes[2].bar(comparison_df.index, comparison_df['pred_time'])
        axes[2].set_ylabel('Time (seconds)')
        axes[2].set_title('Prediction Time')
        axes[2].grid(True, alpha=0.3)
        
        plt.suptitle('Advanced GBM Comparison', fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
        
        return comparison_df
    
    print("\n" + "="*60)
    print("ADVANCED GRADIENT BOOSTING COMPARISON")
    print("="*60)
    
    comparison = compare_advanced_gbm(X_class, y_binary)
    print("\nModel Comparison:")
    print(comparison.to_string())
    
except ImportError:
    print("\nLightGBM and/or CatBoost not installed.")
    print("Install with: pip install lightgbm catboost")

Best Practices and Guidelines

print("\n" + "="*60)
print("GRADIENT BOOSTING BEST PRACTICES")
print("="*60)

best_practices = """
KEY GUIDELINES:

1. PARAMETER TUNING PRIORITY:
   First: n_estimators (with early stopping)
   Second: max_depth (3-10 typically)
   Third: learning_rate (0.01-0.3)
   Fourth: subsample, colsample_bytree
   Fifth: regularization (reg_alpha, reg_lambda)

2. PREVENTING OVERFITTING:
   • Use early stopping with validation set
   • Limit tree depth (max_depth=3-7)
   • Use subsample < 1.0
   • Add regularization (L1/L2)
   • Use colsample_bytree < 1.0

3. HANDLING IMBALANCED DATA:
   • Use scale_pos_weight for binary classification
   • Adjust sample_weight for multi-class
   • Consider focal loss or custom objectives
   • Use stratified splits for validation

4. FEATURE ENGINEERING:
   • GBM can handle raw features well
   • Interaction features less important
   • Categorical encoding: Label or Target encoding
   • Missing values: XGBoost/LightGBM handle natively

5. COMPUTATIONAL EFFICIENCY:
   • Use histogram-based implementations (LightGBM)
   • Enable GPU acceleration if available
   • Use sparse matrices for high-dimensional data
   • Consider feature selection for very wide datasets

6. WHEN TO USE GRADIENT BOOSTING:
   ✓ Tabular/structured data
   ✓ Need high accuracy
   ✓ Feature interactions important
   ✓ Mixed data types
   ✓ Competition settings
   ✗ Real-time predictions needed
   ✗ Limited computational resources
   ✗ Need model interpretability

7. MODEL SELECTION:
   • XGBoost: General purpose, mature ecosystem
   • LightGBM: Faster training, less memory
   • CatBoost: Better with categorical features
   • Sklearn GBM: Simple, fewer dependencies
"""

print(best_practices)

# Parameter guidelines
param_guidelines = {
    'Parameter': ['n_estimators', 'max_depth', 'learning_rate', 'subsample', 
                 'colsample_bytree', 'reg_alpha', 'reg_lambda'],
    'Default': [100, 6, 0.3, 1.0, 1.0, 0, 0],
    'Typical Range': ['100-1000', '3-10', '0.01-0.3', '0.5-1.0', 
                     '0.5-1.0', '0-10', '0-10'],
    'Effect': ['More trees = better fit', 'Deeper = more complex',
              'Smaller = more robust', 'Prevent overfitting',
              'Feature sampling', 'L1 regularization', 'L2 regularization']
}

param_df = pd.DataFrame(param_guidelines)
print("\n" + "="*60)
print("PARAMETER GUIDELINES")
print("="*60)
print(param_df.to_string(index=False))

# Algorithm comparison
comparison_data = {
    'Algorithm': ['Random Forest', 'Gradient Boosting', 'XGBoost', 'LightGBM', 'CatBoost'],
    'Speed': ['Medium', 'Slow', 'Medium', 'Fast', 'Medium'],
    'Memory': ['High', 'Low', 'Medium', 'Low', 'Medium'],
    'Accuracy': ['High', 'Very High', 'Very High', 'Very High', 'Very High'],
    'Interpretability': ['Medium', 'Low', 'Low', 'Low', 'Low'],
    'Handles Missing': ['No', 'No', 'Yes', 'Yes', 'Yes'],
    'Categorical': ['Encoding needed', 'Encoding needed', 'Encoding needed', 
                   'Can handle', 'Native support']
}

comparison_df = pd.DataFrame(comparison_data)
print("\n" + "="*60)
print("GRADIENT BOOSTING VARIANTS COMPARISON")
print("="*60)
print(comparison_df.to_string(index=False))

Practice Exercises

Exercise 1: Custom Loss Functions

Implement gradient boosting with custom loss functions:

Create Huber loss for robust regression
Implement focal loss for imbalanced classification
Design quantile loss for interval predictions
Compare with standard loss functions

Exercise 2: Ensemble Stacking

Build stacked ensemble with gradient boosting:

Train multiple GBM models with different parameters
Use predictions as features for meta-learner
Implement cross-validated stacking
Compare with single models

Exercise 3: Time Series with GBM

Apply gradient boosting to time series:

Create lag features and rolling statistics
Implement walk-forward validation
Handle seasonality and trends
Build multi-step ahead forecasts

Key Takeaways

🎯 Gradient boosting builds models sequentially to correct errors
📈 Each tree learns from the mistakes of previous trees
⚡ XGBoost adds regularization and computational optimizations
🔄 Early stopping prevents overfitting automatically
🎚️ Learning rate controls contribution of each tree
🌲 Shallow trees (depth 3-7) often work best
💾 More memory efficient than Random Forests
🏆 Dominates machine learning competitions
📊 Excellent for tabular/structured data
⚠️ Prone to overfitting without proper tuning