🎯 Naive Bayes: Probabilistic Classification

Introduction

Naive Bayes is a family of simple yet powerful probabilistic classifiers based on Bayes' theorem with the "naive" assumption of conditional independence between features. Despite this simplifying assumption, Naive Bayes classifiers work remarkably well in practice, especially for text classification, spam filtering, and recommendation systems. They're fast, scalable, and perform well with small training datasets.

Core Concepts and Theory

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris, fetch_20newsgroups, make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("="*60)
print("NAIVE BAYES FUNDAMENTALS")
print("="*60)

# Core concepts
naive_bayes_concepts = """
NAIVE BAYES KEY CONCEPTS:

1. BAYES' THEOREM:
   P(class|features) = P(features|class) * P(class) / P(features)
   
   Where:
   • P(class|features): Posterior probability
   • P(features|class): Likelihood
   • P(class): Prior probability
   • P(features): Evidence

2. NAIVE ASSUMPTION:
   • Features are conditionally independent given the class
   • P(x1,x2,...,xn|class) = P(x1|class) * P(x2|class) * ... * P(xn|class)
   • Simplifies computation dramatically
   • Often works well despite assumption violation

3. TYPES OF NAIVE BAYES:
   
   A) GAUSSIAN NB:
      • For continuous features
      • Assumes normal distribution
      • Uses mean and variance
   
   B) MULTINOMIAL NB:
      • For discrete counts
      • Text classification
      • Document term frequencies
   
   C) BERNOULLI NB:
      • For binary features
      • Document classification
      • Presence/absence of features
   
   D) COMPLEMENT NB:
      • For imbalanced datasets
      • Better for skewed classes

4. ADVANTAGES:
   • Fast training and prediction
   • Works well with small datasets
   • Handles high dimensions well
   • Provides probability estimates
   • Few hyperparameters to tune (mainly the smoothing strength)
   • Naturally multi-class

5. DISADVANTAGES:
   • Assumes feature independence
   • Sensitive to feature representation (e.g., counts vs. TF-IDF vs. binary)
   • Zero frequency problem
   • May be outperformed by complex models
"""

print(naive_bayes_concepts)
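
To make items 1 and 2 above concrete, here is a minimal worked example of the posterior computation under the naive assumption. The priors and word likelihoods are hypothetical numbers chosen only for illustration, and `posterior` is a helper name introduced here, not part of any library.

```python
import numpy as np

# Hypothetical toy numbers, chosen only for illustration
priors = {"spam": 0.4, "ham": 0.6}
# P(word present | class), treated as independent given the class
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

def posterior(words):
    """Return P(class | words) under the naive independence assumption."""
    # Unnormalized score: prior times the product of per-word likelihoods
    scores = {
        c: priors[c] * np.prod([likelihoods[c][w] for w in words])
        for c in priors
    }
    evidence = sum(scores.values())  # P(features), the normalizing constant
    return {c: float(s / evidence) for c, s in scores.items()}

one_word = posterior(["free"])
two_words = posterior(["free", "meeting"])
print(one_word)
print(two_words)
```

With these numbers, "free" alone favors spam, while adding "meeting" tips the posterior toward ham, because each word contributes its own multiplicative factor.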

Gaussian Naive Bayes

class GaussianNBAnalyzer:
    """Comprehensive Gaussian Naive Bayes analysis"""
    
    def __init__(self):
        self.models = {}
        self.results = {}
        
    def visualize_gaussian_assumption(self, X, y):
        """Visualize the Gaussian assumption for features"""
        
        # Note: the X and y arguments are not used; Iris is reloaded here
        # so that feature names are available for the axis labels
        iris = load_iris()
        X_iris = iris.data[:, :2]  # Use first 2 features
        y_iris = iris.target
        
        # Fit Gaussian NB
        gnb = GaussianNB()
        gnb.fit(X_iris, y_iris)
        
        # Create mesh for decision boundary
        h = 0.02
        x_min, x_max = X_iris[:, 0].min() - 1, X_iris[:, 0].max() + 1
        y_min, y_max = X_iris[:, 1].min() - 1, X_iris[:, 1].max() + 1
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                            np.arange(y_min, y_max, h))
        
        Z = gnb.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
        Z = Z.reshape(xx.shape)
        
        # Visualization
        fig, axes = plt.subplots(2, 3, figsize=(15, 10))
        
        # Decision boundary
        axes[0, 0].contourf(xx, yy, gnb.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape),
                           alpha=0.3, cmap='viridis')
        scatter = axes[0, 0].scatter(X_iris[:, 0], X_iris[:, 1], c=y_iris,
                                    cmap='viridis', edgecolor='black', linewidth=0.5)
        axes[0, 0].set_xlabel(iris.feature_names[0])
        axes[0, 0].set_ylabel(iris.feature_names[1])
        axes[0, 0].set_title('Decision Boundaries')
        plt.colorbar(scatter, ax=axes[0, 0])
        
        # Feature distributions per class
        for class_idx in range(3):
            mask = y_iris == class_idx
            
            # Feature 1 distribution
            axes[0, 1].hist(X_iris[mask, 0], alpha=0.5, bins=15,
                          label=f'Class {class_idx}', density=True)
            
            # Fit and plot Gaussian
            mean = gnb.theta_[class_idx, 0]
            var = gnb.var_[class_idx, 0]
            x_range = np.linspace(X_iris[:, 0].min(), X_iris[:, 0].max(), 100)
            gaussian = (1/np.sqrt(2*np.pi*var)) * np.exp(-0.5*((x_range-mean)**2/var))
            axes[0, 1].plot(x_range, gaussian, linewidth=2)
        
        axes[0, 1].set_xlabel(iris.feature_names[0])
        axes[0, 1].set_ylabel('Density')
        axes[0, 1].set_title('Feature 1: Gaussian Fits')
        axes[0, 1].legend()
        
        # Feature 2 distribution
        for class_idx in range(3):
            mask = y_iris == class_idx
            axes[0, 2].hist(X_iris[mask, 1], alpha=0.5, bins=15,
                          label=f'Class {class_idx}', density=True)
            
            # Fit and plot Gaussian
            mean = gnb.theta_[class_idx, 1]
            var = gnb.var_[class_idx, 1]
            y_range = np.linspace(X_iris[:, 1].min(), X_iris[:, 1].max(), 100)
            gaussian = (1/np.sqrt(2*np.pi*var)) * np.exp(-0.5*((y_range-mean)**2/var))
            axes[0, 2].plot(y_range, gaussian, linewidth=2)
        
        axes[0, 2].set_xlabel(iris.feature_names[1])
        axes[0, 2].set_ylabel('Density')
        axes[0, 2].set_title('Feature 2: Gaussian Fits')
        axes[0, 2].legend()
        
        # Probability contours
        axes[1, 0].contourf(xx, yy, Z, levels=20, alpha=0.7, cmap='RdYlBu_r')
        axes[1, 0].scatter(X_iris[:, 0], X_iris[:, 1], c=y_iris,
                         cmap='viridis', edgecolor='black', linewidth=0.5, s=30)
        axes[1, 0].set_xlabel(iris.feature_names[0])
        axes[1, 0].set_ylabel(iris.feature_names[1])
        axes[1, 0].set_title('Probability Contours (Class 1)')
        
        # Learned parameters
        params_text = "Learned Parameters:\n\n"
        for class_idx in range(3):
            params_text += f"Class {class_idx}:\n"
            params_text += f"  Prior: {gnb.class_prior_[class_idx]:.3f}\n"
            params_text += f"  Mean: {gnb.theta_[class_idx]}\n"
            params_text += f"  Var: {gnb.var_[class_idx]}\n\n"
        
        axes[1, 1].text(0.1, 0.5, params_text, fontsize=10,
                       verticalalignment='center', family='monospace')
        axes[1, 1].set_title('Learned Parameters')
        axes[1, 1].axis('off')
        
        # Confusion matrix
        y_pred = gnb.predict(X_iris)
        cm = confusion_matrix(y_iris, y_pred)
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[1, 2])
        axes[1, 2].set_xlabel('Predicted')
        axes[1, 2].set_ylabel('Actual')
        axes[1, 2].set_title(f'Confusion Matrix (Acc: {accuracy_score(y_iris, y_pred):.3f})')
        
        plt.suptitle('Gaussian Naive Bayes Analysis', fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
        
        return gnb
    
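    def compare_with_different_variances(self, n_samples=1000):
        """Compare performance at different levels of class separation"""
        
        # Note: these values are passed to make_classification as class_sep,
        # which controls the distance between class clusters, not the feature variance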
        variances = [0.5, 1.0, 2.0, 5.0]
        results = []
        
        fig, axes = plt.subplots(2, len(variances), figsize=(16, 8))
        
        for idx, var in enumerate(variances):
            # Generate data with the given class separation (class_sep)
            X, y = make_classification(n_samples=n_samples, n_features=2,
                                      n_informative=2, n_redundant=0,
                                      n_clusters_per_class=1, class_sep=var,
                                      random_state=42)
            
            # Split data
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.3, random_state=42
            )
            
            # Train Gaussian NB
            gnb = GaussianNB()
            gnb.fit(X_train, y_train)
            
            # Evaluate
            train_score = gnb.score(X_train, y_train)
            test_score = gnb.score(X_test, y_test)
            results.append({'class_sep': var, 'train': train_score, 'test': test_score})
            
            # Plot data distribution
            axes[0, idx].scatter(X[:, 0], X[:, 1], c=y, alpha=0.5, cmap='viridis')
            axes[0, idx].set_title(f'Class separation: {var}')
            axes[0, idx].set_xlabel('Feature 1')
            axes[0, idx].set_ylabel('Feature 2')
            
            # Plot decision boundary
            h = 0.5
            x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
            y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
            xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                                np.arange(y_min, y_max, h))
            
            Z = gnb.predict(np.c_[xx.ravel(), yy.ravel()])
            Z = Z.reshape(xx.shape)
            
            axes[1, idx].contourf(xx, yy, Z, alpha=0.3, cmap='viridis')
            axes[1, idx].scatter(X_test[:, 0], X_test[:, 1], c=y_test,
                               cmap='viridis', edgecolor='black', linewidth=0.5, s=30)
            axes[1, idx].set_title(f'Test Acc: {test_score:.3f}')
            axes[1, idx].set_xlabel('Feature 1')
            axes[1, idx].set_ylabel('Feature 2')
        
        plt.suptitle('Effect of Class Separation on Gaussian NB', fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
        
        return pd.DataFrame(results)

# Gaussian NB Analysis
gaussian_analyzer = GaussianNBAnalyzer()

print("\n" + "="*60)
print("GAUSSIAN NAIVE BAYES")
print("="*60)

print("\n1. Visualizing Gaussian Assumptions:")
iris = load_iris()
gnb_model = gaussian_analyzer.visualize_gaussian_assumption(iris.data, iris.target)

print("\n2. Effect of Class Separation:")
variance_results = gaussian_analyzer.compare_with_different_variances()
print("\nResults by class separation:")
print(variance_results)

Text Classification with Multinomial and Bernoulli NB

class TextClassificationNB:
    """Naive Bayes for text classification"""
    
    def __init__(self):
        self.models = {}
        self.vectorizers = {}
        
    def compare_nb_variants_text(self):
        """Compare different NB variants for text classification"""
        
        # Create sample text dataset
        documents = [
            # Sports
            "The team won the championship game last night",
            "Players trained hard for the upcoming match",
            "The basketball season starts next month",
            "Football fans celebrated the victory",
            "Athletes prepare for Olympic games",
            
            # Technology
            "New smartphone features artificial intelligence",
            "Software developers release updated version",
            "Cloud computing transforms business operations",
            "Machine learning algorithms improve accuracy",
            "Cybersecurity threats increase globally",
            
            # Food
            "Italian restaurant serves authentic pasta",
            "Fresh ingredients make better recipes",
            "Cooking classes teach culinary skills",
            "Local farmers market sells organic produce",
            "Chef prepares gourmet meal for guests"
        ]
        
        labels = [0, 0, 0, 0, 0,  # Sports
                 1, 1, 1, 1, 1,  # Technology
                 2, 2, 2, 2, 2]  # Food
        
        label_names = ['Sports', 'Technology', 'Food']
        
        # Vectorize text
        count_vectorizer = CountVectorizer()
        X_counts = count_vectorizer.fit_transform(documents)
        
        tfidf_vectorizer = TfidfVectorizer()
        X_tfidf = tfidf_vectorizer.fit_transform(documents)
        
        # Binary vectorizer
        binary_vectorizer = CountVectorizer(binary=True)
        X_binary = binary_vectorizer.fit_transform(documents)
        
        # Compare different NB variants
        models = {
            'Multinomial (Counts)': (MultinomialNB(), X_counts),
            'Multinomial (TF-IDF)': (MultinomialNB(), X_tfidf),
            'Bernoulli': (BernoulliNB(), X_binary),
            'Complement': (ComplementNB(), X_counts)
        }
        
        # Cross-validation scores
        cv_scores = {}
        for name, (model, X) in models.items():
            scores = cross_val_score(model, X, labels, cv=3)
            cv_scores[name] = scores
            model.fit(X, labels)  # Fit for later use
            self.models[name] = model
        
        # Visualization
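        # 2x3 grid: the per-class panels below use axes[0, 1], axes[0, 2], axes[1, 1]
        fig, axes = plt.subplots(2, 3, figsize=(18, 10))
        axes[1, 2].axis('off')  # unused panel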
        
        # CV scores comparison
        axes[0, 0].boxplot(list(cv_scores.values()))
        axes[0, 0].set_ylabel('Accuracy')
        axes[0, 0].set_title('Cross-Validation Scores')
        axes[0, 0].set_xticklabels(list(cv_scores.keys()), rotation=45, ha='right')
        axes[0, 0].grid(True, alpha=0.3, axis='y')
        
        # Feature importance (top words per class)
        mnb = self.models['Multinomial (Counts)']
        feature_names = count_vectorizer.get_feature_names_out()
        
        # Get log probabilities for each class
        for class_idx, class_name in enumerate(label_names):
            # Get top features for this class
            log_prob = mnb.feature_log_prob_[class_idx]
            top_indices = np.argsort(log_prob)[-10:][::-1]
            top_words = [feature_names[i] for i in top_indices]
            top_probs = np.exp(log_prob[top_indices])
            
            # Plot
            ax_idx = class_idx + 1 if class_idx < 2 else 3
            row = 0 if class_idx < 2 else 1
            col = ax_idx if class_idx < 2 else 1
            
            axes[row, col].barh(range(10), top_probs, color=f'C{class_idx}')
            axes[row, col].set_yticks(range(10))
            axes[row, col].set_yticklabels(top_words)
            axes[row, col].set_xlabel('Probability')
            axes[row, col].set_title(f'Top Words: {class_name}')
            axes[row, col].grid(True, alpha=0.3, axis='x')
        
        # Model comparison summary
        summary_text = "Model Comparison:\n\n"
        for name, scores in cv_scores.items():
            summary_text += f"{name}:\n"
            summary_text += f"  Mean: {scores.mean():.3f}\n"
            summary_text += f"  Std:  {scores.std():.3f}\n\n"
        
        axes[1, 0].text(0.1, 0.5, summary_text, fontsize=10,
                       verticalalignment='center', family='monospace')
        axes[1, 0].set_title('Summary Statistics')
        axes[1, 0].axis('off')
        
        plt.suptitle('Naive Bayes Text Classification Comparison', 
                    fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
        
        return cv_scores
    
    def spam_detection_example(self):
        """Implement spam detection with Naive Bayes"""
        
        # Create spam/ham dataset
        messages = [
            # Spam
            "WINNER! You've won $1000 cash prize! Click here now!",
            "Free credit report! Limited time offer! Act now!",
            "Congratulations! You've been selected for a free vacation!",
            "URGENT: Your account will be closed. Verify immediately!",
            "Make money fast! Work from home! Guaranteed income!",
            "Hot singles in your area! Click to meet them now!",
            "Lose weight fast with this one simple trick!",
            "Your prescription is ready. Order pills online cheap!",
            
            # Ham (legitimate)
            "Meeting scheduled for tomorrow at 2pm",
            "Can you review the attached document?",
            "Thanks for your help with the project",
            "Dinner plans confirmed for Saturday",
            "Your package has been delivered",
            "Reminder: Doctor appointment next Tuesday",
            "Happy birthday! Hope you have a great day",
            "Please submit your report by Friday"
        ]
        
        # NumPy array so that boolean masks such as labels == 0 work below
        labels = np.array([1]*8 + [0]*8)  # 1=spam, 0=ham
        
        # Additional features
        features_df = pd.DataFrame({
            'message': messages,
            'length': [len(m) for m in messages],
            'exclamation': [m.count('!') for m in messages],
            'capitals': [sum(1 for c in m if c.isupper())/len(m) for m in messages],
            'dollar': [m.count('$') for m in messages]
        })
        
        # Vectorize text
        vectorizer = TfidfVectorizer(max_features=50)
        X_text = vectorizer.fit_transform(messages)
        
        # Combine text and numerical features
        X_numerical = features_df[['length', 'exclamation', 'capitals', 'dollar']].values
        X_combined = np.hstack([X_text.toarray(), X_numerical])
        
        # Train models
        nb_text = MultinomialNB()
        nb_text.fit(X_text, labels)
        
        gnb_combined = GaussianNB()
        gnb_combined.fit(X_combined, labels)
        
        # Predictions and probabilities
        prob_text = nb_text.predict_proba(X_text)
        prob_combined = gnb_combined.predict_proba(X_combined)
        
        # Visualization
        fig, axes = plt.subplots(2, 2, figsize=(14, 10))
        
        # Probability distribution
        axes[0, 0].hist(prob_text[labels==0, 1], alpha=0.5, bins=10,
                       label='Ham', color='green')
        axes[0, 0].hist(prob_text[labels==1, 1], alpha=0.5, bins=10,
                       label='Spam', color='red')
        axes[0, 0].set_xlabel('Spam Probability (Text Only)')
        axes[0, 0].set_ylabel('Count')
        axes[0, 0].set_title('Probability Distribution')
        axes[0, 0].legend()
        axes[0, 0].grid(True, alpha=0.3)
        
        # Feature importance
        feature_importance = nb_text.feature_log_prob_[1] - nb_text.feature_log_prob_[0]
        top_spam_indices = np.argsort(feature_importance)[-5:]
        top_ham_indices = np.argsort(feature_importance)[:5]
        
        feature_names = vectorizer.get_feature_names_out()
        
        spam_words = [feature_names[i] for i in top_spam_indices]
        ham_words = [feature_names[i] for i in top_ham_indices]
        
        y_pos = np.arange(5)
        axes[0, 1].barh(y_pos, feature_importance[top_spam_indices],
                       color='red', alpha=0.7)
        axes[0, 1].barh(y_pos + 5, feature_importance[top_ham_indices],
                       color='green', alpha=0.7)
        axes[0, 1].set_yticks(np.arange(10))
        axes[0, 1].set_yticklabels(spam_words + ham_words)
        axes[0, 1].set_xlabel('Log Probability Difference')
        axes[0, 1].set_title('Spam vs Ham Indicators')
        axes[0, 1].grid(True, alpha=0.3, axis='x')
        
        # Numerical features analysis
        for idx, (feature, color) in enumerate(zip(['exclamation', 'capitals'], 
                                                   ['orange', 'purple'])):
            axes[1, 0].scatter(features_df[feature][labels==0],
                             prob_text[labels==0, 1],
                             alpha=0.5, label=f'Ham ({feature})',
                             color=color, marker='o')
            axes[1, 0].scatter(features_df[feature][labels==1],
                             prob_text[labels==1, 1],
                             alpha=0.5, label=f'Spam ({feature})',
                             color=color, marker='^')
        
        axes[1, 0].set_xlabel('Feature Value')
        axes[1, 0].set_ylabel('Spam Probability')
        axes[1, 0].set_title('Feature Correlation with Spam')
        axes[1, 0].legend()
        axes[1, 0].grid(True, alpha=0.3)
        
        # Model comparison
        pred_text = nb_text.predict(X_text)
        pred_combined = gnb_combined.predict(X_combined)
        
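        # Note: both models are evaluated on the data they were trained on
        y_true = np.asarray(labels)
        tp_text = int(((pred_text == 1) & (y_true == 1)).sum())
        tp_combined = int(((pred_combined == 1) & (y_true == 1)).sum())
        
        results_text = "Model Performance (training set):\n\n"
        results_text += "Text Only Model:\n"
        results_text += f"  Accuracy: {accuracy_score(y_true, pred_text):.3f}\n"
        results_text += f"  Spam detected: {tp_text}/{int(y_true.sum())}\n\n"
        results_text += "Combined Features Model:\n"
        results_text += f"  Accuracy: {accuracy_score(y_true, pred_combined):.3f}\n"
        results_text += f"  Spam detected: {tp_combined}/{int(y_true.sum())}\n"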
        
        axes[1, 1].text(0.1, 0.5, results_text, fontsize=11,
                       verticalalignment='center', family='monospace')
        axes[1, 1].set_title('Performance Summary')
        axes[1, 1].axis('off')
        
        plt.suptitle('Spam Detection with Naive Bayes', fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
        
        return nb_text, gnb_combined

# Text classification
text_classifier = TextClassificationNB()

print("\n" + "="*60)
print("TEXT CLASSIFICATION WITH NAIVE BAYES")
print("="*60)

print("\n1. Comparing NB Variants for Text:")
text_scores = text_classifier.compare_nb_variants_text()

print("\n2. Spam Detection Example:")
spam_model, combined_model = text_classifier.spam_detection_example()
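
In `compare_nb_variants_text`, the vectorizers are fitted on all documents before cross-validation, so each fold's vocabulary has already seen the held-out text. One common way to avoid this is to wrap the vectorizer and classifier in a scikit-learn pipeline, so the vocabulary is re-learned inside each training fold. The sketch below uses a tiny hypothetical corpus purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus, for illustration only
docs = [
    "team won the game",
    "players trained for the match",
    "fans celebrated the victory",
    "new smartphone software",
    "developers release software update",
    "cloud computing software",
]
doc_labels = [0, 0, 0, 1, 1, 1]  # 0 = sports, 1 = technology

# The vectorizer is refit on the training part of every fold
pipeline = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
scores = cross_val_score(pipeline, docs, doc_labels, cv=3)
print(scores)

# Fit on everything and classify raw text directly
pipeline.fit(docs, doc_labels)
prediction = pipeline.predict(["the team celebrated"])
print(prediction)
```

A fitted pipeline also accepts raw strings at prediction time, which keeps the vectorizer and the model from drifting apart when the code is deployed.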

Advanced Applications and Techniques

class AdvancedNaiveBayes:
    """Advanced Naive Bayes techniques"""
    
    def __init__(self):
        self.models = {}
        
    def handle_zero_probability(self):
        """Demonstrate Laplace smoothing for zero probability problem"""
        
        # Create dataset with rare features
        X_train = np.array([
            [1, 1, 0],
            [1, 1, 0],
            [0, 1, 1],
            [0, 0, 1]
        ])
        y_train = np.array([0, 0, 1, 1])
        
        # Test sample containing only feature 0, which never occurs in class 1
        # during training, so the unsmoothed estimate of P(feature 0 | class 1) is 0
        X_test = np.array([[1, 0, 0]])
        
        # Compare with and without smoothing
        # alpha=0 exactly yields log(0) = -inf (and NaN posteriors) in recent
        # scikit-learn releases, so a tiny alpha stands in for "no smoothing"
        mnb_no_smooth = MultinomialNB(alpha=1e-10)  # Effectively no smoothing
        mnb_smooth = MultinomialNB(alpha=1.0)       # Laplace smoothing
        
        mnb_no_smooth.fit(X_train, y_train)
        mnb_smooth.fit(X_train, y_train)
        
        # Visualization
        fig, axes = plt.subplots(1, 3, figsize=(15, 5))
        
        # Feature probabilities without smoothing
        feature_prob_no_smooth = np.exp(mnb_no_smooth.feature_log_prob_)
        im1 = axes[0].imshow(feature_prob_no_smooth, cmap='YlOrRd', vmin=0, vmax=1)
        axes[0].set_xlabel('Feature Index')
        axes[0].set_ylabel('Class')
        axes[0].set_title('Without Smoothing (α≈0)')
        plt.colorbar(im1, ax=axes[0])
        
        # Add text annotations
        for i in range(2):
            for j in range(3):
                axes[0].text(j, i, f'{feature_prob_no_smooth[i, j]:.2f}',
                           ha='center', va='center')
        
        # Feature probabilities with smoothing
        feature_prob_smooth = np.exp(mnb_smooth.feature_log_prob_)
        im2 = axes[1].imshow(feature_prob_smooth, cmap='YlOrRd', vmin=0, vmax=1)
        axes[1].set_xlabel('Feature Index')
        axes[1].set_ylabel('Class')
        axes[1].set_title('With Laplace Smoothing (α=1)')
        plt.colorbar(im2, ax=axes[1])
        
        for i in range(2):
            for j in range(3):
                axes[1].text(j, i, f'{feature_prob_smooth[i, j]:.2f}',
                           ha='center', va='center')
        
        # Effect of different alpha values
        alphas = np.logspace(-3, 1, 20)
        test_probs = []
        
        for alpha in alphas:
            mnb_temp = MultinomialNB(alpha=alpha)
            mnb_temp.fit(X_train, y_train)
            prob = mnb_temp.predict_proba(X_test)[0, 0]
            test_probs.append(prob)
        
        axes[2].plot(alphas, test_probs, marker='o', linewidth=2)
        axes[2].set_xscale('log')
        axes[2].set_xlabel('Alpha (Smoothing Parameter)')
        axes[2].set_ylabel('P(Class 0 | Test Sample)')
        axes[2].set_title('Effect of Smoothing on Prediction')
        axes[2].grid(True, alpha=0.3)
        axes[2].axhline(y=0.5, color='r', linestyle='--', alpha=0.5)
        
        plt.suptitle('Handling Zero Probability with Laplace Smoothing', 
                    fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
        
        print("\nZero Probability Problem:")
        print(f"Without smoothing - P(Class 0): {mnb_no_smooth.predict_proba(X_test)[0, 0]:.4f}")
        print(f"With smoothing - P(Class 0): {mnb_smooth.predict_proba(X_test)[0, 0]:.4f}")
    
    def semi_supervised_nb(self, n_labeled=50):
        """Semi-supervised learning with Naive Bayes"""
        
        # Generate dataset
        X, y = make_classification(n_samples=500, n_features=20,
                                  n_informative=15, n_redundant=5,
                                  n_classes=3, random_state=42)
        
        # Create semi-supervised scenario
        # Only label first n_labeled samples
        y_semi = y.copy()
        y_semi[n_labeled:] = -1  # Unlabeled
        
        # Self-training approach
        from sklearn.semi_supervised import SelfTrainingClassifier
        
        base_nb = GaussianNB()
        self_training_nb = SelfTrainingClassifier(base_nb, threshold=0.75)
        
        # Train on labeled data only
        X_labeled = X[:n_labeled]
        y_labeled = y[:n_labeled]
        
        nb_supervised = GaussianNB()
        nb_supervised.fit(X_labeled, y_labeled)
        
        # Self-training (uses unlabeled data); rows 400 onward are held out
        # for testing, so they are excluded from fitting
        self_training_nb.fit(X[:400], y_semi[:400])
        
        # Evaluate
        X_test = X[400:]
        y_test = y[400:]
        
        acc_supervised = nb_supervised.score(X_test, y_test)
        acc_semi = self_training_nb.score(X_test, y_test)
        
        # Visualization
        fig, axes = plt.subplots(1, 3, figsize=(15, 5))
        
        # Learning curves
        n_labeled_range = [10, 20, 30, 40, 50, 75, 100]
        acc_sup_list = []
        acc_semi_list = []
        
        for n in n_labeled_range:
            y_temp = y.copy()
            y_temp[n:] = -1
            
            # Supervised
            nb_temp = GaussianNB()
            nb_temp.fit(X[:n], y[:n])
            acc_sup_list.append(nb_temp.score(X_test, y_test))
            
            # Semi-supervised
            st_temp = SelfTrainingClassifier(GaussianNB(), threshold=0.75)
            st_temp.fit(X[:400], y_temp[:400])  # exclude held-out test rows
            acc_semi_list.append(st_temp.score(X_test, y_test))
        
        axes[0].plot(n_labeled_range, acc_sup_list, 'o-', label='Supervised')
        axes[0].plot(n_labeled_range, acc_semi_list, 's-', label='Semi-supervised')
        axes[0].set_xlabel('Number of Labeled Samples')
        axes[0].set_ylabel('Test Accuracy')
        axes[0].set_title('Learning Curves')
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)
        
        # Confidence distribution
        proba_supervised = nb_supervised.predict_proba(X[n_labeled:400])
        proba_semi = self_training_nb.predict_proba(X[n_labeled:400])
        
        max_prob_sup = np.max(proba_supervised, axis=1)
        max_prob_semi = np.max(proba_semi, axis=1)
        
        axes[1].hist(max_prob_sup, alpha=0.5, bins=20, label='Supervised')
        axes[1].hist(max_prob_semi, alpha=0.5, bins=20, label='Semi-supervised')
        axes[1].set_xlabel('Maximum Class Probability')
        axes[1].set_ylabel('Count')
        axes[1].set_title('Prediction Confidence')
        axes[1].legend()
        axes[1].grid(True, alpha=0.3)
        
        # Performance comparison
        methods = ['Supervised\nOnly', 'Semi-supervised\n(Self-training)']
        accuracies = [acc_supervised, acc_semi]
        
        bars = axes[2].bar(methods, accuracies, color=['coral', 'lightgreen'])
        axes[2].set_ylabel('Test Accuracy')
        axes[2].set_title(f'Performance with {n_labeled} Labeled Samples')
        axes[2].set_ylim(0, 1)
        axes[2].grid(True, alpha=0.3, axis='y')
        
        # Add value labels
        for bar, acc in zip(bars, accuracies):
            axes[2].text(bar.get_x() + bar.get_width()/2, bar.get_height(),
                       f'{acc:.3f}', ha='center', va='bottom')
        
        plt.suptitle('Semi-supervised Learning with Naive Bayes', 
                    fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
        
        print(f"\nSemi-supervised Learning Results:")
        print(f"  Labeled samples: {n_labeled}")
        print(f"  Supervised accuracy: {acc_supervised:.3f}")
        print(f"  Semi-supervised accuracy: {acc_semi:.3f}")
        print(f"  Improvement: {(acc_semi - acc_supervised)*100:.1f} percentage points")

# Advanced techniques
advanced_nb = AdvancedNaiveBayes()

print("\n" + "="*60)
print("ADVANCED NAIVE BAYES TECHNIQUES")
print("="*60)

print("\n1. Handling Zero Probability:")
advanced_nb.handle_zero_probability()

print("\n2. Semi-supervised Learning:")
advanced_nb.semi_supervised_nb()
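
The smoothed estimate behind `handle_zero_probability` can be written as (N_ci + alpha) / (N_c + alpha * n_features), where N_ci is the total count of feature i in class c and N_c is the total count of all features in class c. The sketch below computes it by hand on the same toy matrix and compares the result with `MultinomialNB`. `smoothed_feature_probs` is a helper name introduced here for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Same toy count matrix as in handle_zero_probability
X_train = np.array([[1, 1, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]])
y_train = np.array([0, 0, 1, 1])
alpha = 1.0

def smoothed_feature_probs(X, y, alpha):
    """Additive (Laplace) smoothing of per-class feature probabilities."""
    n_features = X.shape[1]
    probs = []
    for c in np.unique(y):
        counts = X[y == c].sum(axis=0)  # N_ci for each feature i
        probs.append((counts + alpha) / (counts.sum() + alpha * n_features))
    return np.array(probs)

manual = smoothed_feature_probs(X_train, y_train, alpha)
fitted = MultinomialNB(alpha=alpha).fit(X_train, y_train)
sk_probs = np.exp(fitted.feature_log_prob_)

print(manual)
print(np.allclose(manual, sk_probs))
```

Feature 0 never appears in class 1, yet its smoothed probability is 1/6 rather than 0, so a single unseen feature can no longer zero out the whole posterior.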

Best Practices and Guidelines

print("\n" + "="*60)
print("NAIVE BAYES BEST PRACTICES")
print("="*60)

best_practices = """
KEY GUIDELINES:

1. CHOOSING THE RIGHT VARIANT:
   • Gaussian NB: Continuous features, normal distribution
   • Multinomial NB: Count data, text classification
   • Bernoulli NB: Binary features, document classification
   • Complement NB: Imbalanced datasets

2. DATA PREPROCESSING:
   • Feature scaling rarely changes Gaussian NB predictions; transforms toward normality help more
   • Use appropriate vectorization for text (Count, TF-IDF)
   • Handle missing values before training
   • Consider log-transform for skewed features

3. HANDLING COMMON ISSUES:
   • Zero probability: Use Laplace smoothing (alpha > 0)
   • Correlated features: Consider feature selection
   • Imbalanced classes: Adjust priors or use Complement NB
   • Continuous features: Check normality assumption

4. ADVANTAGES TO LEVERAGE:
   ✓ Fast training and prediction
   ✓ Works with small training sets
   ✓ Provides probability estimates
   ✓ Naturally handles multi-class
   ✓ Good baseline model

5. LIMITATIONS TO CONSIDER:
   ✗ Assumes feature independence
   ✗ Sensitive to feature representation
   ✗ May underperform complex models
   ✗ Probability estimates can be poor

6. WHEN TO USE NAIVE BAYES:
   • Text classification (spam, sentiment)
   • Real-time prediction needed
   • Small training dataset
   • Multi-class problems
   • Baseline model needed
   • Interpretable model needed (calibrate probabilities if they drive decisions)
"""

print(best_practices)

# Performance comparison
comparison_data = {
    'Aspect': ['Training Speed', 'Prediction Speed', 'Small Data', 'Large Data', 
               'Interpretability', 'Feature Independence'],
    'Naive Bayes': ['Fast', 'Fast', 'Good', 'Good', 'High', 'Assumed'],
    'Logistic Reg': ['Medium', 'Fast', 'Poor', 'Good', 'High', 'Not required'],
    'SVM': ['Slow', 'Medium', 'Good', 'Poor', 'Low', 'Not required'],
    'Random Forest': ['Medium', 'Fast', 'Poor', 'Good', 'Medium', 'Not required'],
    'Neural Network': ['Slow', 'Fast', 'Poor', 'Excellent', 'Low', 'Not required']
}

comparison_df = pd.DataFrame(comparison_data)
print("\nClassifier Comparison:")
print("="*60)
print(comparison_df.to_string(index=False))

# Implementation checklist
checklist = """
NAIVE BAYES IMPLEMENTATION CHECKLIST:
□ Choose appropriate NB variant for data type
□ Preprocess features appropriately
□ Handle missing values
□ Consider normalizing transforms, e.g. log or power (for Gaussian)
□ Set smoothing parameter (for discrete)
□ Check feature independence assumption
□ Validate with cross-validation
□ Compare with baseline models
□ Examine prediction probabilities
□ Test on holdout set
□ Monitor for concept drift in production
"""

print(checklist)
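
The guideline "Imbalanced classes: Adjust priors" can be tried directly through the `priors` parameter of `GaussianNB`. The sketch below uses a synthetic dataset with roughly a 90/10 class split, chosen only for illustration, and compares the default empirical priors with uniform ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Synthetic imbalanced data (about 90% / 10%), for illustration only
X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)

default_nb = GaussianNB().fit(X, y)                    # priors from class frequencies
uniform_nb = GaussianNB(priors=[0.5, 0.5]).fit(X, y)   # user-specified priors

print("Empirical priors:", default_nb.class_prior_)
print("Uniform priors:  ", uniform_nb.class_prior_)

# Count how often each model predicts the minority class
n_minority_default = int((default_nb.predict(X) == 1).sum())
n_minority_uniform = int((uniform_nb.predict(X) == 1).sum())
print(n_minority_default, n_minority_uniform)
```

Raising the minority prior can only move predictions toward the minority class, which typically trades precision for recall; whether that trade is worthwhile depends on the application.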

Practice Exercises

Exercise 1: Custom Naive Bayes Implementation

Build your own Naive Bayes classifier from scratch:

Implement Gaussian NB with numpy
Add Laplace smoothing for Multinomial NB
Handle mixed feature types
Compare with sklearn implementation
Optimize with vectorization

Exercise 2: Sentiment Analysis System

Build a complete sentiment analysis pipeline:

Preprocess text data (cleaning, tokenization)
Extract features (bag-of-words, TF-IDF, n-grams)
Train multiple NB variants
Implement confidence thresholds
Deploy as API with probability scores

Exercise 3: Incremental Naive Bayes

Implement online learning with Naive Bayes:

Create incremental update methods
Handle streaming data
Update priors and likelihoods online
Monitor performance over time
Implement concept drift detection
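
As a possible starting point for this exercise, scikit-learn's Naive Bayes estimators expose `partial_fit`, which updates the model one batch at a time. The sketch below streams a synthetic dataset in batches of 100 samples (an arbitrary choice for illustration) and checks that the incrementally learned class means match a single full fit. Drift detection and performance monitoring are left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
classes = np.unique(y)  # all classes must be declared on the first call

online_nb = GaussianNB()
for start in range(0, len(X), 100):  # simulate a stream of 100-sample batches
    batch = slice(start, start + 100)
    online_nb.partial_fit(X[batch], y[batch], classes=classes)

batch_nb = GaussianNB().fit(X, y)

# The running class means should agree with those from a single full fit
means_match = np.allclose(online_nb.theta_, batch_nb.theta_)
print(means_match)
print(online_nb.class_count_)
```

Because the sufficient statistics (counts, means, variances) can be merged batch by batch, the model never needs the full dataset in memory.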

Summary and Key Takeaways

🎯 Key Points to Remember

Probabilistic Foundation: Based on Bayes' theorem with independence assumption
Multiple Variants: Gaussian, Multinomial, Bernoulli, and Complement for different data types
Fast and Scalable: Excellent for large datasets and real-time applications
Small Data Friendly: Works well with limited training samples
Text Classification Star: Particularly effective for document classification
Probability Estimates: Provides probability scores, though they are often poorly calibrated
Independence Assumption: Works despite often-violated assumption
Baseline Model: Always worth trying as a simple baseline