📷 CNNs: Convolutional Neural Networks for Computer Vision

Introduction

Convolutional Neural Networks (CNNs) transformed computer vision by learning hierarchical features directly from images instead of relying on hand-engineered ones. Early layers respond to simple patterns such as edges, and deeper layers combine them into object parts and whole objects, a design loosely inspired by the hierarchical processing of the visual cortex. This lesson covers CNN fundamentals: convolution, pooling, popular architectures (LeNet, AlexNet, VGG, ResNet), data augmentation, transfer learning, and practical applications in image classification, object detection, and segmentation.

CNN Fundamentals and Theory

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, optimizers, callbacks
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16, ResNet50, MobileNetV2
import numpy as np
import matplotlib.pyplot as plt
import cv2
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

# Set random seeds
np.random.seed(42)
tf.random.set_seed(42)

print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {len(tf.config.list_physical_devices('GPU')) > 0}")

print("\n" + "="*60)
print("CNN FUNDAMENTALS")
print("="*60)

# Core concepts
cnn_concepts = """
CNN KEY CONCEPTS:

1. CONVOLUTIONAL LAYER:
   • Filters/Kernels: Detect features (edges, shapes, patterns)
   • Stride: Filter movement step size
   • Padding: Handle border pixels (valid/same)
   • Feature Maps: Output of convolution operation
   • Parameters: (filter_height × filter_width × input_channels + 1) × num_filters

2. POOLING LAYER:
   • Max Pooling: Take maximum value in window
   • Average Pooling: Take average value
   • Reduces spatial dimensions
   • Adds approximate invariance to small translations
   • No learnable parameters

3. ARCHITECTURE COMPONENTS:
   • Conv blocks: Conv → Activation → Pooling
   • Feature extraction: Convolutional layers
   • Classification head: Fully connected layers
   • Depth increases, spatial size decreases

4. KEY PROPERTIES:
   • Parameter sharing: Same filter across image
   • Local connectivity: Neurons connect to local regions
   • Translation equivariance: A shifted input gives a shifted feature map, so a filter detects its feature anywhere
   • Hierarchical learning: Simple → Complex features

5. POPULAR ARCHITECTURES:
   • LeNet-5 (1998): First successful CNN
   • AlexNet (2012): ImageNet breakthrough
   • VGG (2014): Deep with small filters
   • ResNet (2015): Skip connections
   • EfficientNet (2019): Compound scaling of depth, width, and resolution

6. APPLICATIONS:
   • Image Classification
   • Object Detection (YOLO, R-CNN)
   • Semantic Segmentation
   • Face Recognition
   • Style Transfer
   • Medical Imaging
"""

print(cnn_concepts)
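
The parameter formula and the stride/padding rules listed above can be checked by hand. The sketch below is a minimal, dependency-free illustration; the helper names `conv2d_param_count` and `conv2d_output_size` are hypothetical and not part of Keras, and the padding names simply follow the Keras `valid`/`same` convention.

```python
import math

def conv2d_param_count(kernel_h, kernel_w, in_channels, num_filters, use_bias=True):
    """Learnable parameters in a 2-D convolution layer."""
    weights_per_filter = kernel_h * kernel_w * in_channels
    return (weights_per_filter + (1 if use_bias else 0)) * num_filters

def conv2d_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Spatial output size along one dimension (Keras-style padding names)."""
    if padding == "same":
        return math.ceil(input_size / stride)
    if padding == "valid":
        return (input_size - kernel_size) // stride + 1
    raise ValueError(f"unknown padding: {padding}")

# 3x3 convolution on an RGB input with 32 filters
print(conv2d_param_count(3, 3, 3, 32))             # (3*3*3 + 1) * 32 = 896
print(conv2d_output_size(32, 3, padding="same"))   # 32
print(conv2d_output_size(32, 3, padding="valid"))  # 30
print(conv2d_output_size(32, 2, stride=2))         # 16, as with 2x2 max pooling
```

The same window arithmetic applies to pooling layers, which is why a 2x2 pool with stride 2 halves each spatial dimension while adding no parameters.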

Building CNNs from Scratch

class CNNBuilder:
    """Build and visualize CNNs"""
    
    def __init__(self):
        self.models = {}
        
    def build_simple_cnn(self, input_shape=(32, 32, 3), num_classes=10):
        """Build a simple CNN for image classification"""
        
        model = keras.Sequential([
            # First Convolutional Block
            layers.Conv2D(32, (3, 3), padding='same', 
                         input_shape=input_shape, name='conv1'),
            layers.BatchNormalization(name='bn1'),
            layers.Activation('relu', name='relu1'),
            layers.MaxPooling2D((2, 2), name='pool1'),
            layers.Dropout(0.25, name='dropout1'),
            
            # Second Convolutional Block
            layers.Conv2D(64, (3, 3), padding='same', name='conv2'),
            layers.BatchNormalization(name='bn2'),
            layers.Activation('relu', name='relu2'),
            layers.MaxPooling2D((2, 2), name='pool2'),
            layers.Dropout(0.25, name='dropout2'),
            
            # Third Convolutional Block
            layers.Conv2D(128, (3, 3), padding='same', name='conv3'),
            layers.BatchNormalization(name='bn3'),
            layers.Activation('relu', name='relu3'),
            layers.GlobalAveragePooling2D(name='global_pool'),
            layers.Dropout(0.5, name='dropout3'),
            
            # Classification Head
            layers.Dense(128, activation='relu', name='fc1'),
            layers.Dropout(0.5, name='dropout4'),
            layers.Dense(num_classes, activation='softmax', name='output')
        ])
        
        return model
    
    def build_vgg_style_cnn(self, input_shape=(64, 64, 3), num_classes=10):
        """Build VGG-style CNN with multiple conv layers per block"""
        
        model = keras.Sequential([
            # Block 1
            layers.Conv2D(64, (3, 3), activation='relu', padding='same', 
                         input_shape=input_shape),
            layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
            layers.MaxPooling2D((2, 2)),
            layers.BatchNormalization(),
            
            # Block 2
            layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
            layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
            layers.MaxPooling2D((2, 2)),
            layers.BatchNormalization(),
            
            # Block 3
            layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
            layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
            layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
            layers.MaxPooling2D((2, 2)),
            layers.BatchNormalization(),
            
            # Classification
            layers.Flatten(),
            layers.Dense(512, activation='relu'),
            layers.Dropout(0.5),
            layers.Dense(512, activation='relu'),
            layers.Dropout(0.5),
            layers.Dense(num_classes, activation='softmax')
        ])
        
        return model
    
    def build_residual_cnn(self, input_shape=(32, 32, 3), num_classes=10):
        """Build CNN with residual connections"""
        
        def residual_block(x, filters, kernel_size=3, stride=1):
            """Create a residual block"""
            shortcut = x
            
            # Main path
            x = layers.Conv2D(filters, kernel_size, strides=stride, 
                            padding='same')(x)
            x = layers.BatchNormalization()(x)
            x = layers.Activation('relu')(x)
            
            x = layers.Conv2D(filters, kernel_size, padding='same')(x)
            x = layers.BatchNormalization()(x)
            
            # Shortcut path - adjust dimensions if needed
            if stride != 1 or shortcut.shape[-1] != filters:
                shortcut = layers.Conv2D(filters, 1, strides=stride, 
                                        padding='same')(shortcut)
                shortcut = layers.BatchNormalization()(shortcut)
            
            # Add shortcut to main path
            x = layers.Add()([x, shortcut])
            x = layers.Activation('relu')(x)
            
            return x
        
        # Build model
        inputs = layers.Input(shape=input_shape)
        
        # Initial convolution
        x = layers.Conv2D(64, 3, padding='same')(inputs)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
        
        # Residual blocks
        x = residual_block(x, 64)
        x = residual_block(x, 64)
        x = residual_block(x, 128, stride=2)
        x = residual_block(x, 128)
        x = residual_block(x, 256, stride=2)
        x = residual_block(x, 256)
        
        # Classification
        x = layers.GlobalAveragePooling2D()(x)
        x = layers.Dense(128, activation='relu')(x)
        x = layers.Dropout(0.5)(x)
        outputs = layers.Dense(num_classes, activation='softmax')(x)
        
        model = keras.Model(inputs, outputs)
        
        return model
    
    def visualize_filters(self, model, layer_name='conv1'):
        """Visualize convolutional filters"""
        
        # Get the layer
        for layer in model.layers:
            if layer.name == layer_name and isinstance(layer, layers.Conv2D):
                filters, biases = layer.get_weights()
                break
        else:
            print(f"Layer {layer_name} not found")
            return
        
        # Normalize filter values to [0, 1]; the epsilon guards against
        # division by zero when all weights are equal
        f_min, f_max = filters.min(), filters.max()
        filters = (filters - f_min) / (f_max - f_min + 1e-8)
        
        # Plot filters
        n_filters = min(filters.shape[3], 32)  # Show max 32 filters
        n_cols = 8
        n_rows = n_filters // n_cols + (1 if n_filters % n_cols else 0)
        
        fig, axes = plt.subplots(n_rows, n_cols, figsize=(16, n_rows*2))
        # plt.subplots returns a 1-D array for a single row, so flatten uniformly
        axes = np.atleast_1d(axes).flatten()
        
        for i in range(n_filters):
            # Get filter
            f = filters[:, :, :, i]
            
            # Handle different channel counts
            if f.shape[2] == 1:
                axes[i].imshow(f[:, :, 0], cmap='gray')
            elif f.shape[2] == 3:
                axes[i].imshow(f)
            else:
                # For multi-channel, show first channel
                axes[i].imshow(f[:, :, 0], cmap='viridis')
            
            axes[i].set_title(f'Filter {i}', fontsize=8)
            axes[i].axis('off')
        
        # Hide unused subplots
        for i in range(n_filters, len(axes)):
            axes[i].axis('off')
        
        plt.suptitle(f'Convolutional Filters from {layer_name}', fontsize=14)
        plt.tight_layout()
        plt.show()
    
    def demonstrate_convolution_operation(self):
        """Visualize convolution operation step by step"""
        
        # Create sample image
        image = np.zeros((7, 7))
        image[1:6, 1:6] = 1
        image[2:5, 2:5] = 2
        image[3, 3] = 3
        
        # Define filters
        filters = {
            'Edge Horizontal': np.array([[-1, -1, -1],
                                        [0, 0, 0],
                                        [1, 1, 1]]),
            'Edge Vertical': np.array([[-1, 0, 1],
                                      [-1, 0, 1],
                                      [-1, 0, 1]]),
            'Sharpen': np.array([[0, -1, 0],
                                [-1, 5, -1],
                                [0, -1, 0]]),
            'Blur': np.ones((3, 3)) / 9
        }
        
        fig, axes = plt.subplots(2, 3, figsize=(12, 8))
        
        # Show original image
        axes[0, 0].imshow(image, cmap='gray')
        axes[0, 0].set_title('Original Image')
        axes[0, 0].axis('off')
        
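        # Apply filters. Keras Conv2D computes cross-correlation (no kernel
        # flip), so correlate2d is used here to match what a CNN layer does.
        from scipy import signal
        for idx, (name, kernel) in enumerate(filters.items()):
            filtered = signal.correlate2d(image, kernel, mode='valid')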
            
            row = (idx + 1) // 3
            col = (idx + 1) % 3
            
            axes[row, col].imshow(filtered, cmap='gray')
            axes[row, col].set_title(name)
            axes[row, col].axis('off')
        
        # Hide unused subplot
        axes[1, 2].axis('off')
        
        plt.suptitle('Convolution Operation Demonstration', fontsize=14)
        plt.tight_layout()
        plt.show()

# Create CNN builder
cnn_builder = CNNBuilder()

print("\n" + "="*60)
print("BUILDING CNNs")
print("="*60)

# Build different CNN architectures
simple_cnn = cnn_builder.build_simple_cnn()
vgg_cnn = cnn_builder.build_vgg_style_cnn()
residual_cnn = cnn_builder.build_residual_cnn()

print("\nSimple CNN Architecture:")
print("-" * 40)
simple_cnn.summary()

print("\nDemonstrating Convolution Operation:")
cnn_builder.demonstrate_convolution_operation()

print("\nVisualizing Initial Filters:")
cnn_builder.visualize_filters(simple_cnn, 'conv1')
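
One detail worth seeing in isolation: the "convolution" in a CNN layer is really cross-correlation, meaning the kernel slides over the image without being flipped, whereas mathematical convolution rotates the kernel by 180 degrees first. For symmetric kernels such as the blur filter the two agree; for edge detectors the sign of the response flips. The sketch below is a plain-Python illustration with hypothetical helper names, not the code a framework actually runs.

```python
def correlate2d_valid(image, kernel):
    """Slide the kernel over the image without flipping it ('valid' mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def flip_kernel(kernel):
    """Rotate the kernel by 180 degrees."""
    return [row[::-1] for row in kernel[::-1]]

def convolve2d_valid(image, kernel):
    """True convolution equals cross-correlation with the flipped kernel."""
    return correlate2d_valid(image, flip_kernel(kernel))

# 4x4 image with a vertical edge: dark left half, bright right half
image = [[0, 0, 1, 1] for _ in range(4)]
edge_vertical = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

print(correlate2d_valid(image, edge_vertical))  # [[3, 3], [3, 3]]
print(convolve2d_valid(image, edge_vertical))   # [[-3, -3], [-3, -3]]
```

Because the filter weights are learned, the flip makes no difference to what a network can represent; it only matters when hand-written kernels are compared against library output.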

Data Augmentation and Preprocessing

class DataAugmentationPipeline:
    """Image data augmentation techniques"""
    
    def __init__(self):
        self.augmenters = {}
        
    def create_augmentation_pipeline(self):
        """Create comprehensive data augmentation pipeline"""
        
        # Using Keras ImageDataGenerator
        train_datagen = ImageDataGenerator(
            rotation_range=20,
            width_shift_range=0.2,
            height_shift_range=0.2,
            horizontal_flip=True,
            vertical_flip=False,
            zoom_range=0.2,
            shear_range=0.2,
            fill_mode='nearest',
            brightness_range=[0.8, 1.2],
            preprocessing_function=None
        )
        
        # Validation data must not be augmented. Neither generator rescales
        # here; pass rescale=1./255 to both if the inputs are 0-255 integers.
        val_datagen = ImageDataGenerator()
        
        return train_datagen, val_datagen
    
    def create_tf_augmentation(self):
        """Create augmentation using TensorFlow layers"""
        
        data_augmentation = keras.Sequential([
            layers.RandomFlip("horizontal"),
            layers.RandomRotation(0.2),
            layers.RandomZoom(0.2),
            layers.RandomContrast(0.2),
        ])
        
        return data_augmentation
    
    def demonstrate_augmentation(self):
        """Visualize augmentation effects"""
        
        # Create a sample image
        sample_image = np.random.rand(100, 100, 3)
        
        # Add some structure to make augmentation visible
        sample_image[30:70, 30:70, :] = 0.8
        sample_image[40:60, 40:60, 0] = 0.2
        sample_image[45:55, 45:55, 1] = 0.9
        
        # Create augmenter
        datagen = ImageDataGenerator(
            rotation_range=30,
            width_shift_range=0.2,
            height_shift_range=0.2,
            horizontal_flip=True,
            zoom_range=0.2,
            shear_range=0.2
        )
        
        # Generate augmented images
        sample_batch = sample_image.reshape((1,) + sample_image.shape)
        
        fig, axes = plt.subplots(3, 4, figsize=(12, 9))
        axes = axes.flatten()
        
        # Original image
        axes[0].imshow(sample_image)
        axes[0].set_title('Original', fontsize=10)
        axes[0].axis('off')
        
        # Generate augmented versions
        it = datagen.flow(sample_batch, batch_size=1)
        
        for i in range(1, 12):
            batch = next(it)
            image = batch[0]
            axes[i].imshow(image)
            axes[i].set_title(f'Augmented {i}', fontsize=10)
            axes[i].axis('off')
        
        plt.suptitle('Data Augmentation Examples', fontsize=14)
        plt.tight_layout()
        plt.show()
    
    def create_preprocessing_pipeline(self, input_shape=(224, 224, 3)):
        """Create complete preprocessing pipeline"""
        
        def preprocess_image(image, label):
            """Preprocessing function for tf.data"""
            # Resize
            image = tf.image.resize(image, input_shape[:2])
            
            # Normalize to [0, 1]
            image = tf.cast(image, tf.float32) / 255.0
            
            # Standardize (ImageNet statistics)
            mean = tf.constant([0.485, 0.456, 0.406])
            std = tf.constant([0.229, 0.224, 0.225])
            image = (image - mean) / std
            
            return image, label
        
        return preprocess_image
    
    def visualize_preprocessing_effects(self):
        """Show effects of different preprocessing steps"""
        
        # Create sample image
        image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)  # upper bound is exclusive
        
        fig, axes = plt.subplots(2, 4, figsize=(16, 8))
        
        # Original
        axes[0, 0].imshow(image)
        axes[0, 0].set_title('Original')
        axes[0, 0].axis('off')
        
        # Resized
        resized = cv2.resize(image, (64, 64))
        axes[0, 1].imshow(resized)
        axes[0, 1].set_title('Resized (64x64)')
        axes[0, 1].axis('off')
        
        # Normalized [0, 1]
        normalized = image.astype(np.float32) / 255.0
        axes[0, 2].imshow(normalized)
        axes[0, 2].set_title('Normalized [0, 1]')
        axes[0, 2].axis('off')
        
        # Standardized (ImageNet)
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        standardized = (normalized - mean) / std
        # Clip for visualization
        standardized_vis = np.clip(standardized, 0, 1)
        axes[0, 3].imshow(standardized_vis)
        axes[0, 3].set_title('Standardized')
        axes[0, 3].axis('off')
        
        # Grayscale
        gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        axes[1, 0].imshow(gray, cmap='gray')
        axes[1, 0].set_title('Grayscale')
        axes[1, 0].axis('off')
        
        # Edge detection
        edges = cv2.Canny(gray, 50, 150)
        axes[1, 1].imshow(edges, cmap='gray')
        axes[1, 1].set_title('Edge Detection')
        axes[1, 1].axis('off')
        
        # Histogram equalization
        equalized = cv2.equalizeHist(gray)
        axes[1, 2].imshow(equalized, cmap='gray')
        axes[1, 2].set_title('Histogram Equalized')
        axes[1, 2].axis('off')
        
        # Gaussian blur
        blurred = cv2.GaussianBlur(image, (15, 15), 0)
        axes[1, 3].imshow(blurred)
        axes[1, 3].set_title('Gaussian Blur')
        axes[1, 3].axis('off')
        
        plt.suptitle('Image Preprocessing Techniques', fontsize=14)
        plt.tight_layout()
        plt.show()

# Data augmentation
augmentation = DataAugmentationPipeline()

print("\n" + "="*60)
print("DATA AUGMENTATION AND PREPROCESSING")
print("="*60)

print("\nDemonstrating data augmentation:")
augmentation.demonstrate_augmentation()

print("\nVisualizing preprocessing effects:")
augmentation.visualize_preprocessing_effects()

# Create augmentation layers
tf_augmentation = augmentation.create_tf_augmentation()
print("\nTensorFlow augmentation layers created:")
for layer in tf_augmentation.layers:
    print(f"  - {layer.name}")
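
The standardization step in the preprocessing pipeline is simple per-channel arithmetic, and working it through for single pixels shows why standardized images fall outside [0, 1] and need clipping before display. This is a minimal sketch using the ImageNet mean and standard deviation quoted in the listing above; the function names are hypothetical.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def standardize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(s - m) / sd for s, m, sd in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]

def destandardize_pixel(rgb_std):
    """Invert standardize_pixel, e.g. before displaying an image."""
    return [round((v * sd + m) * 255.0)
            for v, m, sd in zip(rgb_std, IMAGENET_MEAN, IMAGENET_STD)]

white = standardize_pixel((255, 255, 255))
black = standardize_pixel((0, 0, 0))
print([round(v, 3) for v in white])  # [2.249, 2.429, 2.64]
print([round(v, 3) for v in black])  # [-2.118, -2.036, -1.804]
```

Whatever statistics are used, the same values must be applied at training and inference time, and they should match the ones the pre-trained weights were trained with.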

Transfer Learning with Pre-trained Models

class TransferLearning:
    """Transfer learning with pre-trained models"""
    
    def __init__(self):
        self.models = {}
        
    def create_transfer_model(self, base_model_name='VGG16', 
                            input_shape=(224, 224, 3), 
                            num_classes=10,
                            trainable_layers=2):
        """Create transfer learning model"""
        
        # Load pre-trained model
        if base_model_name == 'VGG16':
            base_model = VGG16(input_shape=input_shape,
                              include_top=False,
                              weights='imagenet')
        elif base_model_name == 'ResNet50':
            base_model = ResNet50(input_shape=input_shape,
                                 include_top=False,
                                 weights='imagenet')
        elif base_model_name == 'MobileNetV2':
            base_model = MobileNetV2(input_shape=input_shape,
                                    include_top=False,
                                    weights='imagenet')
        else:
            raise ValueError(f"Unknown model: {base_model_name}")
        
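        # Freeze the base model, keeping only the last few layers trainable.
        # The base itself must stay trainable: Keras ignores the weights of
        # every sublayer once the parent model has trainable = False.
        if trainable_layers > 0:
            base_model.trainable = True
            for layer in base_model.layers[:-trainable_layers]:
                layer.trainable = False
        else:
            base_model.trainable = False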
        
        # Build model
        model = keras.Sequential([
            base_model,
            layers.GlobalAveragePooling2D(),
            layers.Dense(256, activation='relu'),
            layers.BatchNormalization(),
            layers.Dropout(0.5),
            layers.Dense(128, activation='relu'),
            layers.BatchNormalization(),
            layers.Dropout(0.5),
            layers.Dense(num_classes, activation='softmax')
        ])
        
        return model, base_model
    
    def compare_models(self):
        """Compare different pre-trained models"""
        
        models_to_compare = ['VGG16', 'ResNet50', 'MobileNetV2']
        results = {}
        
        for model_name in models_to_compare:
            model, base = self.create_transfer_model(model_name)
            
            # Count parameters
            total_params = model.count_params()
            trainable_params = sum([tf.size(w).numpy() 
                                  for w in model.trainable_weights])
            
            results[model_name] = {
                'total_params': total_params,
                'trainable_params': trainable_params,
                'base_layers': len(base.layers),
                'total_layers': len(model.layers)
            }
        
        # Visualize comparison
        fig, axes = plt.subplots(2, 2, figsize=(12, 10))
        
        # Parameters comparison
        model_names = list(results.keys())
        total_params = [results[m]['total_params'] for m in model_names]
        trainable_params = [results[m]['trainable_params'] for m in model_names]
        
        x = np.arange(len(model_names))
        width = 0.35
        
        axes[0, 0].bar(x - width/2, np.array(total_params)/1e6, width, 
                      label='Total', color='lightblue')
        axes[0, 0].bar(x + width/2, np.array(trainable_params)/1e6, width, 
                      label='Trainable', color='orange')
        axes[0, 0].set_xlabel('Model')
        axes[0, 0].set_ylabel('Parameters (Millions)')
        axes[0, 0].set_title('Model Parameters')
        axes[0, 0].set_xticks(x)
        axes[0, 0].set_xticklabels(model_names)
        axes[0, 0].legend()
        axes[0, 0].grid(True, alpha=0.3, axis='y')
        
        # Layers comparison
        base_layers = [results[m]['base_layers'] for m in model_names]
        axes[0, 1].bar(model_names, base_layers, color='steelblue')
        axes[0, 1].set_ylabel('Number of Layers')
        axes[0, 1].set_title('Base Model Layers')
        axes[0, 1].grid(True, alpha=0.3, axis='y')
        
        # Model characteristics
        characteristics = {
            'VGG16': {'Depth': 'Deep', 'Width': 'Wide', 'Year': 2014},
            'ResNet50': {'Depth': 'Very Deep', 'Width': 'Medium', 'Year': 2015},
            'MobileNetV2': {'Depth': 'Medium', 'Width': 'Narrow', 'Year': 2018}
        }
        
        # Create table
        table_data = []
        for model_name in model_names:
            char = characteristics[model_name]
            params = results[model_name]
            table_data.append([
                model_name,
                f"{params['total_params']/1e6:.1f}M",
                f"{params['base_layers']}",
                char['Year']
            ])
        
        table = axes[1, 0].table(cellText=table_data,
                                colLabels=['Model', 'Params', 'Layers', 'Year'],
                                cellLoc='center',
                                loc='center')
        table.auto_set_font_size(False)
        table.set_fontsize(10)
        table.scale(1, 1.5)
        axes[1, 0].axis('off')
        axes[1, 0].set_title('Model Comparison')
        
        # Training tips
        tips_text = """Transfer Learning Best Practices:
        
        1. Start with frozen base model
        2. Train only classifier head
        3. Unfreeze top layers gradually
        4. Use lower learning rate for base
        5. Monitor for overfitting
        6. Use appropriate preprocessing
        7. Consider model size vs accuracy
        """
        
        axes[1, 1].text(0.1, 0.5, tips_text, fontsize=10,
                       verticalalignment='center', family='monospace')
        axes[1, 1].set_title('Tips')
        axes[1, 1].axis('off')
        
        plt.suptitle('Transfer Learning Model Comparison', fontsize=14)
        plt.tight_layout()
        plt.show()
        
        return results
    
    def fine_tuning_strategy(self):
        """Demonstrate fine-tuning strategy"""
        
        print("\nFine-tuning Strategy:")
        print("-" * 40)
        
        strategy = """
        PHASE 1: Feature Extraction (5-10 epochs)
        - Freeze entire base model
        - Train only classifier head
        - Use higher learning rate (0.001)
        - Monitor validation accuracy
        
        PHASE 2: Fine-tuning (10-20 epochs)
        - Unfreeze top layers of base model
        - Use lower learning rate (0.0001)
        - Use differential learning rates
        - Watch for overfitting
        
        PHASE 3: Full Fine-tuning (optional)
        - Unfreeze entire model
        - Very low learning rate (0.00001)
        - Early stopping essential
        - Only if sufficient data
        """
        
        print(strategy)
        
        # Create example with different phases
        base_model = VGG16(input_shape=(224, 224, 3),
                          include_top=False,
                          weights='imagenet')
        
        # Phase 1: Freeze all
        base_model.trainable = False
        print(f"\nPhase 1: {sum([layer.trainable for layer in base_model.layers])} trainable layers")
        
        # Phase 2: Unfreeze top layers
        base_model.trainable = True
        for layer in base_model.layers[:-4]:
            layer.trainable = False
        print(f"Phase 2: {sum([layer.trainable for layer in base_model.layers])} trainable layers")
        
        # Phase 3: Unfreeze all
        base_model.trainable = True
        print(f"Phase 3: {sum([layer.trainable for layer in base_model.layers])} trainable layers")

# Transfer learning
transfer = TransferLearning()

print("\n" + "="*60)
print("TRANSFER LEARNING")
print("="*60)

print("\nComparing pre-trained models:")
model_comparison = transfer.compare_models()

print("\nFine-tuning strategy:")
transfer.fine_tuning_strategy()
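
The three phases can also be written down as data, which makes it easier to see what changes from one phase to the next. The sketch below is framework-free and purely illustrative: `fine_tuning_plan` is a hypothetical helper, the learning rates are the ones suggested in the strategy text, and the layer count of 19 is an assumption about the VGG16 base without its classification head.

```python
def fine_tuning_plan(num_base_layers, phase, top_layers=4):
    """Return (trainable flag per base layer, learning rate) for a phase."""
    if phase == 1:      # feature extraction: base fully frozen
        flags = [False] * num_base_layers
        learning_rate = 1e-3
    elif phase == 2:    # fine-tune only the top layers of the base
        frozen = max(num_base_layers - top_layers, 0)
        flags = [False] * frozen + [True] * (num_base_layers - frozen)
        learning_rate = 1e-4
    elif phase == 3:    # full fine-tuning
        flags = [True] * num_base_layers
        learning_rate = 1e-5
    else:
        raise ValueError("phase must be 1, 2, or 3")
    return flags, learning_rate

for phase in (1, 2, 3):
    flags, lr = fine_tuning_plan(num_base_layers=19, phase=phase)
    print(f"Phase {phase}: {sum(flags)} trainable layers, lr={lr}")
```

In Keras, a model has to be compiled again after its `trainable` flags change, so each phase would typically end with a fresh `compile` call using that phase's learning rate before training continues.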

Feature Visualization and Model Interpretation

class CNNVisualization:
    """Visualize CNN features and activations"""
    
    def __init__(self):
        self.visualizations = {}
        
    def visualize_feature_maps(self, model, image, layer_names=None):
        """Visualize intermediate feature maps"""
        
        if layer_names is None:
            # Get first few conv layers
            layer_names = [layer.name for layer in model.layers 
                          if 'conv' in layer.name][:3]
        
        # Create model that outputs intermediate layers
        layer_outputs = [model.get_layer(name).output for name in layer_names]
        activation_model = keras.Model(inputs=model.input, outputs=layer_outputs)
        
        # Get activations
        if len(image.shape) == 3:
            image = np.expand_dims(image, axis=0)
        
        activations = activation_model.predict(image, verbose=0)
        
        # Plot feature maps
        for layer_name, activation in zip(layer_names, activations):
            n_features = min(activation.shape[-1], 16)  # Show max 16 features
            size = activation.shape[1]
            
            fig, axes = plt.subplots(4, 4, figsize=(10, 10))
            axes = axes.flatten()
            
            for i in range(n_features):
                axes[i].imshow(activation[0, :, :, i], cmap='viridis')
                axes[i].set_title(f'Feature {i}', fontsize=8)
                axes[i].axis('off')
            
            for i in range(n_features, 16):
                axes[i].axis('off')
            
            plt.suptitle(f'Feature Maps: {layer_name} (Shape: {activation.shape[1:]})',
                        fontsize=12)
            plt.tight_layout()
            plt.show()
    
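    def create_cam_heatmap(self, model, image, pred_index=None):
        """Create a Grad-CAM heatmap (gradient-weighted class activation map).

        Assumes a linear stack of layers (e.g. keras.Sequential), because the
        layers after the last Conv2D are re-applied one after another.
        """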
        
        # Get last conv layer
        last_conv_layer = None
        for layer in reversed(model.layers):
            if isinstance(layer, layers.Conv2D):
                last_conv_layer = layer
                break
        
        if last_conv_layer is None:
            print("No convolutional layer found")
            return None
        
        # Create model that maps input to activations of last conv layer
        last_conv_model = keras.Model(model.inputs, last_conv_layer.output)
        
        # Create model that maps from last conv to predictions
        classifier_input = keras.Input(shape=last_conv_layer.output.shape[1:])
        x = classifier_input
        for layer in model.layers[model.layers.index(last_conv_layer) + 1:]:
            x = layer(x)
        classifier_model = keras.Model(classifier_input, x)
        
        # Get gradient of prediction with respect to last conv layer
        with tf.GradientTape() as tape:
            if len(image.shape) == 3:
                image = np.expand_dims(image, axis=0)
            
            last_conv_output = last_conv_model(image)
            tape.watch(last_conv_output)
            preds = classifier_model(last_conv_output)
            
            if pred_index is None:
                pred_index = tf.argmax(preds[0])
            
            class_channel = preds[:, pred_index]
        
        # Calculate gradients
        grads = tape.gradient(class_channel, last_conv_output)
        
        # Pool gradients
        pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
        
        # Weight feature maps by gradients
        last_conv_output = last_conv_output[0]
        heatmap = last_conv_output @ pooled_grads[..., tf.newaxis]
        heatmap = tf.squeeze(heatmap)
        
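        # Keep positive evidence only, then scale to [0, 1]; the epsilon
        # avoids division by zero when the heatmap is all zeros
        heatmap = tf.maximum(heatmap, 0)
        heatmap = heatmap / (tf.math.reduce_max(heatmap) + 1e-8)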
        
        return heatmap.numpy()
    
    def visualize_predictions_with_cam(self, model, images, labels=None):
        """Visualize predictions with CAM overlay"""
        
        n_images = min(len(images), 6)
        # squeeze=False keeps axes 2-D even when only one image is passed
        fig, axes = plt.subplots(2, n_images, figsize=(3*n_images, 6),
                                 squeeze=False)
        
        for i in range(n_images):
            img = images[i]
            
            # Predict
            if len(img.shape) == 3:
                img_batch = np.expand_dims(img, axis=0)
            else:
                img_batch = img
            
            pred = model.predict(img_batch, verbose=0)
            pred_class = np.argmax(pred[0])
            pred_prob = pred[0, pred_class]
            
            # Generate CAM
            heatmap = self.create_cam_heatmap(model, img, pred_class)
            
            # Display original
            axes[0, i].imshow(img if len(img.shape) == 3 else img.squeeze(), 
                            cmap='gray' if len(img.shape) == 2 else None)
            title = f'Pred: {pred_class} ({pred_prob:.2f})'
            if labels is not None:
                title = f'True: {labels[i]}\n' + title
            axes[0, i].set_title(title, fontsize=8)
            axes[0, i].axis('off')
            
            # Display with CAM overlay
            if heatmap is not None:
                # Resize heatmap
                heatmap_resized = cv2.resize(heatmap, 
                                            (img.shape[1], img.shape[0]))
                
                # Overlay
                axes[1, i].imshow(img if len(img.shape) == 3 else img.squeeze(), 
                                cmap='gray' if len(img.shape) == 2 else None)
                axes[1, i].imshow(heatmap_resized, cmap='jet', alpha=0.5)
                axes[1, i].set_title('CAM Heatmap', fontsize=8)
                axes[1, i].axis('off')
        
        plt.suptitle('Predictions with Class Activation Maps', fontsize=12)
        plt.tight_layout()
        plt.show()

# CNN Visualization
viz = CNNVisualization()

print("\n" + "="*60)
print("CNN VISUALIZATION")
print("="*60)

# Create sample images
sample_images = np.random.rand(6, 32, 32, 3).astype(np.float32)

print("\nVisualization techniques demonstrated")
print("(Feature maps and CAM require trained model with real data)")
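
The heatmap computation above follows the Grad-CAM recipe: average the gradients of each channel to get one weight per channel, take the weighted sum of the feature maps, keep only positive values, and normalize. The sketch below reproduces those steps on tiny hand-made arrays so each one can be traced by hand; it uses plain Python lists and a hypothetical `grad_cam` helper rather than TensorFlow tensors.

```python
def grad_cam(feature_maps, gradients):
    """Combine K feature maps (each H x W) into one normalized heatmap.

    feature_maps[k][i][j] : activation of channel k at position (i, j)
    gradients[k][i][j]    : d(class score) / d(activation) at the same place
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])

    # 1. Channel weights: average each channel's gradient over all positions
    weights = [sum(sum(row) for row in grad) / (height * width)
               for grad in gradients]

    # 2. Weighted sum of the feature maps, followed by ReLU
    heatmap = [[max(sum(w * fmap[i][j]
                        for w, fmap in zip(weights, feature_maps)), 0.0)
                for j in range(width)]
               for i in range(height)]

    # 3. Scale to [0, 1]; an all-zero map is returned unchanged
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap

# Two 2x2 channels: the first supports the class, the second opposes it
feature_maps = [[[1.0, 0.0], [0.0, 2.0]],
                [[0.0, 3.0], [0.0, 0.0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(feature_maps, gradients))  # [[0.5, 0.0], [0.0, 1.0]]
```

The position where the opposing channel fires is zeroed by the ReLU, which is why Grad-CAM highlights only regions that increase the class score.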

Best Practices and Common Architectures

print("\n" + "="*60)
print("CNN BEST PRACTICES")
print("="*60)

best_practices = """
KEY GUIDELINES:

1. ARCHITECTURE DESIGN:
   • Start with proven architectures
   • Use batch normalization after conv layers
   • Add dropout for regularization (0.2-0.5)
   • Consider skip connections for deep networks
   • Use global average pooling instead of flatten

2. CONVOLUTION LAYERS:
   • 3x3 filters are most common
   • Increase filters as you go deeper (32→64→128)
   • Use 'same' padding to preserve dimensions
   • Stride 2 for downsampling (instead of pooling)

3. POOLING:
   • Max pooling for feature detection
   • Average pooling for smoother downsampling
   • Don't pool too aggressively early
   • Consider strided convolutions instead

4. DATA AUGMENTATION:
   • Essential for small datasets
   • Rotation, flipping, zooming, shifting
   • Color/brightness adjustments
   • Don't augment validation/test data
   • Use appropriate augmentation for domain

5. TRANSFER LEARNING:
   • Start with pre-trained models
   • Fine-tune carefully (low learning rate)
   • Freeze early layers initially
   • Match preprocessing to original training

6. TRAINING TIPS:
   • Use appropriate input size (224x224 common)
   • Normalize inputs properly
   • Start with Adam optimizer
   • Use learning rate scheduling
   • Monitor for overfitting

7. COMMON PROBLEMS:
   
   Overfitting:
   • More augmentation
   • Stronger regularization
   • Simpler architecture
   • Transfer learning
   
   Slow Training:
   • Reduce input size
   • Use simpler architecture
   • Mixed precision training
   • Better hardware (GPU)
   
   Poor Accuracy:
   • Check preprocessing
   • Verify labels are correct
   • Try different architecture
   • Increase model capacity
"""

print(best_practices)

# Architecture evolution
architecture_evolution = """
CNN ARCHITECTURE EVOLUTION:

LeNet-5 (1998):
• 2 Conv + 2 Pool + 3 FC (counting C5 as fully connected)
• 60K parameters
• Handwritten digit recognition

AlexNet (2012):
• 5 Conv + 3 FC
• 60M parameters
• ReLU, Dropout
• ImageNet winner

VGGNet (2014):
• 16-19 layers
• 3x3 convolutions only
• 138M parameters (VGG16)
• Simple, uniform

GoogLeNet/Inception (2014):
• 22 layers deep
• Inception modules
• 7M parameters
• Multi-scale processing

ResNet (2015):
• 18-152 layers
• Skip connections
• 25M parameters (ResNet-50)
• Made very deep networks trainable by easing gradient flow

DenseNet (2017):
• Dense connections
• Feature reuse
• Parameter efficient

EfficientNet (2019):
• Compound scaling
• Neural architecture search
• State-of-the-art ImageNet accuracy at release
• Mobile-friendly
"""

print(architecture_evolution)
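
The preference for 3x3 filters in VGG-style networks has a simple arithmetic justification: a stack of stride-1 3x3 convolutions covers the same receptive field as one larger filter while using fewer weights and adding extra non-linearities. The sketch below works through the numbers; the helper names are hypothetical, biases are ignored, and every layer is assumed to keep the channel count unchanged.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stride-1 convolutions applied in sequence."""
    return 1 + num_layers * (kernel_size - 1)

def stacked_conv_weights(kernel_size, num_layers, channels):
    """Weight count (biases ignored) when every layer maps channels -> channels."""
    return num_layers * kernel_size * kernel_size * channels * channels

channels = 64
for single_kernel, depth in [(5, 2), (7, 3)]:
    # A stack of `depth` 3x3 layers sees the same window as one larger kernel
    assert stacked_receptive_field(3, depth) == single_kernel
    stacked = stacked_conv_weights(3, depth, channels)
    single = stacked_conv_weights(single_kernel, 1, channels)
    print(f"{depth} x (3x3): {stacked:,} weights vs "
          f"one {single_kernel}x{single_kernel}: {single:,}")
```

With 64 channels, two 3x3 layers need 73,728 weights against 102,400 for a single 5x5 layer, and three need 110,592 against 200,704 for a 7x7 layer.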

Practice Exercises

Exercise 1: Build Custom CNN Architecture

Design and implement your own CNN:

  1. Create inception-style modules
  2. Implement depthwise separable convolutions
  3. Add attention mechanisms
  4. Compare with standard architectures
  5. Optimize for mobile deployment
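
As a starting point for item 2, it helps to see why depthwise separable convolutions suit mobile models. A standard convolution mixes space and channels in one step; the separable version first filters each input channel on its own (depthwise) and then mixes channels with a 1x1 convolution (pointwise). The weight counts below ignore biases, and the function names are hypothetical.

```python
def standard_conv_weights(kernel_size, in_channels, out_channels):
    """Weights in a regular convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels

def separable_conv_weights(kernel_size, in_channels, out_channels):
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution mixes channels
    return depthwise + pointwise

standard = standard_conv_weights(3, 128, 256)    # 294,912
separable = separable_conv_weights(3, 128, 256)  # 1,152 + 32,768 = 33,920
print(f"standard: {standard:,}  separable: {separable:,}  "
      f"ratio: {standard / separable:.1f}x")
```

In Keras the corresponding layer is `layers.SeparableConv2D`, which can replace `layers.Conv2D` in the blocks built earlier when comparing architectures.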

Exercise 2: Object Detection Implementation

Extend CNN for object detection:

  1. Implement sliding window approach
  2. Add bounding box regression
  3. Create anchor boxes
  4. Implement non-max suppression
  5. Evaluate with mAP metric
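
Items 4 and 5 both rest on intersection over union (IoU). The sketch below is one possible starting point, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates and using the common greedy form of non-max suppression; the threshold of 0.5 is an illustrative default, not a recommendation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the third box does not overlap anything and is kept.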

Exercise 3: Image Segmentation Network

Build semantic segmentation model:

  1. Implement U-Net architecture
  2. Add skip connections
  3. Create custom loss (Dice, IoU)
  4. Handle class imbalance
  5. Visualize segmentation masks
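
For item 3, the Dice coefficient measures the overlap between a predicted mask and the ground truth, and one minus Dice can serve as a loss that is less sensitive to class imbalance than per-pixel accuracy. The sketch below works on flat Python sequences for clarity; in a real model the same arithmetic would be written with tensor operations. The `smooth` term is a commonly used stabilizer, and its default of 1.0 here is an assumption.

```python
def dice_coefficient(pred_mask, true_mask, smooth=1.0):
    """Dice = 2 * overlap / (|pred| + |true|) for flat sequences of 0/1 values
    or probabilities. `smooth` avoids 0/0 when both masks are empty."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    return (2.0 * intersection + smooth) / (total + smooth)

def dice_loss(pred_mask, true_mask, smooth=1.0):
    """Loss that reaches 0 when the masks match exactly."""
    return 1.0 - dice_coefficient(pred_mask, true_mask, smooth)

true_mask = [0, 0, 1, 1, 1, 0]
pred_mask = [0, 1, 1, 1, 0, 0]
print(round(dice_coefficient(pred_mask, true_mask, smooth=0.0), 3))  # 0.667
```

Two of the three foreground pixels are matched and each mask contains three foreground pixels, giving 2 * 2 / (3 + 3) = 0.667.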

Summary and Key Takeaways

🎯 Key Points to Remember