Convolutional Neural Networks (CNNs) have transformed computer vision by learning hierarchical features directly from images. Early layers respond to simple patterns such as edges, while deeper layers combine them into object-level representations, an organization loosely inspired by the hierarchical processing of the visual system. This lesson covers CNN fundamentals: convolutions, pooling, popular architectures (LeNet, AlexNet, VGG, ResNet), data augmentation, transfer learning, and practical applications in image classification, object detection, and segmentation.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, optimizers, callbacks
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16, ResNet50, MobileNetV2
import numpy as np
import matplotlib.pyplot as plt
import cv2
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')
# Set random seeds
np.random.seed(42)
tf.random.set_seed(42)
print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {len(tf.config.list_physical_devices('GPU')) > 0}")
print("\n" + "="*60)
print("CNN FUNDAMENTALS")
print("="*60)
# Core concepts
cnn_concepts = """
CNN KEY CONCEPTS:
1. CONVOLUTIONAL LAYER:
• Filters/Kernels: Detect features (edges, shapes, patterns)
• Stride: Filter movement step size
• Padding: Handle border pixels (valid/same)
• Feature Maps: Output of convolution operation
• Parameters: (filter_height × filter_width × input_channels + 1) × num_filters
2. POOLING LAYER:
• Max Pooling: Take maximum value in window
• Average Pooling: Take average value
• Reduces spatial dimensions
• Provides a degree of local translation invariance
• No learnable parameters
3. ARCHITECTURE COMPONENTS:
• Conv blocks: Conv → Activation → Pooling
• Feature extraction: Convolutional layers
• Classification head: Fully connected layers
• Depth increases, spatial size decreases
4. KEY PROPERTIES:
• Parameter sharing: Same filter across image
• Local connectivity: Neurons connect to local regions
• Translation equivariance: A shifted input gives shifted feature maps, so features are detected anywhere
• Hierarchical learning: Simple → Complex features
5. POPULAR ARCHITECTURES:
• LeNet-5 (1998): Early successful CNN (digit recognition)
• AlexNet (2012): ImageNet breakthrough
• VGG (2014): Deep with small filters
• ResNet (2015): Skip connections
• EfficientNet (2019): Compound scaling
6. APPLICATIONS:
• Image Classification
• Object Detection (YOLO, R-CNN)
• Semantic Segmentation
• Face Recognition
• Style Transfer
• Medical Imaging
"""
print(cnn_concepts)
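The parameter formula and the stride/padding rules listed above can be checked with a little arithmetic. The sketch below is illustrative only: the helper names `conv2d_params` and `conv_output_size` are hypothetical, and the output-size rules assume the usual Keras conventions for 'valid' and 'same' padding.

```python
import math


def conv2d_params(filter_h, filter_w, in_channels, num_filters):
    """Learnable parameters of a conv layer (one bias per filter)."""
    return (filter_h * filter_w * in_channels + 1) * num_filters


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size along one dimension (assumed Keras conventions)."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"Unknown padding: {padding}")


# 3x3 convolution on a 32x32 RGB input with 32 filters
print(conv2d_params(3, 3, 3, 32))                           # 896
print(conv_output_size(32, 3, stride=1, padding="same"))    # 32
print(conv_output_size(32, 3, stride=1, padding="valid"))   # 30
# A 2x2 pooling window with stride 2 follows the same 'valid' rule
print(conv_output_size(32, 2, stride=2, padding="valid"))   # 16
```

The same arithmetic explains why pooling layers add no parameters: they have a window and a stride, but no weights or biases.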
class CNNBuilder:
    """Build and visualize CNNs"""

    def __init__(self):
        self.models = {}

    def build_simple_cnn(self, input_shape=(32, 32, 3), num_classes=10):
        """Build a simple CNN for image classification"""
        model = keras.Sequential([
            # First Convolutional Block
            layers.Conv2D(32, (3, 3), padding='same',
                          input_shape=input_shape, name='conv1'),
            layers.BatchNormalization(name='bn1'),
            layers.Activation('relu', name='relu1'),
            layers.MaxPooling2D((2, 2), name='pool1'),
            layers.Dropout(0.25, name='dropout1'),
            # Second Convolutional Block
            layers.Conv2D(64, (3, 3), padding='same', name='conv2'),
            layers.BatchNormalization(name='bn2'),
            layers.Activation('relu', name='relu2'),
            layers.MaxPooling2D((2, 2), name='pool2'),
            layers.Dropout(0.25, name='dropout2'),
            # Third Convolutional Block
            layers.Conv2D(128, (3, 3), padding='same', name='conv3'),
            layers.BatchNormalization(name='bn3'),
            layers.Activation('relu', name='relu3'),
            layers.GlobalAveragePooling2D(name='global_pool'),
            layers.Dropout(0.5, name='dropout3'),
            # Classification Head
            layers.Dense(128, activation='relu', name='fc1'),
            layers.Dropout(0.5, name='dropout4'),
            layers.Dense(num_classes, activation='softmax', name='output')
        ])
        return model
    def build_vgg_style_cnn(self, input_shape=(64, 64, 3), num_classes=10):
        """Build VGG-style CNN with multiple conv layers per block"""
        model = keras.Sequential([
            # Block 1
            layers.Conv2D(64, (3, 3), activation='relu', padding='same',
                          input_shape=input_shape),
            layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
            layers.MaxPooling2D((2, 2)),
            layers.BatchNormalization(),
            # Block 2
            layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
            layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
            layers.MaxPooling2D((2, 2)),
            layers.BatchNormalization(),
            # Block 3
            layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
            layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
            layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
            layers.MaxPooling2D((2, 2)),
            layers.BatchNormalization(),
            # Classification
            layers.Flatten(),
            layers.Dense(512, activation='relu'),
            layers.Dropout(0.5),
            layers.Dense(512, activation='relu'),
            layers.Dropout(0.5),
            layers.Dense(num_classes, activation='softmax')
        ])
        return model
    def build_residual_cnn(self, input_shape=(32, 32, 3), num_classes=10):
        """Build CNN with residual connections"""

        def residual_block(x, filters, kernel_size=3, stride=1):
            """Create a residual block"""
            shortcut = x
            # Main path
            x = layers.Conv2D(filters, kernel_size, strides=stride,
                              padding='same')(x)
            x = layers.BatchNormalization()(x)
            x = layers.Activation('relu')(x)
            x = layers.Conv2D(filters, kernel_size, padding='same')(x)
            x = layers.BatchNormalization()(x)
            # Shortcut path - adjust dimensions if needed
            if stride != 1 or shortcut.shape[-1] != filters:
                shortcut = layers.Conv2D(filters, 1, strides=stride,
                                         padding='same')(shortcut)
                shortcut = layers.BatchNormalization()(shortcut)
            # Add shortcut to main path
            x = layers.Add()([x, shortcut])
            x = layers.Activation('relu')(x)
            return x

        # Build model
        inputs = layers.Input(shape=input_shape)
        # Initial convolution
        x = layers.Conv2D(64, 3, padding='same')(inputs)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
        # Residual blocks
        x = residual_block(x, 64)
        x = residual_block(x, 64)
        x = residual_block(x, 128, stride=2)
        x = residual_block(x, 128)
        x = residual_block(x, 256, stride=2)
        x = residual_block(x, 256)
        # Classification
        x = layers.GlobalAveragePooling2D()(x)
        x = layers.Dense(128, activation='relu')(x)
        x = layers.Dropout(0.5)(x)
        outputs = layers.Dense(num_classes, activation='softmax')(x)
        model = keras.Model(inputs, outputs)
        return model
    def visualize_filters(self, model, layer_name='conv1'):
        """Visualize convolutional filters"""
        # Get the layer
        for layer in model.layers:
            if layer.name == layer_name and isinstance(layer, layers.Conv2D):
                filters, biases = layer.get_weights()
                break
        else:
            print(f"Layer {layer_name} not found")
            return
        # Normalize filter values to [0, 1]; the epsilon guards against constant filters
        f_min, f_max = filters.min(), filters.max()
        filters = (filters - f_min) / (f_max - f_min + 1e-8)
        # Plot filters
        n_filters = min(filters.shape[3], 32)  # Show max 32 filters
        n_cols = 8
        n_rows = n_filters // n_cols + (1 if n_filters % n_cols else 0)
        # squeeze=False always returns a 2-D array, so flatten() also works for one row
        fig, axes = plt.subplots(n_rows, n_cols, figsize=(16, n_rows*2),
                                 squeeze=False)
        axes = axes.flatten()
        for i in range(n_filters):
            # Get filter
            f = filters[:, :, :, i]
            # Handle different channel counts
            if f.shape[2] == 1:
                axes[i].imshow(f[:, :, 0], cmap='gray')
            elif f.shape[2] == 3:
                axes[i].imshow(f)
            else:
                # For multi-channel, show first channel
                axes[i].imshow(f[:, :, 0], cmap='viridis')
            axes[i].set_title(f'Filter {i}', fontsize=8)
            axes[i].axis('off')
        # Hide unused subplots
        for i in range(n_filters, len(axes)):
            axes[i].axis('off')
        plt.suptitle(f'Convolutional Filters from {layer_name}', fontsize=14)
        plt.tight_layout()
        plt.show()
    def demonstrate_convolution_operation(self):
        """Visualize convolution operation step by step"""
        from scipy import signal  # imported once; scipy is only needed for this demo
        # Create sample image
        image = np.zeros((7, 7))
        image[1:6, 1:6] = 1
        image[2:5, 2:5] = 2
        image[3, 3] = 3
        # Define filters
        filters = {
            'Edge Horizontal': np.array([[-1, -1, -1],
                                         [0, 0, 0],
                                         [1, 1, 1]]),
            'Edge Vertical': np.array([[-1, 0, 1],
                                       [-1, 0, 1],
                                       [-1, 0, 1]]),
            'Sharpen': np.array([[0, -1, 0],
                                 [-1, 5, -1],
                                 [0, -1, 0]]),
            'Blur': np.ones((3, 3)) / 9
        }
        fig, axes = plt.subplots(2, 3, figsize=(12, 8))
        # Show original image
        axes[0, 0].imshow(image, cmap='gray')
        axes[0, 0].set_title('Original Image')
        axes[0, 0].axis('off')
        # Apply filters
        for idx, (name, kernel) in enumerate(filters.items()):
            # Conv2D layers compute cross-correlation (the kernel is not flipped),
            # so correlate2d matches what a CNN does; convolve2d would flip the
            # kernel and negate the two edge responses.
            filtered = signal.correlate2d(image, kernel, mode='valid')
            row = (idx + 1) // 3
            col = (idx + 1) % 3
            axes[row, col].imshow(filtered, cmap='gray')
            axes[row, col].set_title(name)
            axes[row, col].axis('off')
        # Hide unused subplot
        axes[1, 2].axis('off')
        plt.suptitle('Convolution Operation Demonstration', fontsize=14)
        plt.tight_layout()
        plt.show()
# Create CNN builder
cnn_builder = CNNBuilder()
print("\n" + "="*60)
print("BUILDING CNNs")
print("="*60)
# Build different CNN architectures
simple_cnn = cnn_builder.build_simple_cnn()
vgg_cnn = cnn_builder.build_vgg_style_cnn()
residual_cnn = cnn_builder.build_residual_cnn()
print("\nSimple CNN Architecture:")
print("-" * 40)
simple_cnn.summary()
print("\nDemonstrating Convolution Operation:")
cnn_builder.demonstrate_convolution_operation()
print("\nVisualizing Initial Filters:")
cnn_builder.visualize_filters(simple_cnn, 'conv1')
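What deep learning libraries call "convolution" is, strictly speaking, cross-correlation: the kernel slides over the image without being flipped. The pure-Python sketch below rebuilds the 7x7 sample image from the demonstration and compares the two operations for the vertical edge filter. The helper names `correlate2d_valid` and `flip_kernel` are hypothetical and written only for this illustration.

```python
def correlate2d_valid(image, kernel):
    """Slide the kernel over the image without padding (no kernel flip)."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k_h) for j in range(k_w))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def flip_kernel(kernel):
    """Rotate the kernel by 180 degrees (reverse rows and columns)."""
    return [row[::-1] for row in kernel[::-1]]


# Same nested-square image as in the demonstration
image = [[0] * 7 for _ in range(7)]
for r in range(1, 6):
    for c in range(1, 6):
        image[r][c] = 1
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 2
image[3][3] = 3

edge_vertical = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

correlated = correlate2d_valid(image, edge_vertical)
# True convolution is cross-correlation with the flipped kernel
convolved = correlate2d_valid(image, flip_kernel(edge_vertical))

print(len(correlated), len(correlated[0]))  # 5 5
print(correlated[0][0], convolved[0][0])    # 3 -3
```

For symmetric kernels such as the blur and sharpen filters the two operations give identical results; the sign only differs for antisymmetric kernels like the edge detectors. Since a CNN learns its kernels, the distinction does not affect what the network can represent.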
class DataAugmentationPipeline:
    """Image data augmentation techniques"""

    def __init__(self):
        self.augmenters = {}

    def create_augmentation_pipeline(self):
        """Create comprehensive data augmentation pipeline"""
        # Using Keras ImageDataGenerator (marked deprecated in recent TensorFlow
        # releases; create_tf_augmentation shows the layer-based alternative)
        train_datagen = ImageDataGenerator(
            rotation_range=20,
            width_shift_range=0.2,
            height_shift_range=0.2,
            horizontal_flip=True,
            vertical_flip=False,
            zoom_range=0.2,
            shear_range=0.2,
            fill_mode='nearest',
            brightness_range=[0.8, 1.2],
            preprocessing_function=None
        )
        # Validation data gets no augmentation. If the inputs are 0-255 images,
        # pass the same rescale (e.g. rescale=1./255) to both generators.
        val_datagen = ImageDataGenerator()
        return train_datagen, val_datagen
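    def create_tf_augmentation(self):
        """Create augmentation using TensorFlow layers"""
        data_augmentation = keras.Sequential([
            layers.RandomFlip("horizontal"),
            # The factor is a fraction of a full turn: 0.2 allows up to +/-72 degrees
            # (rotation_range=20 in ImageDataGenerator is in degrees instead)
            layers.RandomRotation(0.2),
            layers.RandomZoom(0.2),
            layers.RandomContrast(0.2),
        ])
        return data_augmentation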
    def demonstrate_augmentation(self):
        """Visualize augmentation effects"""
        # Create a sample image
        sample_image = np.random.rand(100, 100, 3)
        # Add some structure to make augmentation visible
        sample_image[30:70, 30:70, :] = 0.8
        sample_image[40:60, 40:60, 0] = 0.2
        sample_image[45:55, 45:55, 1] = 0.9
        # Create augmenter
        datagen = ImageDataGenerator(
            rotation_range=30,
            width_shift_range=0.2,
            height_shift_range=0.2,
            horizontal_flip=True,
            zoom_range=0.2,
            shear_range=0.2
        )
        # Generate augmented images
        sample_batch = sample_image.reshape((1,) + sample_image.shape)
        fig, axes = plt.subplots(3, 4, figsize=(12, 9))
        axes = axes.flatten()
        # Original image
        axes[0].imshow(sample_image)
        axes[0].set_title('Original', fontsize=10)
        axes[0].axis('off')
        # Generate augmented versions
        it = datagen.flow(sample_batch, batch_size=1)
        for i in range(1, 12):
            batch = next(it)
            image = batch[0]
            axes[i].imshow(image)
            axes[i].set_title(f'Augmented {i}', fontsize=10)
            axes[i].axis('off')
        plt.suptitle('Data Augmentation Examples', fontsize=14)
        plt.tight_layout()
        plt.show()
    def create_preprocessing_pipeline(self, input_shape=(224, 224, 3)):
        """Create complete preprocessing pipeline"""

        def preprocess_image(image, label):
            """Preprocessing function for tf.data"""
            # Resize
            image = tf.image.resize(image, input_shape[:2])
            # Normalize to [0, 1]
            image = tf.cast(image, tf.float32) / 255.0
            # Standardize (ImageNet statistics).
            # Note: each keras.applications backbone ships its own preprocess_input
            # (e.g. keras.applications.vgg16.preprocess_input); use that function
            # when feeding those pre-trained models.
            mean = tf.constant([0.485, 0.456, 0.406])
            std = tf.constant([0.229, 0.224, 0.225])
            image = (image - mean) / std
            return image, label

        return preprocess_image
    def visualize_preprocessing_effects(self):
        """Show effects of different preprocessing steps"""
        # Create sample image (upper bound is exclusive, so 256 covers 0-255)
        image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
        fig, axes = plt.subplots(2, 4, figsize=(16, 8))
        # Original
        axes[0, 0].imshow(image)
        axes[0, 0].set_title('Original')
        axes[0, 0].axis('off')
        # Resized
        resized = cv2.resize(image, (64, 64))
        axes[0, 1].imshow(resized)
        axes[0, 1].set_title('Resized (64x64)')
        axes[0, 1].axis('off')
        # Normalized [0, 1]
        normalized = image.astype(np.float32) / 255.0
        axes[0, 2].imshow(normalized)
        axes[0, 2].set_title('Normalized [0, 1]')
        axes[0, 2].axis('off')
        # Standardized (ImageNet)
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        standardized = (normalized - mean) / std
        # Clip for visualization (values outside [0, 1] are cut off in the plot)
        standardized_vis = np.clip(standardized, 0, 1)
        axes[0, 3].imshow(standardized_vis)
        axes[0, 3].set_title('Standardized')
        axes[0, 3].axis('off')
        # Grayscale
        gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        axes[1, 0].imshow(gray, cmap='gray')
        axes[1, 0].set_title('Grayscale')
        axes[1, 0].axis('off')
        # Edge detection
        edges = cv2.Canny(gray, 50, 150)
        axes[1, 1].imshow(edges, cmap='gray')
        axes[1, 1].set_title('Edge Detection')
        axes[1, 1].axis('off')
        # Histogram equalization
        equalized = cv2.equalizeHist(gray)
        axes[1, 2].imshow(equalized, cmap='gray')
        axes[1, 2].set_title('Histogram Equalized')
        axes[1, 2].axis('off')
        # Gaussian blur
        blurred = cv2.GaussianBlur(image, (15, 15), 0)
        axes[1, 3].imshow(blurred)
        axes[1, 3].set_title('Gaussian Blur')
        axes[1, 3].axis('off')
        plt.suptitle('Image Preprocessing Techniques', fontsize=14)
        plt.tight_layout()
        plt.show()
# Data augmentation
augmentation = DataAugmentationPipeline()
print("\n" + "="*60)
print("DATA AUGMENTATION AND PREPROCESSING")
print("="*60)
print("\nDemonstrating data augmentation:")
augmentation.demonstrate_augmentation()
print("\nVisualizing preprocessing effects:")
augmentation.visualize_preprocessing_effects()
# Create augmentation layers
tf_augmentation = augmentation.create_tf_augmentation()
print("\nTensorFlow augmentation layers created:",
      [layer.name for layer in tf_augmentation.layers])
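The standardization step in the preprocessing pipeline is simple per-channel arithmetic. As a minimal sketch, the snippet below applies it to single RGB pixels in pure Python, using the same ImageNet mean and standard deviation values as the lesson code. The helper name `standardize_pixel` is hypothetical.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def standardize_pixel(rgb):
    """Scale a 0-255 RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255.0 for value in rgb]
    return [(s - m) / d for s, m, d in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


# Black, white, and a pixel close to the ImageNet mean color
for pixel in [(0, 0, 0), (255, 255, 255), (124, 116, 104)]:
    print(pixel, [round(v, 3) for v in standardize_pixel(pixel)])
```

Black maps to roughly -2 in every channel, white to a little above +2, and the near-mean pixel to values close to 0. Because about half of the standardized range is negative, clipping to [0, 1] for display discards much of the image, so a standardized image should not be expected to look like the original.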
class TransferLearning:
    """Transfer learning with pre-trained models"""

    def __init__(self):
        self.models = {}

    def create_transfer_model(self, base_model_name='VGG16',
                              input_shape=(224, 224, 3),
                              num_classes=10,
                              trainable_layers=2):
        """Create transfer learning model"""
        # Load pre-trained model
        if base_model_name == 'VGG16':
            base_model = VGG16(input_shape=input_shape,
                               include_top=False,
                               weights='imagenet')
        elif base_model_name == 'ResNet50':
            base_model = ResNet50(input_shape=input_shape,
                                  include_top=False,
                                  weights='imagenet')
        elif base_model_name == 'MobileNetV2':
            base_model = MobileNetV2(input_shape=input_shape,
                                     include_top=False,
                                     weights='imagenet')
        else:
            raise ValueError(f"Unknown model: {base_model_name}")
        # Freeze the base model, keeping only the last few layers trainable.
        # The base model itself must stay trainable when fine-tuning: a model
        # with trainable=False reports no trainable weights, whatever the
        # flags on its individual layers say.
        if trainable_layers > 0:
            base_model.trainable = True
            for layer in base_model.layers[:-trainable_layers]:
                layer.trainable = False
        else:
            base_model.trainable = False
        # Build model
        model = keras.Sequential([
            base_model,
            layers.GlobalAveragePooling2D(),
            layers.Dense(256, activation='relu'),
            layers.BatchNormalization(),
            layers.Dropout(0.5),
            layers.Dense(128, activation='relu'),
            layers.BatchNormalization(),
            layers.Dropout(0.5),
            layers.Dense(num_classes, activation='softmax')
        ])
        return model, base_model
    def compare_models(self):
        """Compare different pre-trained models"""
        models_to_compare = ['VGG16', 'ResNet50', 'MobileNetV2']
        results = {}
        for model_name in models_to_compare:
            model, base = self.create_transfer_model(model_name)
            # Count parameters
            total_params = model.count_params()
            trainable_params = sum([tf.size(w).numpy()
                                    for w in model.trainable_weights])
            results[model_name] = {
                'total_params': total_params,
                'trainable_params': trainable_params,
                'base_layers': len(base.layers),
                'total_layers': len(model.layers)
            }
        # Visualize comparison
        fig, axes = plt.subplots(2, 2, figsize=(12, 10))
        # Parameters comparison
        model_names = list(results.keys())
        total_params = [results[m]['total_params'] for m in model_names]
        trainable_params = [results[m]['trainable_params'] for m in model_names]
        x = np.arange(len(model_names))
        width = 0.35
        axes[0, 0].bar(x - width/2, np.array(total_params)/1e6, width,
                       label='Total', color='lightblue')
        axes[0, 0].bar(x + width/2, np.array(trainable_params)/1e6, width,
                       label='Trainable', color='orange')
        axes[0, 0].set_xlabel('Model')
        axes[0, 0].set_ylabel('Parameters (Millions)')
        axes[0, 0].set_title('Model Parameters')
        axes[0, 0].set_xticks(x)
        axes[0, 0].set_xticklabels(model_names)
        axes[0, 0].legend()
        axes[0, 0].grid(True, alpha=0.3, axis='y')
        # Layers comparison
        base_layers = [results[m]['base_layers'] for m in model_names]
        axes[0, 1].bar(model_names, base_layers, color='steelblue')
        axes[0, 1].set_ylabel('Number of Layers')
        axes[0, 1].set_title('Base Model Layers')
        axes[0, 1].grid(True, alpha=0.3, axis='y')
        # Model characteristics
        characteristics = {
            'VGG16': {'Depth': 'Deep', 'Width': 'Wide', 'Year': 2014},
            'ResNet50': {'Depth': 'Very Deep', 'Width': 'Medium', 'Year': 2015},
            'MobileNetV2': {'Depth': 'Medium', 'Width': 'Narrow', 'Year': 2018}
        }
        # Create table
        table_data = []
        for model_name in model_names:
            char = characteristics[model_name]
            params = results[model_name]
            table_data.append([
                model_name,
                f"{params['total_params']/1e6:.1f}M",
                f"{params['base_layers']}",
                char['Year']
            ])
        table = axes[1, 0].table(cellText=table_data,
                                 colLabels=['Model', 'Params', 'Layers', 'Year'],
                                 cellLoc='center',
                                 loc='center')
        table.auto_set_font_size(False)
        table.set_fontsize(10)
        table.scale(1, 1.5)
        axes[1, 0].axis('off')
        axes[1, 0].set_title('Model Comparison')
        # Training tips
        tips_text = """Transfer Learning Best Practices:
1. Start with frozen base model
2. Train only classifier head
3. Unfreeze top layers gradually
4. Use lower learning rate for base
5. Monitor for overfitting
6. Use appropriate preprocessing
7. Consider model size vs accuracy
"""
        axes[1, 1].text(0.1, 0.5, tips_text, fontsize=10,
                        verticalalignment='center', family='monospace')
        axes[1, 1].set_title('Tips')
        axes[1, 1].axis('off')
        plt.suptitle('Transfer Learning Model Comparison', fontsize=14)
        plt.tight_layout()
        plt.show()
        return results
    def fine_tuning_strategy(self):
        """Demonstrate fine-tuning strategy"""
        print("\nFine-tuning Strategy:")
        print("-" * 40)
        strategy = """
PHASE 1: Feature Extraction (5-10 epochs)
- Freeze entire base model
- Train only classifier head
- Use higher learning rate (0.001)
- Monitor validation accuracy
PHASE 2: Fine-tuning (10-20 epochs)
- Unfreeze top layers of base model
- Use lower learning rate (0.0001)
- Use differential learning rates
- Watch for overfitting
PHASE 3: Full Fine-tuning (optional)
- Unfreeze entire model
- Very low learning rate (0.00001)
- Early stopping essential
- Only if sufficient data
"""
        print(strategy)
        # Create example with different phases
        base_model = VGG16(input_shape=(224, 224, 3),
                           include_top=False,
                           weights='imagenet')
        # Phase 1: Freeze all
        base_model.trainable = False
        print(f"\nPhase 1: {sum([layer.trainable for layer in base_model.layers])} trainable layers")
        # Phase 2: Unfreeze top layers
        base_model.trainable = True
        for layer in base_model.layers[:-4]:
            layer.trainable = False
        print(f"Phase 2: {sum([layer.trainable for layer in base_model.layers])} trainable layers")
        # Phase 3: Unfreeze all
        base_model.trainable = True
        print(f"Phase 3: {sum([layer.trainable for layer in base_model.layers])} trainable layers")
# Transfer learning
transfer = TransferLearning()
print("\n" + "="*60)
print("TRANSFER LEARNING")
print("="*60)
print("\nComparing pre-trained models:")
model_comparison = transfer.compare_models()
print("\nFine-tuning strategy:")
transfer.fine_tuning_strategy()
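The three-phase strategy printed above can be written down as a small schedule that a training loop consults at each epoch. The sketch below is framework-free and illustrative only: the `Phase` dataclass and `phase_for_epoch` helper are hypothetical names, the learning rates come from the strategy text, and the epoch counts and the number of unfrozen layers are assumptions picked from the suggested ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    epochs: int
    learning_rate: float
    unfrozen_base_layers: int  # -1 means "all base layers"


# Epoch counts and unfrozen layer counts are illustrative assumptions
PHASES = [
    Phase("feature_extraction", epochs=10, learning_rate=1e-3, unfrozen_base_layers=0),
    Phase("fine_tuning", epochs=20, learning_rate=1e-4, unfrozen_base_layers=4),
    Phase("full_fine_tuning", epochs=10, learning_rate=1e-5, unfrozen_base_layers=-1),
]


def phase_for_epoch(epoch, phases=PHASES):
    """Return the phase that a zero-based global epoch index falls into."""
    start = 0
    for phase in phases:
        if epoch < start + phase.epochs:
            return phase
        start += phase.epochs
    raise ValueError(f"Epoch {epoch} is past the end of the schedule")


for epoch in (0, 9, 10, 29, 30):
    phase = phase_for_epoch(epoch)
    print(epoch, phase.name, phase.learning_rate)
```

In Keras, a change to `trainable` only takes effect once the model is compiled again, so each phase boundary is a natural point to update the frozen layers and recompile with the new learning rate.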
class CNNVisualization:
    """Visualize CNN features and activations"""

    def __init__(self):
        self.visualizations = {}

    def visualize_feature_maps(self, model, image, layer_names=None):
        """Visualize intermediate feature maps"""
        if layer_names is None:
            # Get first few conv layers
            layer_names = [layer.name for layer in model.layers
                           if 'conv' in layer.name][:3]
        # Create model that outputs intermediate layers
        layer_outputs = [model.get_layer(name).output for name in layer_names]
        activation_model = keras.Model(inputs=model.input, outputs=layer_outputs)
        # Get activations
        if len(image.shape) == 3:
            image = np.expand_dims(image, axis=0)
        activations = activation_model.predict(image, verbose=0)
        # A single requested layer may come back as a bare array
        if not isinstance(activations, (list, tuple)):
            activations = [activations]
        # Plot feature maps
        for layer_name, activation in zip(layer_names, activations):
            n_features = min(activation.shape[-1], 16)  # Show max 16 features
            fig, axes = plt.subplots(4, 4, figsize=(10, 10))
            axes = axes.flatten()
            for i in range(n_features):
                axes[i].imshow(activation[0, :, :, i], cmap='viridis')
                axes[i].set_title(f'Feature {i}', fontsize=8)
                axes[i].axis('off')
            for i in range(n_features, 16):
                axes[i].axis('off')
            plt.suptitle(f'Feature Maps: {layer_name} (Shape: {activation.shape[1:]})',
                         fontsize=12)
            plt.tight_layout()
            plt.show()
    def create_cam_heatmap(self, model, image, pred_index=None):
        """Create a gradient-weighted class activation map (Grad-CAM).

        Assumes a linear stack of layers after the last Conv2D (for example a
        Sequential model); branched models need a different split.
        """
        # Get last conv layer
        last_conv_layer = None
        for layer in reversed(model.layers):
            if isinstance(layer, layers.Conv2D):
                last_conv_layer = layer
                break
        if last_conv_layer is None:
            print("No convolutional layer found")
            return None
        # Create model that maps input to activations of last conv layer
        last_conv_model = keras.Model(model.inputs, last_conv_layer.output)
        # Create model that maps from last conv to predictions
        classifier_input = keras.Input(shape=last_conv_layer.output.shape[1:])
        x = classifier_input
        for layer in model.layers[model.layers.index(last_conv_layer) + 1:]:
            x = layer(x)
        classifier_model = keras.Model(classifier_input, x)
        # Add a batch dimension if a single image was passed
        if len(image.shape) == 3:
            image = np.expand_dims(image, axis=0)
        # Get gradient of prediction with respect to last conv layer
        with tf.GradientTape() as tape:
            last_conv_output = last_conv_model(image)
            tape.watch(last_conv_output)
            preds = classifier_model(last_conv_output)
            if pred_index is None:
                pred_index = tf.argmax(preds[0])
            class_channel = preds[:, pred_index]
        # Calculate gradients
        grads = tape.gradient(class_channel, last_conv_output)
        # Pool gradients over batch and spatial axes: one weight per channel
        pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
        # Weight feature maps by gradients
        last_conv_output = last_conv_output[0]
        heatmap = last_conv_output @ pooled_grads[..., tf.newaxis]
        heatmap = tf.squeeze(heatmap)
        # Keep positive evidence only and normalize; epsilon avoids division by zero
        heatmap = tf.maximum(heatmap, 0)
        heatmap = heatmap / (tf.math.reduce_max(heatmap) + 1e-8)
        return heatmap.numpy()
    def visualize_predictions_with_cam(self, model, images, labels=None):
        """Visualize predictions with CAM overlay"""
        n_images = min(len(images), 6)
        # squeeze=False keeps axes 2-D even when only one image is shown
        fig, axes = plt.subplots(2, n_images, figsize=(3*n_images, 6),
                                 squeeze=False)
        for i in range(n_images):
            img = images[i]
            # Predict
            if len(img.shape) == 3:
                img_batch = np.expand_dims(img, axis=0)
            else:
                img_batch = img
            pred = model.predict(img_batch, verbose=0)
            pred_class = np.argmax(pred[0])
            pred_prob = pred[0, pred_class]
            # Generate CAM
            heatmap = self.create_cam_heatmap(model, img, pred_class)
            # Display original
            axes[0, i].imshow(img if len(img.shape) == 3 else img.squeeze(),
                              cmap='gray' if len(img.shape) == 2 else None)
            title = f'Pred: {pred_class} ({pred_prob:.2f})'
            if labels is not None:
                title = f'True: {labels[i]}\n' + title
            axes[0, i].set_title(title, fontsize=8)
            axes[0, i].axis('off')
            # Display with CAM overlay
            if heatmap is not None:
                # Resize heatmap
                heatmap_resized = cv2.resize(heatmap,
                                             (img.shape[1], img.shape[0]))
                # Overlay
                axes[1, i].imshow(img if len(img.shape) == 3 else img.squeeze(),
                                  cmap='gray' if len(img.shape) == 2 else None)
                axes[1, i].imshow(heatmap_resized, cmap='jet', alpha=0.5)
                axes[1, i].set_title('CAM Heatmap', fontsize=8)
                axes[1, i].axis('off')
        plt.suptitle('Predictions with Class Activation Maps', fontsize=12)
        plt.tight_layout()
        plt.show()
# CNN Visualization
viz = CNNVisualization()
print("\n" + "="*60)
print("CNN VISUALIZATION")
print("="*60)
# Create sample images
sample_images = np.random.rand(6, 32, 32, 3).astype(np.float32)
print("\nVisualization utilities defined")
print("(Feature maps and CAM require trained model with real data)")
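The gradient-weighted heatmap computed in `create_cam_heatmap` reduces to three steps: average each channel's gradient into one weight, take the weighted sum of the feature maps, then keep the positive part and normalize. The toy example below walks through that arithmetic in pure Python on a made-up 2x2 feature map with two channels. The function name `grad_cam` and all numbers are illustrative assumptions, not values from a trained model.

```python
def grad_cam(feature_maps, gradients):
    """Both arguments are nested lists indexed as [channel][row][col]."""
    # Channel weight = spatial mean of that channel's gradient
    weights = [
        sum(sum(row) for row in grad) / (len(grad) * len(grad[0]))
        for grad in gradients
    ]
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    # Weighted sum over channels, followed by ReLU
    heatmap = [
        [
            max(0.0, sum(w * fmap[r][c] for w, fmap in zip(weights, feature_maps)))
            for c in range(cols)
        ]
        for r in range(rows)
    ]
    # Normalize to [0, 1] unless the map is all zeros
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return weights, heatmap


feature_maps = [
    [[1.0, 0.0], [0.0, 2.0]],   # channel 0
    [[0.0, 3.0], [1.0, 0.0]],   # channel 1
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],       # channel 0 supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],   # channel 1 counts against it
]

weights, heatmap = grad_cam(feature_maps, gradients)
print(weights)   # [0.5, -1.0]
print(heatmap)   # [[0.5, 0.0], [0.0, 1.0]]
```

Locations where only the negatively weighted channel fires end up at zero after the ReLU, which is why the heatmap highlights evidence for the class rather than against it.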
print("\n" + "="*60)
print("CNN BEST PRACTICES")
print("="*60)
best_practices = """
KEY GUIDELINES:
1. ARCHITECTURE DESIGN:
• Start with proven architectures
• Use batch normalization after conv layers
• Add dropout for regularization (0.2-0.5)
• Consider skip connections for deep networks
• Use global average pooling instead of flatten
2. CONVOLUTION LAYERS:
• 3x3 filters are most common
• Increase filters as you go deeper (32→64→128)
• Use 'same' padding to preserve dimensions
• Stride 2 for downsampling (instead of pooling)
3. POOLING:
• Max pooling for feature detection
• Average pooling for smoother downsampling
• Don't pool too aggressively early
• Consider strided convolutions instead
4. DATA AUGMENTATION:
• Essential for small datasets
• Rotation, flipping, zooming, shifting
• Color/brightness adjustments
• Don't augment validation/test data
• Use appropriate augmentation for domain
5. TRANSFER LEARNING:
• Start with pre-trained models
• Fine-tune carefully (low learning rate)
• Freeze early layers initially
• Match preprocessing to original training
6. TRAINING TIPS:
• Use appropriate input size (224x224 common)
• Normalize inputs properly
• Start with Adam optimizer
• Use learning rate scheduling
• Monitor for overfitting
7. COMMON PROBLEMS:
Overfitting:
• More augmentation
• Stronger regularization
• Simpler architecture
• Transfer learning
Slow Training:
• Reduce input size
• Use simpler architecture
• Mixed precision training
• Better hardware (GPU)
Poor Accuracy:
• Check preprocessing
• Verify labels are correct
• Try different architecture
• Increase model capacity
"""
print(best_practices)
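The advice to prefer global average pooling over flattening is easiest to see by counting the parameters of the dense layer that follows. A minimal sketch, assuming an illustrative 4x4x128 final feature map feeding a 128-unit dense layer (the helper `dense_params` is hypothetical):

```python
def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


height, width, channels, units = 4, 4, 128, 128  # illustrative sizes

flatten_inputs = height * width * channels  # Flatten keeps every spatial position
gap_inputs = channels                       # global average pooling keeps one value per channel

print(dense_params(flatten_inputs, units))  # 262272
print(dense_params(gap_inputs, units))      # 16512
```

The gap grows with the spatial size of the last feature map, so the saving is larger for bigger inputs or shallower networks.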
# Architecture evolution
architecture_evolution = """
CNN ARCHITECTURE EVOLUTION:
LeNet-5 (1998):
• 2 Conv + 2 Pool + 2 FC
• 60K parameters
• Digit recognition
AlexNet (2012):
• 5 Conv + 3 FC
• 60M parameters
• ReLU, Dropout
• ImageNet winner
VGGNet (2014):
• 16-19 layers
• 3x3 convolutions only
• 138M parameters
• Simple, uniform
GoogLeNet/Inception (2014):
• 22 layers deep
• Inception modules
• 7M parameters
• Multi-scale processing
ResNet (2015):
• 50-152 layers
• Skip connections
• ~25M parameters (ResNet-50)
• Eased training of very deep networks
DenseNet (2017):
• Dense connections
• Feature reuse
• Parameter efficient
EfficientNet (2019):
• Compound scaling
• Neural architecture search
• State-of-the-art ImageNet accuracy at release
• Mobile-friendly
"""
print(architecture_evolution)
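One reason the "3x3 convolutions only" design caught on is that a stack of small filters covers the same receptive field as one large filter with fewer weights. The sketch below checks this with simple arithmetic, assuming stride-1 convolutions with equal input and output channel counts. The helper names and the channel width of 64 are illustrative assumptions.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of stride-1 conv layers stacked without pooling."""
    return num_layers * (kernel - 1) + 1


def conv_weights(kernel, channels, num_layers=1):
    """Weights (biases ignored) when every layer maps `channels` to `channels`."""
    return num_layers * kernel * kernel * channels * channels


channels = 64  # illustrative width

print(stacked_receptive_field(3))               # 7
print(conv_weights(3, channels, num_layers=3))  # 110592
print(conv_weights(7, channels))                # 200704
```

Three stacked 3x3 layers see a 7x7 window with roughly 45% fewer weights than a single 7x7 layer, and they apply three nonlinearities instead of one.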
Design and implement your own CNN:
Extend CNN for object detection:
Build semantic segmentation model: