ML | Getting Started With AlexNet

Last Updated : 12 May, 2026

AlexNet is a deep convolutional neural network used for image classification. It stacks convolutional layers, which extract features, and fully connected layers, which perform the classification. Its key features are:

  • ReLU activation enables faster training and better gradient flow.
  • Dropout reduces overfitting in fully connected layers.
  • Data augmentation helps in improving model generalization on image data.
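
To make the gradient-flow point concrete, the minimal sketch below compares the derivative of ReLU with that of tanh at a few inputs. The helper names `relu_grad` and `tanh_grad` are illustrative and not part of any library.

```python
import math

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which shrinks toward 0 as |x| grows."""
    return 1.0 - math.tanh(x) ** 2

# For large positive inputs tanh saturates, while ReLU keeps a gradient of 1
for x in (0.5, 2.0, 5.0):
    print(f"x={x}: relu'={relu_grad(x):.4f}, tanh'={tanh_grad(x):.4f}")
```

The tanh gradient becomes very small once the input moves away from zero, which is one reason saturating activations tend to slow down training in deep networks.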

Architecture

  • 5 convolutional layers with max pooling after the 1st, 2nd, and 5th layers.
  • Overlapping max pooling (3×3 window, stride 2) slightly reduces error compared with non-overlapping pooling.
  • 3 fully connected layers, with dropout applied to the first two for regularization.
  • Softmax layer for final classification output.
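
The sketch below traces how the feature-map size shrinks through the convolution and pooling stages listed above. The input size (227×227) and the kernel, stride, and padding values are assumptions taken from the commonly cited configuration of the original AlexNet; they are not stated in this article. The helpers `out_size` and `trace_sizes` are illustrative.

```python
def out_size(size, kernel, stride, padding=0):
    """Output width/height of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding): assumed values, see the note above
layers = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),   # overlapping pooling: window 3 > stride 2
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

def trace_sizes(input_size, layers):
    """Return a dict mapping each layer name to its output width/height."""
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in layers:
        size = out_size(size, kernel, stride, padding)
        sizes[name] = size
    return sizes

sizes = trace_sizes(227, layers)
for name, size in sizes.items():
    print(f"{name}: {size}x{size}")
```

Under these assumptions the last pooling stage produces a 6×6 map, which is then flattened and passed to the fully connected layers.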

Implementation

1. Importing Libraries

Import TensorFlow/Keras for building and training the model, and Matplotlib for plotting the results.

Python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Activation, Dropout, BatchNormalization
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

2. Loading and Preprocessing CIFAR-10 Dataset

  • CIFAR-10 contains 60,000 32×32 RGB images across 10 classes.
  • Pixel values are scaled to [0, 1].
  • Labels are one-hot encoded for softmax classification.
Python
# Load CIFAR-10 data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# One-hot encode the labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
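
As a rough illustration of the two preprocessing steps, the plain-Python sketch below scales a single pixel value and one-hot encodes a single label. The helpers `scale_pixel` and `one_hot` are hypothetical stand-ins for what the array division and `to_categorical` do on whole arrays.

```python
def scale_pixel(value):
    """Map an 8-bit pixel value (0..255) to the range [0, 1]."""
    return value / 255.0

def one_hot(label, num_classes):
    """Return a vector with 1.0 at the label index and 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

print(scale_pixel(255))   # brightest pixel maps to 1.0
print(one_hot(3, 10))     # class index 3 out of 10 classes
```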

3. Defining the AlexNet Model (Adjusted for CIFAR-10)

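  • Adapted for CIFAR-10: Uses 3×3 kernels with stride 1 and 2×2 pooling so that 32×32 inputs are not downsampled too quickly, with 10 output classes.
  • Reduced FC layers: 1024 and 512 units lower the parameter count and help limit overfitting on a small dataset.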
  • Uses ReLU, Dropout, BatchNorm and softmax in the final layer.
Python
model = Sequential()

# Layer 1
model.add(Conv2D(96, kernel_size=(3,3), strides=(1,1), input_shape=(32,32,3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model.add(BatchNormalization())

# Layer 2
model.add(Conv2D(256, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model.add(BatchNormalization())

# Layer 3
model.add(Conv2D(384, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(Activation('relu'))

# Layer 4
model.add(Conv2D(384, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(Activation('relu'))

# Layer 5
model.add(Conv2D(256, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))

# Flatten
model.add(Flatten())

# Fully Connected Layer 1
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))

# Fully Connected Layer 2
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))

# Output Layer
model.add(Dense(10))
model.add(Activation('softmax'))
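
To get a feel for the size of this network, the sketch below recomputes the flattened feature size and the parameter count by hand from the layer definitions above. It is plain arithmetic rather than a Keras call, the helper names are illustrative, and the result can be cross-checked against `model.summary()`.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out

def batchnorm_params(channels):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * channels

size, channels, total = 32, 3, 0

# (filters, followed by 2x2 pooling?, followed by BatchNormalization?)
conv_layers = [(96, True, True), (256, True, True),
               (384, False, False), (384, False, False), (256, True, False)]

for filters, has_pool, has_bn in conv_layers:
    total += conv_params(3, channels, filters)  # padding='same' keeps the size
    channels = filters
    if has_pool:
        size //= 2                              # 2x2 pooling with stride 2
    if has_bn:
        total += batchnorm_params(channels)

flattened = size * size * channels

features = flattened
for units in (1024, 512, 10):
    total += dense_params(features, units)
    features = units

print("Flattened features:", flattened)
print("Total parameters:", total)
```

Most of the parameters sit in the first dense layer, which is why shrinking the fully connected layers has such a large effect on model size.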

4. Compiling the Model

Use the Adam optimizer with the categorical_crossentropy loss, which suits multi-class classification with one-hot encoded labels.

Python
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
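
For intuition about what this loss measures, here is a minimal plain-Python sketch of softmax followed by categorical cross-entropy for a single sample. The logits are made-up numbers, and the helper functions are illustrative rather than the Keras implementation.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                               # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log-probability assigned to the true class (one-hot y_true)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

probs = softmax([2.0, 1.0, 0.1])                  # made-up logits, 3 classes
loss = categorical_crossentropy([1.0, 0.0, 0.0], probs)
print("probabilities:", [round(p, 4) for p in probs])
print("loss:", round(loss, 4))
```

The loss is small when the model assigns high probability to the correct class and grows as that probability approaches zero.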

5. Training the Model

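  • Train for 15 epochs with a batch size of 128, holding out 20% of the training data for validation.
  • Training for more epochs may improve accuracy; watch the validation accuracy for signs of overfitting.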
Python
history = model.fit(x_train, y_train,
                    batch_size=128,
                    epochs=15,
                    validation_split=0.2,
                    verbose=1)

Output:

[Figure: Training]
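
The training settings above determine how many samples and batches each epoch processes. The sketch below works this out for the 50,000 CIFAR-10 training images; `split_sizes` is a hypothetical helper, not a Keras function.

```python
import math

def split_sizes(num_samples, validation_split):
    """Return (train, validation) sample counts for a fractional split."""
    val = int(num_samples * validation_split)
    return num_samples - val, val

# Keras takes the validation samples from the end of the arrays, before shuffling
train, val = split_sizes(50_000, 0.2)
steps_per_epoch = math.ceil(train / 128)   # the last batch may be smaller

print("training samples:", train)
print("validation samples:", val)
print("steps per epoch:", steps_per_epoch)
```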

6. Evaluating the Model

Evaluate the trained model on the held-out test set to measure its loss and accuracy.

Python
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f'Test Accuracy: {test_acc:.4f}')

Output:

Test Accuracy: 0.7387
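
The accuracy metric reported here is the fraction of samples whose highest-probability class matches the true class. A minimal sketch with made-up predictions (the helpers `argmax` and `accuracy` are illustrative):

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)

def accuracy(y_true_onehot, y_pred_probs):
    """Fraction of samples where the predicted class equals the true class."""
    correct = sum(argmax(t) == argmax(p)
                  for t, p in zip(y_true_onehot, y_pred_probs))
    return correct / len(y_true_onehot)

# Toy data: three samples, three classes (not taken from the CIFAR-10 run)
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]]
print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
```

In this toy example the second sample is misclassified, so two of the three predictions count as correct.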

7. Plotting Training & Validation Accuracy

Plot training and validation accuracy per epoch to see how the model learns and whether it starts to overfit.

Python
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('AlexNet on CIFAR-10 (GPU)')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()

Output:

[Figure: AlexNet on CIFAR-10]

Advantages

  • Uses ReLU activation for faster training compared to traditional tanh/sigmoid.
  • Applies dropout to reduce overfitting during training.
  • Utilizes GPU-based parallel computation for faster processing.
  • Uses overlapping max pooling to improve generalization and performance.
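
As a rough illustration of how dropout works during training, the sketch below implements "inverted" dropout in plain Python: each activation is zeroed with probability `rate`, and the survivors are scaled by 1 / (1 - rate) so the expected sum stays the same. The function name and the fixed seed are assumptions made for the example.

```python
import random

def dropout(activations, rate, rng):
    """Zero each value with probability `rate`; scale the rest by 1/(1-rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                      # fixed seed for repeatability
out = dropout([1.0] * 10, rate=0.5, rng=rng)
print(out)
```

At inference time dropout is disabled and the activations pass through unchanged.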

Disadvantages

  • Has a large number of parameters, making it memory-intensive.
  • Requires high computational resources for training.
  • Uses a hand-designed, non-modular layout rather than the repeated building blocks of later architectures.
  • Tends to overfit on small datasets.
  • Does not include later architectural improvements such as residual connections.
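
To put the memory cost in perspective, the sketch below counts the parameters in the three fully connected layers of the original AlexNet. The layer sizes (6×6×256 flattened inputs, two layers of 4096 units, 1000 output classes) are assumptions based on the commonly cited original configuration and are not stated in this article.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Assumed layer sizes for the original ImageNet model, see the note above
fc_sizes = [6 * 6 * 256, 4096, 4096, 1000]

fc_total = sum(dense_params(n_in, n_out)
               for n_in, n_out in zip(fc_sizes, fc_sizes[1:]))

bytes_float32 = fc_total * 4                # 4 bytes per float32 parameter
print("FC parameters:", fc_total)
print(f"Approx. size in float32: {bytes_float32 / 1e6:.1f} MB")
```

Under these assumptions the fully connected layers alone hold tens of millions of parameters, which accounts for most of the model's memory footprint.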

Applications

  • Used to classify objects in images.
  • Acts as a feature extractor for transfer learning tasks.
  • Serves as a backbone for object detection models.
  • Applied in medical imaging for detecting abnormalities.
  • Used in facial recognition and emotion detection systems.
  • Helps in identifying objects in autonomous driving systems.