The world of artificial intelligence (AI) and deep learning is advancing remarkably, offering new ways to approach problems that were once considered too complex. One fascinating application is in the domain of food classification, specifically in identifying various dishes from India's diverse culinary landscape.
In this article, we'll walk through creating a deep learning model to classify Indian cuisine, using advanced techniques such as transfer learning, fine-tuning, and hyperparameter optimization.
Why Classify Indian Cuisine?
India's rich culinary heritage and its myriad of regional dishes present a unique challenge for classification models. The diversity in ingredients, cooking methods, and presentation styles across different regions such as North, South, East, and West India makes this task both challenging and intriguing. Building a model that can accurately classify these dishes is not just an exercise in AI but a celebration of India's gastronomic diversity.
Indian Cuisine Classification Using Deep Learning
In this section, we are going to perform the classification using the Indian Cuisine Dataset. Let's break down the code into steps, provide an explanation for each step, and include the respective code snippets.
Step 1: Mount Google Drive
This step is necessary if you're using Google Colab to access files stored on Google Drive.
from google.colab import drive
drive.mount('/content/drive')
Step 2: Removing Corrupted Images
This step opens every file in the dataset with Pillow and calls verify() to detect corrupted or unreadable images. Any file that fails the check is deleted from the dataset folder.
import os
from PIL import Image

def check_images(directory):
    # Walk every class sub-folder and try to open each file as an image
    for root, _, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(root, file)
            try:
                with Image.open(file_path) as img:
                    img.verify()  # checks integrity without decoding all pixel data
            except (IOError, SyntaxError):
                print(f"Bad file: {file_path}")
                os.remove(file_path)  # deletes the file from the dataset folder

dataset_path = '/content/drive/MyDrive/food_dataset'
check_images(dataset_path)
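Pillow's verify() is thorough but opens every file. A cheaper first pass, sketched here as an alternative rather than the method used in this article, only reads each file's leading "magic bytes" and lists suspects instead of deleting them, so you can review them before removing anything. The helper names `sniff_image_type` and `find_suspect_files` are hypothetical, the signature table covers only a few common formats, and a header check cannot detect an image that is truncated further into the file.

```python
import os

# Leading "magic bytes" of a few common image formats
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"BM": "bmp",
}


def sniff_image_type(path):
    """Return the format implied by the file header, or None if nothing matches."""
    with open(path, "rb") as f:
        header = f.read(8)
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None


def find_suspect_files(directory):
    """Dry run: list (but do not delete) files with no recognised image header."""
    suspects = []
    for root, _, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if sniff_image_type(path) is None:
                suspects.append(path)
    return sorted(suspects)
```

In the notebook, `print(find_suspect_files(dataset_path))` would list candidates for manual review before running `check_images`.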
Step 3: Data Preprocessing
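This step sets up the data generators that feed images into the model during training. Pixel values are rescaled from [0, 255] to [0, 1], every image is resized to 224x224, class_mode='sparse' returns each label as an integer class index, and validation_split=0.3 holds out 30% of the images for validation.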
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values to [0, 1] and reserve 30% of the images for validation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.3
)

train_gen = train_datagen.flow_from_directory(
    dataset_path,
    target_size=(224, 224),
    class_mode='sparse',   # integer labels, matching sparse_categorical_crossentropy
    batch_size=32,
    subset='training',
    shuffle=True
)

validation_gen = train_datagen.flow_from_directory(
    dataset_path,
    target_size=(224, 224),
    class_mode='sparse',
    batch_size=32,
    subset='validation',
    shuffle=False  # fixed order so predictions line up with validation_gen.classes
)

print(train_gen.class_indices)
print(validation_gen.class_indices)
Output:
{'Aloo Puri': 0, 'Bhindi Masala': 1, 'Chhole Bhature': 2, 'Dal Bati Churma': 3, 'Dal Makhni': 4, 'Dhokla': 5, 'Gulab Jamun': 6, 'Idli Sambhar': 7, 'Jalebi': 8, 'Kheer': 9, 'Mushroom': 10, 'Paneer': 11, 'Pav Bhaji': 12, 'Poha': 13, 'Rajma Chawal': 14, 'Rasgulla': 15, 'Rasmalai': 16, 'Sarson ka Saag Makki ki Roti': 17, 'Thepla': 18, 'Vada Pav': 19}
{'Aloo Puri': 0, 'Bhindi Masala': 1, 'Chhole Bhature': 2, 'Dal Bati Churma': 3, 'Dal Makhni': 4, 'Dhokla': 5, 'Gulab Jamun': 6, 'Idli Sambhar': 7, 'Jalebi': 8, 'Kheer': 9, 'Mushroom': 10, 'Paneer': 11, 'Pav Bhaji': 12, 'Poha': 13, 'Rajma Chawal': 14, 'Rasgulla': 15, 'Rasmalai': 16, 'Sarson ka Saag Makki ki Roti': 17, 'Thepla': 18, 'Vada Pav': 19}
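Step 8 later hard-codes the list of class names. Because flow_from_directory assigns label indices in alphanumeric order of the sub-directory names, the same list can instead be derived from `class_indices`, so it cannot drift out of sync with the training data. A minimal sketch; `index_to_class` is a hypothetical helper name:

```python
def index_to_class(class_indices):
    """Invert a {class_name: index} mapping into a list ordered by index."""
    names = sorted(class_indices, key=class_indices.get)
    if [class_indices[name] for name in names] != list(range(len(names))):
        raise ValueError("class indices must be exactly 0..n-1")
    return names


# A few entries from the mapping printed above
sample = {'Bhindi Masala': 1, 'Aloo Puri': 0, 'Chhole Bhature': 2}
print(index_to_class(sample))  # ['Aloo Puri', 'Bhindi Masala', 'Chhole Bhature']
```

In the notebook, `classes = index_to_class(train_gen.class_indices)` would reproduce the list used in Step 8.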
Step 4: Model Creation with Transfer Learning
In this step, MobileNetV2 pre-trained on ImageNet is loaded without its original classification layer (include_top=False) and used as the base model. Its layers are frozen initially so that the learned features are preserved. A custom classification head is then added for the task of classifying 20 different classes of food.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

# Load MobileNetV2 with ImageNet weights, without its 1000-class top layer
reference_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the base model so that only the new head is trained at first
for layer in reference_model.layers:
    layer.trainable = False

# Custom classification head for the 20 dish classes
model = models.Sequential([
    reference_model,
    layers.Flatten(),
    layers.BatchNormalization(),
    layers.Dense(units=128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.2),
    layers.Dense(units=20, activation='softmax')
])
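With include_top=False and 224x224 inputs, MobileNetV2 outputs a 7x7x1280 feature map, so the Flatten layer feeds 62,720 values into Dense(128). The arithmetic below compares the size of that head with a GlobalAveragePooling2D head, a common alternative that this article does not use. It counts only the Dense weights and biases and ignores the BatchNormalization layers; `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


H, W, C = 7, 7, 1280          # MobileNetV2 feature map for a 224x224 input
flatten_features = H * W * C  # Flatten keeps every spatial position
gap_features = C              # GlobalAveragePooling2D averages over H and W

flatten_head = dense_params(flatten_features, 128) + dense_params(128, 20)
gap_head = dense_params(gap_features, 128) + dense_params(128, 20)

print(f"Flatten head: {flatten_head:,} parameters")  # Flatten head: 8,030,868 parameters
print(f"GAP head:     {gap_head:,} parameters")      # GAP head:     166,548 parameters
print(f"Ratio: {flatten_head / gap_head:.1f}x")      # Ratio: 48.2x
```

In Keras the alternative amounts to replacing layers.Flatten() with layers.GlobalAveragePooling2D(); whether that improves accuracy on this dataset was not measured here.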
Step 5: Model Compilation and Training
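The model is compiled with the Adam optimizer, sparse categorical cross-entropy as the loss (the labels are integer class indices), and accuracy as the evaluation metric. A ModelCheckpoint callback saves the model whenever validation accuracy improves. An EarlyStopping callback on validation loss is also defined to guard against overfitting, but it only takes effect if it is included in the callbacks list passed to fit(). The model is then trained on the training data and validated on the validation data.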
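Because class_mode='sparse' yields integer labels (0 to 19) rather than one-hot vectors, the matching loss is sparse categorical cross-entropy: the negative log of the probability the model assigns to the true class, averaged over the batch. The NumPy sketch below computes it by hand and checks that it equals ordinary categorical cross-entropy on one-hot labels. The function names and toy probabilities are illustrative assumptions, not Keras internals.

```python
import numpy as np


def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean of -log p(true class); y_true holds integer class indices."""
    probs = np.clip(probs, eps, 1.0)
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(picked)))


def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    """Same loss, but with one-hot encoded labels."""
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.sum(y_onehot * np.log(probs), axis=1)))


# Three samples, three classes (toy softmax outputs)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
y_true = np.array([0, 1, 2])    # integer labels, as with class_mode='sparse'
y_onehot = np.eye(3)[y_true]    # the equivalent one-hot labels

print(sparse_categorical_crossentropy(y_true, probs))  # about 0.4987
print(categorical_crossentropy(y_onehot, probs))       # same value
```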
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint

# Stop when val_loss has not improved for 5 epochs and restore the best weights
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',  # labels are integer class indices
    metrics=['accuracy']
)

# Save the model each time validation accuracy reaches a new best
checkpoint_callback = ModelCheckpoint(
    filepath='/content/drive/MyDrive/saved_model/model_epoch_{epoch:02d}.h5',
    monitor='val_accuracy',
    save_best_only=True,
    save_freq='epoch',
)

trained_model = model.fit(
    train_gen,
    epochs=10,
    validation_data=validation_gen,
    callbacks=[checkpoint_callback]  # add early_stopping to this list to enable early stopping
)
Output:
Epoch 1/10
30/85 [=========>....................] - ETA: 30s - loss: 1.6956 - accuracy: 0.5208
/usr/local/lib/python3.10/dist-packages/PIL/Image.py:996: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
  warnings.warn(
85/85 [==============================] - 83s 936ms/step - loss: 1.2460 - accuracy: 0.6422 - val_loss: 1.0757 - val_accuracy: 0.7273
Epoch 2/10
85/85 [==============================] - 69s 816ms/step - loss: 0.1729 - accuracy: 0.9774 - val_loss: 0.8590 - val_accuracy: 0.7552
Epoch 3/10
85/85 [==============================] - 69s 820ms/step - loss: 0.0577 - accuracy: 0.9978 - val_loss: 0.8410 - val_accuracy: 0.7552
Epoch 4/10
85/85 [==============================] - 70s 822ms/step - loss: 0.0266 - accuracy: 0.9993 - val_loss: 0.8282 - val_accuracy: 0.7657
Epoch 5/10
85/85 [==============================] - 68s 806ms/step - loss: 0.0185 - accuracy: 0.9996 - val_loss: 0.8367 - val_accuracy: 0.7649
Epoch 6/10
85/85 [==============================] - 70s 825ms/step - loss: 0.0109 - accuracy: 0.9996 - val_loss: 0.8370 - val_accuracy: 0.7649
Epoch 7/10
85/85 [==============================] - 70s 827ms/step - loss: 0.0106 - accuracy: 0.9996 - val_loss: 0.8666 - val_accuracy: 0.7605
Epoch 8/10
85/85 [==============================] - 70s 828ms/step - loss: 0.0095 - accuracy: 0.9996 - val_loss: 0.8606 - val_accuracy: 0.7561
Epoch 9/10
85/85 [==============================] - 68s 800ms/step - loss: 0.0052 - accuracy: 1.0000 - val_loss: 0.8555 - val_accuracy: 0.7570
Epoch 10/10
85/85 [==============================] - 68s 808ms/step - loss: 0.0072 - accuracy: 1.0000 - val_loss: 0.8684 - val_accuracy: 0.7579
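The logged val_loss and val_accuracy values can be replayed through simplified versions of the two callback rules to see what they would do with this run. This is an illustrative re-implementation, not Keras's own code: it assumes min_delta=0 and that only a strict improvement counts. Under these assumptions, early stopping with patience=5 would halt after epoch 9 and restore the epoch 4 weights, and the checkpoint callback would write files only for epochs 1, 2 and 4.

```python
# Validation metrics copied from the training log above (epochs 1..10)
val_loss = [1.0757, 0.8590, 0.8410, 0.8282, 0.8367,
            0.8370, 0.8666, 0.8606, 0.8555, 0.8684]
val_accuracy = [0.7273, 0.7552, 0.7552, 0.7657, 0.7649,
                0.7649, 0.7605, 0.7561, 0.7570, 0.7579]


def early_stopping(losses, patience):
    """Return (stop_epoch, best_epoch), 1-based; stop_epoch is None if no stop."""
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:                  # strict improvement resets the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


def checkpoint_epochs(accuracies):
    """Epochs at which save_best_only on val_accuracy would write a file."""
    best, saved = float("-inf"), []
    for epoch, acc in enumerate(accuracies, start=1):
        if acc > best:                   # a tie (epoch 3) does not trigger a save
            best = acc
            saved.append(epoch)
    return saved


print(early_stopping(val_loss, patience=5))  # (9, 4)
print(checkpoint_epochs(val_accuracy))       # [1, 2, 4]
```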
Step 6: Fine-tuning the Model
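After the initial training, the layers of the base model are unfrozen (the code below unfreezes all of them) and the model is recompiled with an SGD optimizer (learning rate 0.001, momentum 0.9). This process, called fine-tuning, lets the pre-trained features adapt to the food images and can improve performance. Recompiling alone does not change any weights, so the model must be trained again with model.fit() after this step for the fine-tuning to take effect.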
import tensorflow as tf

# Unfreeze all layers of the base model
for layer in reference_model.layers:
    layer.trainable = True

# Recompile so that the change in trainable layers takes effect
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
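The code above unfreezes every layer of the base model. A common variant, shown here as a hedged sketch rather than what this article ran, unfreezes only the top N layers and leaves BatchNormalization layers frozen; the Keras transfer-learning guide recommends keeping batch normalization in inference mode during fine-tuning. The helper name `unfreeze_top_layers` and the value of N are assumptions. It only sets each layer's `trainable` attribute, so it works on any objects that have one, including Keras layers.

```python
from types import SimpleNamespace


def unfreeze_top_layers(layers, num_trainable, is_batch_norm):
    """Make only the last `num_trainable` layers trainable, skipping batch-norm layers."""
    cutoff = max(len(layers) - num_trainable, 0)
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff and not is_batch_norm(layer)
    return [layer.name for layer in layers if layer.trainable]


# Stand-in layers for a quick check; with Keras you would pass
# reference_model.layers and
# lambda l: isinstance(l, tf.keras.layers.BatchNormalization)
demo = [SimpleNamespace(name=n, trainable=False)
        for n in ["conv_1", "bn_1", "conv_2", "bn_2", "conv_3"]]
print(unfreeze_top_layers(demo, 3, lambda l: l.name.startswith("bn")))
# ['conv_2', 'conv_3']
```

Used in place of the unfreezing loop, it must still be followed by model.compile() so the new trainable settings are picked up.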
Step 7: Saving the Final Model
After training, the final model is saved to Google Drive for later use.
# Save the architecture, weights and optimizer state in the native Keras format
model.save('/content/drive/MyDrive/saved_model/final.keras')
Step 8: Loading the Model and Making Predictions
The saved model is loaded from Google Drive, and a function is defined to predict the class of a new image.
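model.predict() returns one row of 20 softmax probabilities per image, and predict_image below keeps only the most likely class. When dishes look alike (Rasgulla and Rasmalai, for example), inspecting the top few classes with their probabilities can be more informative. A small NumPy sketch; `top_k_predictions` is a hypothetical helper and the probability vector is made up:

```python
import numpy as np


def top_k_predictions(probs, class_names, k=3):
    """Return the k most likely (class_name, probability) pairs, highest first."""
    probs = np.asarray(probs).ravel()    # accept shape (1, n) or (n,)
    top = np.argsort(probs)[::-1][:k]    # indices in descending probability order
    return [(class_names[i], float(probs[i])) for i in top]


classes = ["Jalebi", "Rasgulla", "Rasmalai", "Gulab Jamun"]
prediction = np.array([[0.10, 0.55, 0.30, 0.05]])  # toy model output, shape (1, 4)
print(top_k_predictions(prediction, classes, k=2))
# [('Rasgulla', 0.55), ('Rasmalai', 0.3)]
```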
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

# Class names in the same index order as train_gen.class_indices
classes = ['Aloo Puri', 'Bhindi Masala', 'Chhole Bhature', 'Dal Bati Churma', 'Dal Makhni', 'Dhokla', 'Gulab Jamun', 'Idli Sambhar', 'Jalebi', 'Kheer', 'Mushroom', 'Paneer', 'Pav Bhaji', 'Poha', 'Rajma Chawal', 'Rasgulla', 'Rasmalai', 'Sarson ka Saag Makki ki Roti', 'Thepla', 'Vada Pav']

# Function to predict the class of a single image
def predict_image(image_path, saved_model):
    # Preprocess exactly as in training: resize to 224x224 and scale to [0, 1]
    img = image.load_img(image_path, target_size=(224, 224))
    img = image.img_to_array(img)
    img = img / 255.0
    img = np.expand_dims(img, axis=0)  # add a batch dimension: (1, 224, 224, 3)
    prediction = saved_model.predict(img)
    return classes[np.argmax(prediction)]

saved_model = load_model('/content/drive/MyDrive/saved_model/final.keras')
image_path = '/content/drive/MyDrive/new.jfif'
predicted_class = predict_image(image_path, saved_model)
print(predicted_class)
Output:
Jalebi
Step 9: Model Evaluation
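The model is evaluated on the validation set, and scikit-learn's classification_report summarizes precision, recall, and F1-score for each class. For the report to be meaningful, the predictions must come out in the same order as validation_gen.classes, which requires the validation generator to be created with shuffle=False. In the run that produced the report below, the validation generator was shuffled, so predictions and labels were misaligned: the report shows near-chance scores (accuracy 0.07 across 20 classes) even though training reported about 76% validation accuracy.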
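from sklearn.metrics import classification_report

# Softmax probabilities for every validation image, shape (n_images, 20)
y_predict = model.predict(validation_gen)
# True labels, listed in the generator's file order
y_true = validation_gen.classes
print(classification_report(y_true, y_predict.argmax(axis=1)))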
Output:
              precision    recall  f1-score   support

           0       0.03      0.02      0.03        46
           1       0.06      0.07      0.06        72
           2       0.07      0.08      0.07        79
           3       0.08      0.08      0.08        91
           4       0.08      0.07      0.07        46
           5       0.09      0.09      0.09        97
           6       0.04      0.04      0.04        51
           7       0.07      0.07      0.07        90
           8       0.02      0.03      0.02        77
           9       0.05      0.05      0.05        60
          10       0.03      0.02      0.02        63
          11       0.09      0.12      0.10       100
          12       0.00      0.00      0.00        12
          13       0.00      0.00      0.00         4
          14       0.09      0.09      0.09        91
          16       0.00      0.00      0.00         1
          17       0.06      0.06      0.06        90
          18       0.08      0.08      0.08        74

    accuracy                           0.07      1144
   macro avg       0.05      0.05      0.05      1144
weighted avg       0.06      0.07      0.06      1144
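Why shuffling breaks the report: validation_gen.classes lists labels in directory order, whereas predict() on a shuffled generator returns predictions in a random order. The standard-library sketch below simulates a hypothetical classifier that is always correct, then scores its predictions against the labels before and after shuffling; the class counts are invented for illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Labels in directory order, like validation_gen.classes: 20 classes x 50 images
y_true = [label for label in range(20) for _ in range(50)]

# A hypothetical perfect classifier predicts every label correctly...
y_pred_aligned = list(y_true)

# ...but if the generator shuffles, the predictions come back in a different order
y_pred_shuffled = list(y_true)
random.Random(0).shuffle(y_pred_shuffled)

print(accuracy(y_true, y_pred_aligned))   # 1.0
print(accuracy(y_true, y_pred_shuffled))  # near chance level (about 1/20)
```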
Step 10: Plotting Training and Validation Loss/Accuracy
This step involves plotting the training and validation loss and accuracy over the epochs to visualize the model's performance.
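import matplotlib.pyplot as plt

# Metrics recorded by model.fit() in Step 5
training_loss = trained_model.history['loss']
val_loss = trained_model.history['val_loss']
training_acc = trained_model.history['accuracy']
val_acc = trained_model.history['val_accuracy']
epochs = range(1, len(training_loss) + 1)

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))
ax_loss.set_title('Loss vs Epoch')
ax_loss.set_xlabel('Epochs')
ax_loss.set_ylabel('Loss')
ax_loss.plot(epochs, training_loss, label='Training Loss')
ax_loss.plot(epochs, val_loss, label='Validation Loss')
ax_loss.legend()

ax_acc.set_title('Accuracy vs Epoch')
ax_acc.set_xlabel('Epochs')
ax_acc.set_ylabel('Accuracy')
ax_acc.plot(epochs, training_acc, label='Training Accuracy')
ax_acc.plot(epochs, val_acc, label='Validation Accuracy')
ax_acc.legend()

plt.show()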
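The curves can also be summarized numerically. Using the accuracies from the Step 5 log, the sketch below computes the gap between training and validation accuracy at each epoch. Training accuracy near 100% while validation accuracy plateaus around 76% leaves a gap of roughly 24 percentage points, the usual sign of overfitting. The list values are copied from the log; the rest is plain Python.

```python
# Accuracies copied from the Step 5 training log (epochs 1..10)
train_acc = [0.6422, 0.9774, 0.9978, 0.9993, 0.9996,
             0.9996, 0.9996, 0.9996, 1.0000, 1.0000]
val_acc = [0.7273, 0.7552, 0.7552, 0.7657, 0.7649,
           0.7649, 0.7605, 0.7561, 0.7570, 0.7579]

# Positive gap = model fits the training set better than unseen data
gaps = [round(t - v, 4) for t, v in zip(train_acc, val_acc)]
best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1

for epoch, gap in enumerate(gaps, start=1):
    print(f"epoch {epoch:2d}: train-val gap = {gap:+.4f}")
print("best validation accuracy at epoch", best_epoch)  # epoch 4
```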
Why Did MobileNetV2 Stand Out?
Among the three architectures tested (VGG16, ResNet50, and MobileNetV2), MobileNetV2 performed best. It was designed for mobile and other resource-constrained devices: its inverted residual blocks are built from depthwise separable convolutions, which keep the model small (about 3.5 million parameters, versus roughly 138 million for VGG16 and 25.6 million for ResNet50) and cheap to run. In the run shown above it reached about 76% validation accuracy. This makes MobileNetV2 a good choice both for this classification task and for deployment on devices with limited processing power.
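Much of MobileNetV2's efficiency comes from depthwise separable convolutions, which replace one k x k convolution from M input to N output channels with a k x k depthwise filter per input channel followed by a 1 x 1 pointwise convolution. Counting multiply-adds for a single layer shows the saving, which works out to a factor of 1/N + 1/k^2. The layer shape below is an arbitrary illustrative choice, not MobileNetV2's actual configuration.

```python
def standard_conv_cost(k, m, n, f):
    """Multiply-adds for a k x k convolution, m -> n channels, on an f x f output."""
    return k * k * m * n * f * f


def depthwise_separable_cost(k, m, n, f):
    """A k x k depthwise filter per input channel, then a 1 x 1 pointwise conv."""
    depthwise = k * k * m * f * f
    pointwise = m * n * f * f
    return depthwise + pointwise


k, m, n, f = 3, 256, 256, 14   # illustrative layer shape (assumption)
std = standard_conv_cost(k, m, n, f)
sep = depthwise_separable_cost(k, m, n, f)
print(f"standard:  {std:,}")   # standard:  115,605,504
print(f"separable: {sep:,}")   # separable: 13,296,640
print(f"ratio: {sep / std:.4f} vs 1/n + 1/k^2 = {1 / n + 1 / k**2:.4f}")
```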
Conclusion
The journey of building an Indian cuisine classification model is as much about understanding the intricacies of deep learning as it is about appreciating the diversity of Indian food. By employing techniques like transfer learning, fine-tuning, and hyperparameter optimization, and by comparing different architectures, we built a model that classifies 20 Indian dishes, with MobileNetV2 delivering the best performance (about 76% validation accuracy in the run shown above).