Image Classification

Cats vs. Dogs

Science is the systematic classification of experience. ~ George Henry Lewes

Let's have a bit of fun with ML today by classifying images into dogs and cats. For that, we'll use a Deep Neural Network (a neural net with more than one hidden layer; you can read about it here).

💡
We'll be using TensorFlow for Python in this tutorial (just a few lines of code).

You need a Dataset

I have uploaded a dataset with images of cats and dogs, provided by TensorFlow, here. You can download it and get started.

What is an Image?

An image is a matrix arrangement of colors. Consider an example,

zooming a bit,

Your image is made of such boxes (pixels), where each box is a color described by three values: Red, Green, and Blue. These three values range from 0 to 255. So, if we have an image of 10x10 pixels, we get a total of 100 pixels with 3 values each, i.e. 300 values for a small 10x10 picture (YouTube's maximum resolution of 3840x2160 pixels amounts to 24,883,200 values). We humans can't make sense of an image by just reading the numbers, but machines can! Hence, we extract the features of an image from its colors and feed them to a neural net.
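You can check this yourself with a few lines of Python. This is just an illustration using Pillow (PIL) and NumPy; the file name "cat.jpg" is a placeholder for any image you have on disk.

from PIL import Image
import numpy as np

img = np.array(Image.open("cat.jpg"))  # read the image as an array of numbers
print(img.shape)   # e.g. (height, width, 3) -> 3 values per pixel
print(img.dtype)   # uint8 -> each value lies between 0 and 255
print(img[0, 0])   # the Red, Green, and Blue values of the top-left pixel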

Designing the Model

Before we design the model, let's define our variables,

train_data_dir = "../Datasets/catsanddogs_tensorflow/train/"
test_data_dir = "../Datasets/catsanddogs_tensorflow/validation/"
width, height = 150, 150
shape = (width, height, 3)
batch_size = 32
epochs = 10

You can replace the above paths with the path to your dataset. width and height are the dimensions we resize every image to, which must be the same for all images. shape has an extra value, 3, because each pixel has 3 values: Red, Green, and Blue (for grayscale images, we pass 1 instead). batch_size is the number of images processed at a time during training (we can't process thousands of images at once 🥲). epochs is the number of times we iterate over the whole dataset, which is 10 in this case.

Pre-Processing

We have many images of different dimensions, which a Neural Network can't accept; it must be fed data of consistent dimensions, so we preprocess the images to resize them to our desired size. We can also transform (augment) images on the fly without saving them (a feature provided by TensorFlow), which gives us more data to train on and can lead to increased accuracy.

And, we know that each pixel has three values ranging from 0 to 255. Models with such large values take longer to train and need heavy processing power. Hence, we scale them to fall between 0 and 1 as follows,

import tensorflow as tf

train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255,
    shear_range=0.2, #Shearing (slanting) the images
    zoom_range=0.2, #Zooming the images
    horizontal_flip=True, #Images are flipped horizontally
)

test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255, #Modification isn't required for Test Data.
)
💡
shear_range deals with the amount of shearing (slanting), and zoom_range deals with the amount of zoom. There are many such parameters, and you can read about them here.

Generating Image Data

Now that we've defined how to process our images, we need to pass the directory of our images to the generator,

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(width, height),
    batch_size=batch_size,
    class_mode="binary", #it is binary because we only have 2 classes: Dogs and Cats
)

test_generator = test_datagen.flow_from_directory(
    test_data_dir,
    target_size=(width, height),
    batch_size=batch_size,
    class_mode="binary", #it is binary because we only have 2 classes: Dogs and Cats
)

In the generator, images are resized to 150x150 and fed to the model 32 at a time.

💡
The class mode is binary because we only have two classes: cat and dog. For more than two classes, we pass categorical.
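You can also check which label Keras assigned to each class; the mapping comes from the subdirectory names of your dataset, so yours may differ from the example below.

# The class-to-index mapping is derived from the folder names on disk.
print(train_generator.class_indices)  # e.g. {'cats': 0, 'dogs': 1}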

Now that we have generated images that suit our training, let's build our Neural Network.

Building the Neural Network

Our neural network is defined as follows,

model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=shape),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)

Woooaaaaahhh! That's a bunch of layers! Don't be afraid; it's simple when we break it into smaller chunks. Let's talk about activation functions later, and understand the layers first.

Conv2D is a Convolutional Layer, which extracts features from the image by applying a specified number of filters to it (32, 64, and 128 in this case). (3, 3) is the size of the filter kernel. The first layer, being the input layer, accepts a shape of (150, 150, 3).


This feature extraction uses an operation called convolution.
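Here is a minimal NumPy sketch of the idea: a 3x3 filter slides over a small single-channel image and produces a feature map. This illustrates the operation only; it is not how Keras implements Conv2D internally.

import numpy as np

image = np.random.rand(8, 8)            # toy 8x8 single-channel image
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])          # a simple vertical-edge filter

h, w = image.shape
kh, kw = kernel.shape
feature_map = np.zeros((h - kh + 1, w - kw + 1))

for i in range(feature_map.shape[0]):
    for j in range(feature_map.shape[1]):
        # multiply each window by the filter and sum the result
        feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)

print(feature_map.shape)  # (6, 6): the output shrinks, as with Conv2D without padding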

💡
Specifying the input shape for every layer is not necessary; Keras infers each layer's input shape from the output of the previous layer.

MaxPooling2D is a data compressor: it slides a window of 2x2 (in this case) over the feature map and keeps only the maximum value in each window, which condenses the image.
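A tiny worked example of 2x2 max pooling (again, an illustration rather than Keras internals):

import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 2, 8]])

pooled = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        # keep only the maximum of each non-overlapping 2x2 window
        pooled[i, j] = feature_map[2 * i:2 * i + 2, 2 * j:2 * j + 2].max()

print(pooled)  # [[6. 4.]
               #  [7. 9.]]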


Flatten is a layer that flattens the data from any number of dimensions down to one dimension, which makes it easier for the Dense layers to operate on.


A Dense layer connects each of its neurons to every neuron in the previous layer. It takes all the inputs from the previous layer, applies weights to them, sums them up, adds a bias term, and then passes the result through an activation function. The output of a dense layer is a set of values that represent the learned features and relationships in the data, which can be used for making predictions or for further processing in the network. 512 and 1 are the numbers of neurons in the dense layers in this case.
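As a rough illustration of what a single neuron in a Dense layer computes (with made-up numbers, not the trained weights):

import numpy as np

inputs = np.array([0.2, 0.7, 0.1])    # outputs from the previous layer
weights = np.array([0.5, -0.3, 0.8])  # one weight per input
bias = 0.1

z = np.dot(inputs, weights) + bias    # weighted sum plus bias
output = max(0.0, z)                  # ReLU activation
print(output)                         # ~0.07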

The Dropout layer drops random outputs from the previous layer to prevent over-fitting and keep the model robust (the dropping follows no pattern; it happens randomly at each iteration).

Activation Functions

  • ReLU, short for Rectified Linear Unit, is an activation function that returns 0 if the input is negative and returns the input unchanged if it is zero or positive (see the sketch after this list).

  • Sigmoid is an activation function that takes any real-valued number as input and squashes it to a range between 0 and 1. Specifically, large positive values are mapped close to 1, large negative values are mapped close to 0, and the value 0 is mapped to exactly 0.5. This property makes sigmoid suitable for binary classification problems.
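To make both functions concrete, here is a minimal NumPy sketch of ReLU and sigmoid:

import numpy as np

def relu(x):
    # negative inputs become 0; non-negative inputs pass through unchanged
    return np.maximum(0, x)

def sigmoid(x):
    # squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))     # [0. 0. 3.]
print(sigmoid(x))  # approximately [0.12 0.5 0.95]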

Compiling and Training

we compile the model as follows,

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

  • adam, short for Adaptive Moment Estimation, is an optimization algorithm used to update the weights of the neural network during training.

  • binary_crossentropy is a loss function used for binary classification tasks. It is employed when you have a binary output (two classes) and want to compare the predicted probabilities to the true binary labels. In binary classification, the goal is to predict whether an input belongs to one class (usually represented as 1) or the other (usually represented as 0). The binary crossentropy loss measures the dissimilarity between the predicted probabilities and the true labels; a worked example follows this list.
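As a small worked example, computed by hand with the standard formula (Keras averages this over the whole batch):

import numpy as np

def binary_crossentropy(y_true, y_pred):
    # -[y*log(p) + (1 - y)*log(1 - p)]
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_crossentropy(1, 0.9))  # ~0.105: confident and correct -> small loss
print(binary_crossentropy(1, 0.1))  # ~2.303: confident and wrong -> large loss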

training the model,

history = model.fit(
    train_generator,
    steps_per_epoch=len(train_generator),
    epochs=epochs, #10 in this case
    validation_data=test_generator,
    validation_steps=len(test_generator),
)

steps_per_epoch and validation_steps specify the number of batches to process from the data in each epoch. Once training finishes, you can plot the accuracy over the epochs to see how the model improved.
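Here is a minimal sketch for plotting those curves with Matplotlib (on older TensorFlow versions the metric keys may be "acc" and "val_acc" instead of "accuracy" and "val_accuracy"):

import matplotlib.pyplot as plt

plt.plot(history.history["accuracy"], label="train accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()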

Yay!! You have successfully trained a model to classify images of cats vs. dogs. You can refer to my notebook here to see how to predict custom images from your gallery.
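If you want to try the model on one of your own pictures right away, here is a minimal sketch; the file name "my_pet.jpg" is a placeholder, and you should check train_generator.class_indices to confirm which class maps to 0 and which to 1 (typically {'cats': 0, 'dogs': 1}).

import numpy as np
import tensorflow as tf

img = tf.keras.preprocessing.image.load_img("my_pet.jpg", target_size=(width, height))
x = tf.keras.preprocessing.image.img_to_array(img) / 255.0  # same rescaling as training
x = np.expand_dims(x, axis=0)                               # add the batch dimension

prediction = model.predict(x)[0][0]  # sigmoid output between 0 and 1
print("dog" if prediction > 0.5 else "cat")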


Until next time, Sree Teja Dusi.
