> Science is the systematic classification of experience. ~ George Henry Lewes

Let's have a bit of fun with ML today by classifying images into Dogs and Cats. For that, we'll be using a Deep Neural Network (*a neural net that has more than one hidden layer; you can read about it here*).

## You need a Dataset

I have uploaded a dataset with images of cats and dogs, provided by TensorFlow, here. You can download it and get started.

## What is an Image?

An image is a matrix of colors. If you zoom into any picture far enough, you'll see that it is made of tiny boxes (pixels), where each box holds a color described by three values: Red, Green, and Blue. These three values range from 0 to 255. So, if we have an image of 10x10 pixels, we get a total of 100 pixels with 3 values each, i.e. 300 values for a small 10x10 picture (*a single 4K frame of 3840x2160 pixels is 24,883,200 values*). We humans can't make sense of an image by just reading the numbers, but machines can! Hence, we extract the features of an image from its colors and feed them to a neural net.
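To make this concrete, here is a tiny sketch (using NumPy; the random image is just a stand-in for a real photo) showing that a 10x10 RGB image really is 300 numbers:

```
import numpy as np

# A made-up 10x10 RGB image: each pixel holds 3 values (R, G, B),
# each an integer from 0 to 255.
image = np.random.randint(0, 256, size=(10, 10, 3), dtype=np.uint8)

print(image.shape)  # (10, 10, 3)
print(image.size)   # 300 -> 100 pixels x 3 values each
```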

## Designing the Model,

Before we design the model, let's define our variables,

```
train_data_dir = "../Datasets/catsanddogs_tensorflow/train/"
test_data_dir = "../Datasets/catsanddogs_tensorflow/validation/"
width, height = 150, 150
shape = (width, height, 3)
batch_size = 32
epochs = 10
```

You can replace the above paths with the paths of your dataset. `width` and `height` are the dimensions we resize every image to, and they must be the same for all images fed to the model. `shape` has an extra value, `3`, because each pixel has 3 values: `Red`, `Green`, and `Blue`. For grayscale images, we would pass `1` instead. `batch_size` is the number of images processed together in one batch during training (*we can't process thousands of images at once 🥲*). `epochs` is the number of times we cycle through the whole dataset, which is `10` in this case.

### Pre-Processing

We have many images of different dimensions, which a neural network can't accept: it must be fed data of uniform dimensions, so we preprocess the images to resize them to our desired size. We can also transform images on the fly without saving them (*a feature provided by TensorFlow*), which gives us more data to train on, leading to increased accuracy.

And we know that each pixel has three values ranging from 0 to 255. Models fed such large values take longer to train and need more processing power, so we scale the values to fall between 0 and 1 as follows,

```
import tensorflow as tf

train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255,     # scale pixel values from [0, 255] to [0, 1]
    shear_range=0.2,       # randomly shear (slant) the images
    zoom_range=0.2,        # randomly zoom into the images
    horizontal_flip=True,  # randomly flip images horizontally
)
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255,     # no augmentation for test data, only rescaling
)
```

`shear_range` controls the amount of shearing (slanting) applied to the images, and `zoom_range` controls the amount of zoom. There are many such parameters, and you can read about them here.

### Generating Image Data

Now that we've defined how to process our images, we need to pass the directories of our images to the generators,

```
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(width, height),
    batch_size=batch_size,
    class_mode="binary",  # binary because we only have 2 classes: Dogs and Cats
)
test_generator = test_datagen.flow_from_directory(
    test_data_dir,
    target_size=(width, height),
    batch_size=batch_size,
    class_mode="binary",
)
```

In the generators, images are resized to `150x150` and fed to the model `32` at a time. We use `binary` because we only have two classes: `cat` and `dog`. For multiple classes, we would pass `categorical` instead. Now that we have generators that suit our training, let's build our neural network.

### Building the Neural Network,

Our neural network is defined as follows,

```
model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=shape),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)
```

Woooaaaaahhh! That's a bunch of layers! Don't be afraid, it's simple when we break it into smaller chunks. Let's leave the activation functions for later and understand the layers first.

`Conv2D` is a convolutional layer, which extracts features from the image by applying a set of filters to it (*32, 64, and 128 filters respectively in this case*). `(3,3)` is the size of each filter kernel. The first layer, being the input layer, accepts a shape of `(150, 150, 3)`. This feature extraction is performed by an algorithm called `Convolution`.
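As a rough illustration of what a single filter does (a NumPy sketch, not Keras's actual implementation), a convolution slides the kernel over the image and sums the element-wise products at each position:

```
import numpy as np

def convolve2d(image, kernel):
    """Valid 2D cross-correlation: slide the kernel over the image,
    multiplying element-wise and summing at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.ones((4, 4))    # a toy 4x4 single-channel "image"
kernel = np.ones((3, 3))   # a toy 3x3 filter
print(convolve2d(image, kernel))  # a 2x2 feature map, every entry 9.0
```

In the real layer, the kernel values are not fixed like this; they are learned during training.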

`MaxPooling2D` is a data compressor: it slides a window of `2x2` (*in this case*) over its input and keeps only the maximum value in each window, which condenses the feature maps.
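Here is what that looks like on a toy feature map (a NumPy sketch with made-up numbers, using non-overlapping 2x2 windows):

```
import numpy as np

# A toy 4x4 feature map; max pooling with a 2x2 window and stride 2
# keeps only the largest value in each non-overlapping 2x2 block.
feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 1, 8],
])

pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 4]
               #  [7 9]]
```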

`Flatten` is a layer that flattens the data from any dimension into 1 dimension, which makes it easier for the `Dense` layers to operate on.

A `Dense` layer connects each neuron in its layer to every neuron in the previous layer. It takes all the inputs from the previous layer, applies weights to them, sums them up, adds a bias term, and then passes the result through an activation function. The output of a dense layer is a set of values that represent the learned features and relationships in the data, which can be used for making predictions or for further processing in the network. `512` and `1` are the numbers of neurons in the two dense layers in this case.
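That sum-and-activate step can be sketched in a few lines of NumPy (the weights, bias, and inputs here are made-up values, just to show the mechanics):

```
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # inputs from the previous layer
W = np.array([[0.2, -0.5],
              [0.1,  0.4],
              [0.3,  0.7]])      # one weight per input, per neuron
b = np.array([0.1, -0.2])        # one bias per neuron

z = x @ W + b                    # weighted sum plus bias
output = np.maximum(0, z)        # ReLU activation

print(output.shape)  # (2,) -> one value per neuron in this toy dense layer
```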

A `Dropout` layer drops random outputs from the previous layer to prevent over-fitting and to keep the model robust (*the dropping follows no pattern; a fresh random set of outputs is dropped at each iteration*).
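Conceptually (a NumPy sketch, not Keras's actual implementation), dropout with rate 0.5 zeroes out each activation with probability 0.5 and rescales the survivors so the expected total stays the same:

```
import numpy as np

rng = np.random.default_rng(42)
activations = np.ones(1000)      # pretend outputs of a dense layer

rate = 0.5
keep_mask = rng.random(activations.shape) >= rate  # True -> neuron survives
dropped = activations * keep_mask / (1 - rate)     # survivors scaled up by 2

print(f"fraction dropped: {1 - keep_mask.mean():.2f}")  # roughly 0.50
```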

### Activation Functions

`ReLU`, short for `Rectified Linear Unit`, is an activation function which returns `0` if the input value is negative, and returns the input unchanged if it is zero or positive.

`Sigmoid` is an activation function that takes any real-valued number as input and squashes it to a range between 0 and 1. Specifically, large positive values are mapped close to 1, large negative values are mapped close to 0, and the value 0 is mapped to exactly 0.5. This property makes sigmoid suitable for binary classification problems.
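Both functions are one-liners in NumPy, which makes their behavior easy to check:

```
import numpy as np

def relu(x):
    # Negative values become 0; non-negative values pass through unchanged.
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1 / (1 + np.exp(-x))

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
print(sigmoid(0.0))                      # 0.5
```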

### Compiling and Training

We compile the model as follows,

```
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

`adam`, *short for Adaptive Moment Estimation*, is an optimization algorithm used to update the weights of a neural network during training. `binary_crossentropy` is a loss function used for binary classification tasks. It is employed when you have a binary output (two classes) and want to compare the predicted probabilities to the true binary labels. In binary classification, the goal is to predict whether an input belongs to one class (usually represented as 1) or the other (usually represented as 0). The binary cross-entropy loss measures the dissimilarity between the predicted probabilities and the true binary labels.
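For a single example with true label y and predicted probability p, binary cross-entropy is -(y·log(p) + (1-y)·log(1-p)). A quick NumPy check with made-up numbers shows how it rewards confident correct predictions and punishes confident wrong ones:

```
import numpy as np

def binary_crossentropy(y_true, y_pred):
    # Loss is low when the predicted probability matches the true label,
    # and grows rapidly as the prediction moves toward the wrong label.
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_crossentropy(1, 0.9))  # confident and correct -> low loss
print(binary_crossentropy(1, 0.1))  # confident and wrong   -> high loss
```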

Training the model,

```
history = model.fit(
    train_generator,
    steps_per_epoch=len(train_generator),
    epochs=epochs,  # 10 in this case
    validation_data=test_generator,
    validation_steps=len(test_generator),
)
```

`steps_per_epoch` and `validation_steps` specify the number of batches to process from the data in each epoch. Once training finishes, you can plot the accuracy over the epochs to see how the model improved.
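A minimal Matplotlib sketch for such a plot (the numbers in `history_dict` are made-up placeholders; in practice you would use the `history.history` dictionary returned by `model.fit` above):

```
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Stand-in for history.history from model.fit(); swap in your real values.
history_dict = {
    "accuracy":     [0.62, 0.70, 0.75, 0.79, 0.82],
    "val_accuracy": [0.60, 0.68, 0.72, 0.74, 0.75],
}

plt.plot(history_dict["accuracy"], label="train accuracy")
plt.plot(history_dict["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.savefig("training_curves.png")
```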

Yay!! You have successfully trained a model to classify images of Cats vs. Dogs. You can refer to my notebook here to see how to predict on custom images from your gallery.

Until next time, Sree Teja Dusi.