2020, Mar 07

How to start classifying with TensorFlow 2.0

Intel Image Classification

1. Introduction

If you don't know how to start building a classifier or how to use TensorFlow, this article is dedicated to answering both questions.

We are going to use the Intel Image Classification dataset from Kaggle to walk through getting started with TensorFlow and building a classifier, aiming for the best accuracy we can get. The dataset contains images of natural scenes from around the world: around 25K images distributed across 6 categories, as we are going to see.

As an outline for this article, the following topics will be covered:

  • How to start with TensorFlow 2.0
  • CNN Model
  • Apply Data Augmentation techniques
  • Fine-Tuning with MobileNetV2

1.1. How to start with TensorFlow 2.0

What is TensorFlow?

TensorFlow is a machine learning framework that Google created to design, build, and train deep learning models. It has an excellent balance of flexibility and scalability.

TensorFlow consists of APIs at different levels. At the highest level, we have the Estimators that make it easy to develop a Machine Learning pipeline, and also we have Keras, a friendly API to create and train neural networks.


Keras is the API that we will use in this blog post. If you want to know more about how TensorFlow works, you can check the documentation.

First of all, you have to install it; the most common ways to do so are listed on the TensorFlow installation webpage. Once it is installed, you can import it into the workspace, as shown in the cell below.

In [1]:
# Imports
import tensorflow 
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, GlobalAveragePooling2D, MaxPool2D, Dense, Flatten, Dropout
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.applications import MobileNetV2
# Use the Keras bundled with TensorFlow rather than the standalone keras package
from tensorflow.keras.preprocessing import image
import os
import pathlib
import numpy as np
import matplotlib.pyplot as plt

1.2. Data Information

Now we are going to see how many images and classes we have in the dataset.

In [2]:
train_dir = pathlib.Path("/content/intel_image/seg_train/seg_train/")
test_dir = pathlib.Path("/content/intel_image/seg_test/seg_test/")
print("We have ", len(list(train_dir.glob('*/*.jpg'))), "images in train set.")
print("We have ", len(list(test_dir.glob('*/*.jpg'))), "images in test set.")
We have  14034 images in train set.
We have  3000 images in test set.
In [3]:
class_names = list([item.name for item in train_dir.glob('*')])
print("We have the following classes:", class_names)
We have the following classes: ['street', 'mountain', 'glacier', 'sea', 'buildings', 'forest']

1.3. Image Data Generator

We are going to use the preprocessing tools from Keras to load the dataset: the ImageDataGenerator generates batches of tensor image data.

To use this generator from Keras, the images must be stored in a specific directory format: one subdirectory per class, each containing the images of that class.
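As a minimal sketch of that layout, the snippet below builds a small throwaway directory tree with one subfolder per class and counts the files in each. The class folders and file counts are made up for illustration, and count_images_per_class is a hypothetical helper, not a Keras function. flow_from_directory assigns label indices to the class folders in alphanumeric order, which is why the helper sorts them.

```python
import pathlib
import tempfile


def count_images_per_class(root):
    """Return {class_name: number_of_jpg_files} for each subfolder of root."""
    root = pathlib.Path(root)
    return {
        class_dir.name: len(list(class_dir.glob("*.jpg")))
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()
    }


# Build a tiny example tree: <root>/<class_name>/<image>.jpg
with tempfile.TemporaryDirectory() as tmp:
    train_dir = pathlib.Path(tmp) / "seg_train"
    for class_name, num_files in [("forest", 3), ("buildings", 2), ("sea", 1)]:
        class_dir = train_dir / class_name
        class_dir.mkdir(parents=True)
        for i in range(num_files):
            (class_dir / f"{i}.jpg").touch()  # empty placeholder files
    counts = count_images_per_class(train_dir)

print(counts)
```

Because dictionaries keep insertion order, the printed classes appear in the same alphanumeric order that the generator uses for label indices.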


The constructor for the ImageDataGenerator contains many arguments to specify how to manipulate the image data after it is loaded, but for now, we only configure it with the rescale parameter to convert from uint8 to float32 in the range [0,1].

Then, we have to call the flow_from_directory() function to specify the dataset directory, once for the train directory and once for the test directory.

Also, we can configure more details such as:

  • target_size, which allows us to load all images at a specific size, in our case 150x150.
  • batch_size, which is going to be 32, meaning that 32 randomly selected images from across the classes in the dataset will be returned in each batch when training.
  • class_mode, which specifies the type of classification task. In our case it is a multi-class classification: 'categorical'.
In [4]:
image_generator = ImageDataGenerator(rescale=1./255)

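train_generator = image_generator.flow_from_directory(train_dir,
                                                      target_size = (150,150),
                                                      batch_size = 32,
                                                      class_mode = 'categorical')

test_generator = image_generator.flow_from_directory(test_dir,
                                                     target_size = (150,150),
                                                     batch_size = 32,
                                                     class_mode = 'categorical')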
Found 14034 images belonging to 6 classes.
Found 3000 images belonging to 6 classes.
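
To make these settings concrete, here is a small plain-Python sketch of three things the generator does with them: splitting the dataset into batches of 32, rescaling uint8 pixel values to [0,1], and one-hot encoding the labels for the 'categorical' class mode. The helper names are ours for illustration and are not part of Keras.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once (last batch may be smaller)."""
    return math.ceil(num_images / batch_size)


def rescale(pixels):
    """Map uint8 pixel values (0..255) to floats in [0, 1]."""
    return [p / 255.0 for p in pixels]


def one_hot(class_index, num_classes):
    """Encode a class index as a one-hot vector."""
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]


train_steps = steps_per_epoch(14034, 32)
test_steps = steps_per_epoch(3000, 32)
print(train_steps, test_steps)
print(rescale([0, 255]))
print(one_hot(2, 6))
```

The two step counts are the 439 and 94 that show up in the training and evaluation logs of this article.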

1.4. Data Exploration

We are going to create a simple function to show some random images of our dataset.

In [0]:
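def show_batch(image_batch, label_batch):
    for n in range(10):
        ax = plt.subplot(5,5,n+1)
        # Assumed completion (the rest of the original cell is not shown):
        # draw the image and title it with the index of its one-hot label.
        plt.imshow(image_batch[n])
        ax.set_title(str(np.argmax(label_batch[n])))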
In [6]:
image_batch, label_batch = next(train_generator)
show_batch(image_batch, label_batch)

2. Creating a simple Neural Network

We are going to create a very simple neural network to understand how to train and evaluate it with TensorFlow.

Before starting, we have to understand Early Stopping: it is a method that allows us to specify an arbitrarily large number of training epochs and stop training once the model's performance stops improving on a held-out validation dataset.

Keras supports early stopping via a callback called EarlyStopping. We are going to configure it with two parameters: monitor, which specifies the performance measure to watch in order to end training, and patience, which defines the number of epochs without improvement after which training is stopped.

In [0]:
early_stopping = EarlyStopping(monitor='val_loss', patience=3)

Now, it's time to build the model by stacking layers.

In [0]:
# Create model adding different layers.
# Using 'softmax' activation because we have a multiclass classification.
model = Sequential([Flatten(),
                    Dense(512, activation = 'relu'),
                    Dense(256, activation = 'relu'),
                    Dense(6, activation = 'softmax')])

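# Compile the model
# loss: categorical_crossentropy because the targets are one-hot encoded.
# Assumption: the original compile call is not shown; the optimizer is a guess.
model.compile(optimizer = 'rmsprop',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])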

Once we are here, we fit the model by calling the fit function, passing the training set and, as validation data, the test set. With verbose=1 the progress of every epoch is printed, which lets us see the epoch at which training was stopped.

In [9]:
# Fit the model on the train set and validate on the test set.
# Callback: early_stopping
trained_MLP = model.fit(train_generator,
                    validation_data  = test_generator,
                    epochs = 50,
                    verbose = 1,
                    callbacks= [early_stopping]);

# Save weights
Epoch 1/50
439/439 [==============================] - 24s 54ms/step - loss: 3.1735 - accuracy: 0.2675 - val_loss: 1.5613 - val_accuracy: 0.3390
Epoch 2/50
439/439 [==============================] - 23s 53ms/step - loss: 1.6017 - accuracy: 0.3060 - val_loss: 1.4728 - val_accuracy: 0.4240
Epoch 3/50
439/439 [==============================] - 23s 53ms/step - loss: 1.5732 - accuracy: 0.3303 - val_loss: 1.5002 - val_accuracy: 0.3883
Epoch 4/50
439/439 [==============================] - 23s 53ms/step - loss: 1.5341 - accuracy: 0.3517 - val_loss: 1.4337 - val_accuracy: 0.4450
Epoch 5/50
439/439 [==============================] - 23s 53ms/step - loss: 1.5297 - accuracy: 0.3513 - val_loss: 1.4169 - val_accuracy: 0.4227
Epoch 6/50
439/439 [==============================] - 23s 53ms/step - loss: 1.5091 - accuracy: 0.3635 - val_loss: 1.4505 - val_accuracy: 0.4757
Epoch 7/50
439/439 [==============================] - 23s 53ms/step - loss: 1.4729 - accuracy: 0.3822 - val_loss: 1.4044 - val_accuracy: 0.4453
Epoch 8/50
439/439 [==============================] - 23s 54ms/step - loss: 1.4788 - accuracy: 0.3844 - val_loss: 1.3237 - val_accuracy: 0.4947
Epoch 9/50
439/439 [==============================] - 23s 53ms/step - loss: 1.4476 - accuracy: 0.3995 - val_loss: 1.3196 - val_accuracy: 0.4757
Epoch 10/50
439/439 [==============================] - 23s 53ms/step - loss: 1.4306 - accuracy: 0.4102 - val_loss: 1.4049 - val_accuracy: 0.4347
Epoch 11/50
439/439 [==============================] - 23s 52ms/step - loss: 1.4125 - accuracy: 0.4188 - val_loss: 1.3475 - val_accuracy: 0.4927
Epoch 12/50
439/439 [==============================] - 23s 52ms/step - loss: 1.4216 - accuracy: 0.4245 - val_loss: 1.3675 - val_accuracy: 0.5100
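
The log shows training stopping at epoch 12 of 50. The following plain-Python sketch replays the val_loss values from that log through a simplified version of the patience rule. It illustrates the idea only and is not Keras's actual implementation; the function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# val_loss per epoch, copied from the MLP training log above
mlp_val_losses = [1.5613, 1.4728, 1.5002, 1.4337, 1.4169, 1.4505,
                  1.4044, 1.3237, 1.3196, 1.4049, 1.3475, 1.3675]

stopped_at = early_stopping_epoch(mlp_val_losses, patience=3)
print("Training stops at epoch", stopped_at)
```

The validation loss last improved at epoch 9 (1.3196); three epochs without improvement follow, so training ends at epoch 12.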

Now we are going to create a function to plot the accuracy and the loss over the epochs.

In [0]:
def plot_acc_loss(trained):
    fig, ax = plt.subplots(1, 2, figsize=(15,5))
    ax[0].plot(trained.epoch, trained.history["loss"], label="Train loss")
    ax[0].plot(trained.epoch, trained.history["val_loss"], label="Validation loss")
    ax[1].plot(trained.epoch, trained.history["accuracy"], label="Train acc")
    ax[1].plot(trained.epoch, trained.history["val_accuracy"], label="Validation acc")
    # Assumed completion (the rest of the original cell is not shown).
    ax[0].legend()
    ax[1].legend()
    plt.show()

The model.evaluate method checks the model's performance on the test set.

In [12]:
#Loading weights

# Evaluate the model with the test set.
model_MLP_score = model.evaluate(test_generator)
print("Model MLP Test Loss:", model_MLP_score[0])
print("Model MLP Test Accuracy:", model_MLP_score[1])
94/94 [==============================] - 4s 37ms/step - loss: 1.3675 - accuracy: 0.5100
Model MLP Test Loss: 1.3674591779708862
Model MLP Test Accuracy: 0.5099999904632568
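
To put the loss value in context, this sketch computes the categorical cross-entropy of a single example by hand (the helper name is ours; Keras additionally averages the value over all examples). A model that assigns equal probability to all 6 classes would score ln(6), about 1.79, so a loss of about 1.37 is better than uniform guessing, but not by a wide margin.

```python
import math


def categorical_crossentropy(one_hot_target, predicted_probs):
    """Cross-entropy of one example: minus the log-probability of the true class."""
    return -sum(
        t * math.log(p)
        for t, p in zip(one_hot_target, predicted_probs)
        if t > 0
    )


target = [0, 0, 1, 0, 0, 0]                        # true class is index 2
uniform = [1 / 6] * 6                              # model with no preference
confident = [0.02, 0.02, 0.90, 0.02, 0.02, 0.02]   # made-up confident prediction

baseline_loss = categorical_crossentropy(target, uniform)
confident_loss = categorical_crossentropy(target, confident)
print(round(baseline_loss, 4), round(confident_loss, 4))
```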

3. CNN Model

Now that we have seen how to create and train a basic neural network, we are going to introduce a Convolutional Neural Network.


Why are we going to use it?

Because MLPs do not scale well to images and ignore the information carried by pixel position and correlation with neighboring pixels. CNNs, in contrast, are widely used for image processing and classification because they overcome these limitations of MLPs.
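
A rough parameter count illustrates the scaling problem. This sketch (the helper names are ours) counts the weights of the MLP from the previous section on a flattened 150x150x3 image and compares them with a single 3x3 convolutional layer with 200 filters applied to the same RGB input.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases of a 2D convolution; independent of the image size."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


flattened = 150 * 150 * 3  # 67,500 input values per image

# MLP from the previous section: Dense(512) -> Dense(256) -> Dense(6)
mlp_total = 0
inputs = flattened
for units in [512, 256, 6]:
    mlp_total += dense_params(inputs, units)
    inputs = units

conv_total = conv2d_params(3, 3, 3, 200)

print("MLP parameters:", mlp_total)
print("Conv2D(200, (3,3)) parameters:", conv_total)
```

Almost all of the MLP's parameters sit in its first layer, which needs one weight per pixel value per unit, while the convolution reuses the same small kernels across the whole image.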

We will not go into depth on how they work internally since it is out of the scope of this post. You can find an article on this blog with all the information: Introduction to CNN.

The CNN architecture that we will use is a simple one, to see if it improves our accuracy. It has the following layers:

  • Conv2D with 200 and 180 filters of size 3 by 3. These layers are responsible for extracting features from the image.
  • MaxPool2D to reduce the spatial size of the feature maps after a convolution.
  • Fully connected (Dense) layers that connect every output of one layer to every unit of the next.
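
The effect of these layers on the tensor shape can be worked out by hand. The sketch below (hypothetical helper names) assumes the Keras defaults of 'valid' padding and stride 1 for Conv2D, and a pooling stride equal to the pool size. The 5x5 pool size is an assumption chosen for illustration, since the article does not state it.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial output size of a 'valid' convolution."""
    return (size - kernel) // stride + 1


def pool_output_size(size, pool):
    """Spatial output size of 'valid' max pooling with stride equal to the pool size."""
    return (size - pool) // pool + 1


size = 150
after_conv1 = conv_output_size(size, 3)          # Conv2D(200, (3,3))
after_conv2 = conv_output_size(after_conv1, 3)   # Conv2D(180, (3,3))
after_pool = pool_output_size(after_conv2, 5)    # assumed 5x5 max pooling
flattened = after_pool * after_pool * 180        # values fed to the first Dense layer

print(after_conv1, after_conv2, after_pool, flattened)
```

Each 3x3 convolution trims one pixel from every border, and pooling then shrinks the feature maps sharply, which keeps the first fully connected layer at a manageable size.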
In [0]:
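# Create the model adding Conv2D. 
model = Sequential([Conv2D(200, (3,3), activation='relu', input_shape=(150, 150, 3)),
                    Conv2D(180, (3,3), activation='relu'),
                    # Assumption: the original cell is truncated; the pooling and
                    # Flatten layers and the pool size are a guess, not the author's values.
                    MaxPool2D(pool_size=(5,5)),
                    Flatten(),
                    Dense(180, activation='relu'),
                    Dense(6, activation='softmax')])

# Compile model.
# Loss categorical_crossentropy: targets are one-hot encoded.
# Assumption: the compile call is not shown; the optimizer is a guess.
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])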
In [14]:
# Fit the model on the train set and validate on the test set.
# Callback: early_stopping
trained_CNN = model.fit(train_generator,
                    validation_data  = test_generator,
                    epochs = 40,
                    verbose = 1,
                    callbacks= [early_stopping]);

# Save weights
Epoch 1/40
439/439 [==============================] - 51s 116ms/step - loss: 1.0665 - accuracy: 0.5767 - val_loss: 0.8594 - val_accuracy: 0.6840
Epoch 2/40
439/439 [==============================] - 50s 113ms/step - loss: 0.7824 - accuracy: 0.7103 - val_loss: 0.7221 - val_accuracy: 0.7373
Epoch 3/40
439/439 [==============================] - 50s 113ms/step - loss: 0.6573 - accuracy: 0.7691 - val_loss: 0.5855 - val_accuracy: 0.7983
Epoch 4/40
439/439 [==============================] - 50s 113ms/step - loss: 0.5816 - accuracy: 0.7978 - val_loss: 0.4861 - val_accuracy: 0.8313
Epoch 5/40
439/439 [==============================] - 49s 113ms/step - loss: 0.5337 - accuracy: 0.8104 - val_loss: 0.5512 - val_accuracy: 0.8073
Epoch 6/40
439/439 [==============================] - 49s 112ms/step - loss: 0.5008 - accuracy: 0.8212 - val_loss: 0.4994 - val_accuracy: 0.8200
Epoch 7/40
439/439 [==============================] - 49s 113ms/step - loss: 0.4533 - accuracy: 0.8367 - val_loss: 0.5165 - val_accuracy: 0.8200
In [16]:
# Load weights and evaluate model
model_CNN_score = model.evaluate(test_generator)
print("Model CNN Test Loss:", model_CNN_score[0])
print("Model CNN Test Accuracy:", model_CNN_score[1])
94/94 [==============================] - 5s 52ms/step - loss: 0.5165 - accuracy: 0.8200
Model CNN Test Loss: 0.5164828300476074
Model CNN Test Accuracy: 0.8199999928474426

6. Apply Data Augmentation

Image data augmentation is a technique that can be used to artificially expand the size of the training set by creating modified versions of images in the dataset in order to reduce overfitting.

Keras provides us the capability to fit our models using image data augmentation via the ImageDataGenerator that we saw earlier.

A brief explanation of each parameter that we will use:

  • shear_range: for randomly applying shearing transformations.
  • zoom_range: for randomly zooming inside pictures.
  • horizontal_flip: for randomly flipping half of the images horizontally.

For example, if we flip or zoom an image of a tree, we can still tell that it is a tree.
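
As a minimal illustration of one of these transformations, the sketch below flips a tiny "image" (a nested list of pixel values) horizontally, and applies the flip at random to roughly half of the samples. It is plain Python written for this explanation, not the ImageDataGenerator code; the function names and the 0.5 probability are assumptions for the example.

```python
import random


def horizontal_flip(image):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in image]


def random_horizontal_flip(image, rng, probability=0.5):
    """Flip the image with the given probability, otherwise return it unchanged."""
    return horizontal_flip(image) if rng.random() < probability else image


image = [[1, 2, 3],
         [4, 5, 6]]
flipped = horizontal_flip(image)
print(flipped)

# Apply the random flip many times; about half of the results are flipped.
rng = random.Random(0)
augmented = [random_horizontal_flip(image, rng) for _ in range(1000)]
num_flipped = sum(1 for sample in augmented if sample == flipped)
print(num_flipped, "of 1000 samples were flipped")
```

The label of the image does not change when it is flipped, so every augmented sample can be trained on with its original class.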


In [0]:
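# Create ImageDataGenerator with new parameters for Data Augmentation
image_generator = ImageDataGenerator(rescale=1./255, horizontal_flip=True,
                                     shear_range=0.2, zoom_range=0.2)  # Assumption: range values are not shown in the original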
In [18]:
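train_generator = image_generator.flow_from_directory(train_dir,
                                                      target_size = (150,150),
                                                      batch_size = 32,
                                                      class_mode = 'categorical')

test_generator = image_generator.flow_from_directory(test_dir,
                                                     target_size = (150,150),
                                                     batch_size = 32,
                                                     class_mode = 'categorical')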
Found 14034 images belonging to 6 classes.
Found 3000 images belonging to 6 classes.
In [0]:
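# Create the same model as the previous one
# Assumption: MaxPool2D, Flatten and compile are missing from the original cell; pool size and optimizer are guesses.
model = Sequential([Conv2D(200, (3,3), activation='relu', input_shape=(150, 150, 3)),
                    Conv2D(180, (3,3), activation='relu'),
                    MaxPool2D(pool_size=(5,5)),
                    Flatten(),
                    Dense(180, activation='relu'),
                    Dense(6, activation='softmax')])
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])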

In [20]:
# Fit the model on the train set and validate on the test set.
# Callback: early_stopping
trained_DA = model.fit(train_generator,
                    validation_data  = test_generator,
                    epochs = 40,
                    verbose = 1,
                    callbacks= [early_stopping])

# Save weights
Epoch 1/40
439/439 [==============================] - 116s 265ms/step - loss: 1.1379 - accuracy: 0.5512 - val_loss: 0.8780 - val_accuracy: 0.7007
Epoch 2/40
439/439 [==============================] - 117s 266ms/step - loss: 0.8181 - accuracy: 0.7059 - val_loss: 0.6649 - val_accuracy: 0.7620
Epoch 3/40
439/439 [==============================] - 116s 264ms/step - loss: 0.7196 - accuracy: 0.7462 - val_loss: 0.6942 - val_accuracy: 0.7657
Epoch 4/40
439/439 [==============================] - 115s 263ms/step - loss: 0.6579 - accuracy: 0.7688 - val_loss: 0.5809 - val_accuracy: 0.7973
Epoch 5/40
439/439 [==============================] - 116s 264ms/step - loss: 0.6204 - accuracy: 0.7801 - val_loss: 0.6075 - val_accuracy: 0.7920
Epoch 6/40
439/439 [==============================] - 116s 263ms/step - loss: 0.5808 - accuracy: 0.7977 - val_loss: 0.5410 - val_accuracy: 0.8113
Epoch 7/40
439/439 [==============================] - 115s 262ms/step - loss: 0.5555 - accuracy: 0.8053 - val_loss: 0.4717 - val_accuracy: 0.8337
Epoch 8/40
439/439 [==============================] - 115s 263ms/step - loss: 0.5271 - accuracy: 0.8140 - val_loss: 0.4615 - val_accuracy: 0.8370
Epoch 9/40
439/439 [==============================] - 116s 264ms/step - loss: 0.5017 - accuracy: 0.8236 - val_loss: 0.4697 - val_accuracy: 0.8353
Epoch 10/40
439/439 [==============================] - 115s 262ms/step - loss: 0.4962 - accuracy: 0.8244 - val_loss: 0.4612 - val_accuracy: 0.8423
Epoch 11/40
439/439 [==============================] - 115s 263ms/step - loss: 0.4696 - accuracy: 0.8355 - val_loss: 0.4559 - val_accuracy: 0.8427
Epoch 12/40
439/439 [==============================] - 116s 264ms/step - loss: 0.4614 - accuracy: 0.8358 - val_loss: 0.4775 - val_accuracy: 0.8283
Epoch 13/40
439/439 [==============================] - 117s 265ms/step - loss: 0.4473 - accuracy: 0.8428 - val_loss: 0.4340 - val_accuracy: 0.8497
Epoch 14/40
439/439 [==============================] - 115s 262ms/step - loss: 0.4345 - accuracy: 0.8457 - val_loss: 0.4733 - val_accuracy: 0.8320
Epoch 15/40
439/439 [==============================] - 116s 265ms/step - loss: 0.4453 - accuracy: 0.8446 - val_loss: 0.4407 - val_accuracy: 0.8543
Epoch 16/40
439/439 [==============================] - 115s 263ms/step - loss: 0.4214 - accuracy: 0.8513 - val_loss: 0.4963 - val_accuracy: 0.8323
In [22]:
# Load weights and evaluate model
model_DA_score = model.evaluate(test_generator)
print("Model with Data Augmentation Test Loss:", model_DA_score[0])
print("Model with Data Augmentation Test Accuracy:", model_DA_score[1])
94/94 [==============================] - 19s 200ms/step - loss: 0.4901 - accuracy: 0.8317
Model with Data Augmentation Test Loss: 0.49005773663520813
Model with Data Augmentation Test Accuracy: 0.8316666483879089

7. Fine tuning

Now that we have seen how to create a CNN model, how to do data augmentation, and how to observe the results, we are going to fine-tune a popular network model: MobileNetV2.

What is Fine Tuning?

First of all, we have to know that fine-tuning is a way of applying transfer learning.

Transfer learning occurs when we use the knowledge gained from solving one problem and apply it to a new but related problem.

Fine-tuning takes a model that has already been trained for one task and then tunes it to make it perform a second, similar task.


Why are we going to use Fine-Tuning?

Usually, we use it when our task is similar to the one an existing model was designed and trained for, which lets us take advantage of that model without having to develop one from scratch. It also helps when we have a small amount of data for the new problem compared with the original one.

We are going to use the MobileNetV2 network, which is faster and smaller than other major networks such as VGG16.

In [23]:
mobile_model = MobileNetV2(input_shape=(150, 150,3), include_top=False, weights='imagenet')

mobile_model.trainable = True

print("Number of layers in the MobileNetV2 model: ", len(mobile_model.layers))
WARNING:tensorflow:`input_shape` is undefined or non-square, or `rows` is not in [96, 128, 160, 192, 224]. Weights for input shape (224, 224) will be loaded as the default.
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224_no_top.h5
9412608/9406464 [==============================] - 0s 0us/step
Number of layers in the MobileNetV2 model:  155

We are going to unfreeze only the layers of the pre-trained model that are close to the top. The lower convolutional layers detect low-level features such as edges and curves, which are useful for almost any image task, while the higher layers are more specialized, so those are the ones we retrain to detect features that are applicable to our problem.

In [0]:
fine_tune_at = 100
for layer in mobile_model.layers[:fine_tune_at]:
    layer.trainable =  False
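
With 155 layers in the base model and fine_tune_at = 100, the loop above freezes the first 100 layers and leaves the last 55 trainable. A tiny stand-in shows the same slicing logic; the Layer class here is a hypothetical placeholder for illustration, not a Keras class.

```python
class Layer:
    """Minimal placeholder with the one attribute the freezing loop touches."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


# 155 layers, as reported for the MobileNetV2 base model above
layers = [Layer(f"layer_{i}") for i in range(155)]

fine_tune_at = 100
for layer in layers[:fine_tune_at]:   # indices 0..99
    layer.trainable = False

num_trainable = sum(layer.trainable for layer in layers)
num_frozen = len(layers) - num_trainable
print(num_frozen, "frozen,", num_trainable, "trainable")
```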

The last layer of MobileNetV2 has the output shape 5 x 5 x 1280 (width x height x channels). That is why we are going to use GlobalAveragePooling2D: it takes this tensor and, for each channel, computes the average of all values across the entire width x height matrix.


So, the output is going to be a 1-dimensional tensor whose size is the number of channels, as we can see in the summary.
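
A hand-written version of this operation on a tiny tensor may make it clearer. The sketch below is plain Python with a hypothetical function name, not the Keras layer itself; it averages each channel of a 2 x 2 x 2 example and also checks the parameter count of the final Dense layer on 1280 pooled values.

```python
def global_average_pool(tensor):
    """Average a height x width x channels nested list over height and width."""
    height = len(tensor)
    width = len(tensor[0])
    channels = len(tensor[0][0])
    return [
        sum(tensor[i][j][c] for i in range(height) for j in range(width))
        / (height * width)
        for c in range(channels)
    ]


# 2 x 2 spatial grid with 2 channels per position
tensor = [[[1, 10], [2, 20]],
          [[3, 30], [4, 40]]]
pooled = global_average_pool(tensor)
print(pooled)

# A 5 x 5 x 1280 tensor pools down to 1280 values, one per channel.
big = [[[1.0] * 1280 for _ in range(5)] for _ in range(5)]
pooled_size = len(global_average_pool(big))

# Dense(6) on 1280 inputs: weights plus biases
dense_params = 1280 * 6 + 6
print(pooled_size, dense_params)
```

The 7,686 parameters computed here match the Dense row of the model summary.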

In [25]:
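# Create model adding the pre-trained model mobileNetV2, 
# adding GlobalAveragePooling2D layer
model = Sequential([mobile_model,
                    GlobalAveragePooling2D(),
                    # Dropout appears in the summary; its rate is not shown, 0.5 is assumed.
                    Dropout(0.5),
                    Dense(6, activation='softmax')])

model.compile(optimizer= RMSprop(learning_rate=2e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()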

Model: "sequential_3"
Layer (type)                 Output Shape              Param #   
mobilenetv2_1.00_224 (Model) (None, 5, 5, 1280)        2257984   
global_average_pooling2d (Gl (None, 1280)              0         
dropout_3 (Dropout)          (None, 1280)              0         
dense_7 (Dense)              (None, 6)                 7686      
Total params: 2,265,670
Trainable params: 1,870,278
Non-trainable params: 395,392
In [26]:
# Fitting the model
trained_FT = model.fit(train_generator, validation_data = test_generator,
                    epochs = 20, verbose = 1, callbacks = [early_stopping])

Epoch 1/20
439/439 [==============================] - 114s 260ms/step - loss: 0.8530 - accuracy: 0.6885 - val_loss: 0.5230 - val_accuracy: 0.8210
Epoch 2/20
439/439 [==============================] - 114s 259ms/step - loss: 0.4230 - accuracy: 0.8510 - val_loss: 0.3953 - val_accuracy: 0.8700
Epoch 3/20
439/439 [==============================] - 114s 260ms/step - loss: 0.3539 - accuracy: 0.8757 - val_loss: 0.3405 - val_accuracy: 0.8853
Epoch 4/20
439/439 [==============================] - 115s 261ms/step - loss: 0.3035 - accuracy: 0.8948 - val_loss: 0.3149 - val_accuracy: 0.8930
Epoch 5/20
439/439 [==============================] - 119s 271ms/step - loss: 0.2848 - accuracy: 0.8990 - val_loss: 0.2885 - val_accuracy: 0.8990
Epoch 6/20
439/439 [==============================] - 117s 267ms/step - loss: 0.2547 - accuracy: 0.9084 - val_loss: 0.2858 - val_accuracy: 0.8993
Epoch 7/20
439/439 [==============================] - 117s 266ms/step - loss: 0.2387 - accuracy: 0.9160 - val_loss: 0.2513 - val_accuracy: 0.9083
Epoch 8/20
439/439 [==============================] - 117s 267ms/step - loss: 0.2158 - accuracy: 0.9216 - val_loss: 0.2375 - val_accuracy: 0.9190
Epoch 9/20
439/439 [==============================] - 117s 265ms/step - loss: 0.2040 - accuracy: 0.9268 - val_loss: 0.2444 - val_accuracy: 0.9163
Epoch 10/20
439/439 [==============================] - 116s 264ms/step - loss: 0.1901 - accuracy: 0.9323 - val_loss: 0.2411 - val_accuracy: 0.9227
Epoch 11/20
439/439 [==============================] - 113s 258ms/step - loss: 0.1861 - accuracy: 0.9341 - val_loss: 0.2352 - val_accuracy: 0.9220
Epoch 12/20
439/439 [==============================] - 113s 257ms/step - loss: 0.1689 - accuracy: 0.9363 - val_loss: 0.2490 - val_accuracy: 0.9203
Epoch 13/20
439/439 [==============================] - 112s 256ms/step - loss: 0.1536 - accuracy: 0.9426 - val_loss: 0.2302 - val_accuracy: 0.9263
Epoch 14/20
439/439 [==============================] - 111s 254ms/step - loss: 0.1526 - accuracy: 0.9464 - val_loss: 0.2488 - val_accuracy: 0.9213
Epoch 15/20
439/439 [==============================] - 111s 253ms/step - loss: 0.1389 - accuracy: 0.9485 - val_loss: 0.2481 - val_accuracy: 0.9250
Epoch 16/20
439/439 [==============================] - 110s 250ms/step - loss: 0.1320 - accuracy: 0.9515 - val_loss: 0.2494 - val_accuracy: 0.9207
In [28]:
model_FT_score = model.evaluate(test_generator)
print("Model Fine Tuning Test Loss:", model_FT_score[0])
print("Model Fine Tuning Test Accuracy:", model_FT_score[1])
94/94 [==============================] - 19s 197ms/step - loss: 0.2594 - accuracy: 0.9173
Model Fine Tuning Test Loss: 0.25941866636276245
Model Fine Tuning Test Accuracy: 0.9173333048820496

We can now take some images from the prediction set, predict their classes, and plot them.

In [30]:
img1 = image.load_img('/content/intel_image/seg_pred/seg_pred/5.jpg', target_size=(150, 150))
# Rescale to [0,1], as the generators did during training
x = image.img_to_array(img1) / 255.
x = np.expand_dims(x, axis=0)
prediction1 = model.predict(x)

img2 = image.load_img('/content/intel_image/seg_pred/seg_pred/176.jpg', target_size=(150, 150))
y = image.img_to_array(img2) / 255.
y = np.expand_dims(y, axis=0)
prediction2 = model.predict(y)

# Show each image with its predicted class index
plt.subplot(1, 2, 1)
plt.imshow(img1)
plt.title("Predicted class: " + str(np.argmax(prediction1[0])))
plt.subplot(1, 2, 2)
plt.imshow(img2)
plt.title("Predicted class: " + str(np.argmax(prediction2[0])))
plt.show()
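
The predicted value is a class index. As a sketch of how to turn it into a label: flow_from_directory numbers the class folders in alphanumeric order, so sorting the six folder names gives the index-to-name mapping. The probability vector below is made up for illustration and predicted_label is a hypothetical helper; with the real generator, train_generator.class_indices holds the same mapping.

```python
# Class folder names as listed earlier in the article, sorted the way
# flow_from_directory orders them when assigning label indices.
class_names = sorted(['street', 'mountain', 'glacier', 'sea', 'buildings', 'forest'])


def predicted_label(probabilities, names):
    """Return the class name with the highest predicted probability."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return names[best_index]


# Made-up softmax output for one image
example_probabilities = [0.05, 0.10, 0.60, 0.15, 0.05, 0.05]
label = predicted_label(example_probabilities, class_names)
print(class_names)
print("Predicted class:", label)
```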

8. Conclusions

In this post we have walked through how to start from scratch with TensorFlow, beginning with a very simple neural network, moving on to a CNN and data augmentation techniques, and finally learning how to do fine-tuning.

From the results we can conclude that:

  • Applying data augmentation techniques to our problem does not seem to improve the accuracy as much as we expected, but it reduces overfitting.

  • If we don't want to do fine-tuning, a CNN like ours already gives a good result: about 82% accuracy on the test set (83% with data augmentation), compared with about 92% for the fine-tuned MobileNetV2.


Cosmina Nicu

Computer Science Student at Universitat Autònoma de Barcelona (UAB)