Model Development: AI End-to-End Series (Part — 3)

The Dataset

Images from our Dataset — Mask wearers
Images from our Dataset — Non-Mask wearers

Data Pre-processing

Data Augmentation

  • We perform data augmentation by randomly applying various transformations to the training images.
  • This increases the diversity of the data available for training the model, without actually collecting new data.
  • Some of the common data augmentation transformations are:
    - Random Rotation
    - Horizontal Shift
    - Vertical Shift
    - Random Flipping
    - Shearing
    - Random Zooming
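In Keras, the transformations listed above can be applied on the fly with `ImageDataGenerator`. The specific ranges below are illustrative assumptions, not the article's exact values:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation pipeline covering the transformations listed above.
# The ranges are illustrative assumptions.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalise pixel values to [0, 1]
    rotation_range=20,        # random rotation, in degrees
    width_shift_range=0.1,    # horizontal shift (fraction of width)
    height_shift_range=0.1,   # vertical shift (fraction of height)
    horizontal_flip=True,     # random flipping
    shear_range=0.15,         # shearing intensity
    zoom_range=0.2,           # random zooming
)
```

Training batches can then be streamed with `datagen.flow_from_directory(train_dir, target_size=(224, 224), class_mode="binary")`, so each epoch sees slightly different variants of the same images.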

Model Building

  • Our model consists of 3 convolutional layers, each followed by a max-pooling layer, and dropout.
  • A fully connected layer with 128 units follows the convolutional layers and is activated by a ReLU activation function.
  • Here we are using binary cross-entropy loss with the Adam optimizer for our binary classification problem.
  • According to the model summary, there are 6,446,369 trainable parameters.
Representation of our Face Mask Detection Model
  • The training/validation loss and accuracy graphs for the model are as follows:
Loss/Accuracy of our model
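The architecture described above can be sketched in Keras as follows. The filter counts, input size, and dropout rate are assumptions; the article only fixes the overall shape (3 conv + max-pool stages, dropout, a 128-unit ReLU dense layer, and a sigmoid output trained with binary cross-entropy and Adam), so this sketch will not reproduce the exact 6,446,369-parameter count:

```python
from tensorflow.keras import layers, models

# Sketch of the CNN described above; filter counts and input size are
# assumptions, not the article's exact configuration.
model = models.Sequential([
    layers.Input(shape=(150, 150, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # binary: mask vs. no mask
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

A single sigmoid unit suffices for binary classification: the output is the probability of one class, and the other is its complement.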

Using Transfer Learning

  • Another way to build a model is to use transfer learning. But what is it? It is like learning to ride a bicycle and then using that experience to learn to ride a motorbike or scooter.
Transfer Learning Interpretation
  • Transfer learning consists of taking features learned on one problem and leveraging them on a new, similar problem.
  • For instance, features from a model that has learned to identify raccoons may be useful to kick-start a model meant to identify tanukis.
Raccoon and Japanese Tanukis
  • Transfer learning is usually done for tasks where your dataset has too little data to train a full-scale model from scratch.
  • The most common incarnation of transfer learning in the context of deep learning is the following workflow:
    - Take layers from a previously trained model.
    - Freeze them, so as to avoid destroying any of the information they contain during future training rounds.
    - Add some new, trainable layers on top of the frozen layers. They will learn to turn the old features into predictions on a new dataset.
    - Train the new layers on your dataset.
  • A last, optional step is fine-tuning, which consists of unfreezing the entire model you obtained above (or part of it) and re-training it on the new data with a very low learning rate.
  • This can potentially achieve meaningful improvements, by incrementally adapting the pretrained features to the new data.


  • MobileNet-v2 is a convolutional neural network that is 53 layers deep.
  • You can load a pretrained version of the network trained on more than a million images from the ImageNet database.
  • The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals.
  • As a result, the network has learned rich feature representations for a wide range of images.
  • The network has an image input size of 224-by-224.
  • MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases.
  • The architecture delivers high accuracy results while keeping the parameters and mathematical operations as low as possible to bring deep neural networks to mobile devices.
  • In MobileNetV2, there are two types of blocks.
    - Inverted Residual Block
    - Bottleneck Residual Block
  • Both types of blocks consist of 3 layers.
Block Types in MobileNetV2
  • One is a residual block with a stride of 1; the other is a block with a stride of 2, used for downsampling.
  • There are two types of Convolution layers in MobileNet V2 architecture:
    - 1x1 Convolution
    - 3x3 Depthwise Convolution
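A simplified sketch of how these two convolution types compose into a MobileNetV2-style bottleneck block follows. BatchNorm and ReLU6 are omitted for brevity, and the sizes are illustrative:

```python
from tensorflow.keras import layers, models

def bottleneck_block(x, filters, stride=1, expansion=6):
    """Simplified inverted-residual block: 1x1 expand, 3x3 depthwise, 1x1 project."""
    in_channels = x.shape[-1]
    # Layer 1: 1x1 convolution (expansion)
    h = layers.Conv2D(in_channels * expansion, 1, activation="relu")(x)
    # Layer 2: 3x3 depthwise convolution (stride 2 would downsample)
    h = layers.DepthwiseConv2D(3, strides=stride, padding="same",
                               activation="relu")(h)
    # Layer 3: 1x1 convolution (linear projection)
    h = layers.Conv2D(filters, 1)(h)
    # Residual connection only when stride == 1 and the shapes match.
    if stride == 1 and in_channels == filters:
        h = layers.Add()([x, h])
    return h

inputs = layers.Input(shape=(32, 32, 16))
outputs = bottleneck_block(inputs, filters=16, stride=1)
block = models.Model(inputs, outputs)
```

With `stride=1` and matching channel counts the residual shortcut is used; with `stride=2` the spatial resolution halves and the shortcut is dropped, matching the two block types described above.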
  • A MobileNetV2 network looks like this:
  • We have connected the output of a pretrained MobileNetV2 base network to a new head model.
  • This model consists of an average pooling layer, followed by a flattening layer, and finally a fully connected dense neural network.
  • The output layer consists of a sigmoid activation to perform the binary classification.
  • On training this network, we achieve an accuracy of 0.99.
Loss/Accuracy using Transfer Learning
  • If we fine-tune the model by re-training the entire network on our data, we can even achieve a test accuracy of 1.0.
  • You can visualize the model’s training and validation accuracy and loss using the TensorBoard extension.
  • You can observe from the graphs that our custom model reached a validation accuracy of around 88%, while transfer learning achieved a validation accuracy of more than 97%.
  • There we have it — a model ready to be deployed, with great accuracy.
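The transfer-learning head described above (average pooling, flattening, dense layers, sigmoid output on a frozen MobileNetV2 base) can be sketched as follows. The 128-unit dense layer is an assumption, and `weights=None` is used only to keep the example self-contained; in practice you would load `weights="imagenet"` as described earlier:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# MobileNetV2 base without its ImageNet classification top.
# weights=None avoids a download here; use weights="imagenet" in practice.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights=None)
base.trainable = False  # freeze the pretrained features

model = models.Sequential([
    base,                                      # 7x7x1280 feature maps
    layers.AveragePooling2D(pool_size=(7, 7)), # average pooling layer
    layers.Flatten(),                          # flattening layer
    layers.Dense(128, activation="relu"),      # fully connected head
    layers.Dense(1, activation="sigmoid"),     # binary: mask vs. no mask
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

Because the base is frozen, only the small head is trained at first; the optional fine-tuning pass would set `base.trainable = True` and re-compile with a very low learning rate.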

What’s Next?


One of India’s leading institutions providing world-class Data Science & AI programs for working professionals with a mission to groom Data leaders of tomorrow!


