Object Detection using SSD MobileNet with TensorFlow.

Kamlesh Solanki
6 min read · Sep 28, 2020


SSD (Single Shot Detector) is an algorithm used for real-time object detection.

SSD (Single Shot Detector)


SSD consists of two parts:

  1. Extracting feature maps
  2. Applying convolutional filters to detect objects

In the first part, it extracts the features present in the image (in simple terms, it builds a feature map of the image). A feature map is basically the output of a CNN, which picks out important regions of the image, e.g. hands, eyes, etc. For more information about feature maps, visit here

In the second part, it classifies the objects present in the image and builds bounding boxes around them.

For detailed information about SSD visit here

Why SSD?

As I said earlier, for real-time object detection we need as high an FPS as possible.

So we cannot use Fast R-CNN, Faster R-CNN, or Mask R-CNN for real-time purposes, even though their accuracy may be better than SSD's.

SSD300 achieves 74.3% mAP (mean average precision) at 59 FPS (frames per second), while SSD512 achieves 76.9% mAP at 22 FPS, which outperforms Faster R-CNN (73.2% mAP at 7 FPS) and YOLOv1 (63.4% mAP at 45 FPS).

I think this is enough to prove why SSD is the ideal choice for real-time object detection.

MobileNet

If you know a bit about CNNs, then you probably know that they have a high computational cost, which is the main obstacle to deploying such models on edge devices like Android phones or Arduinos. The MobileNet architecture was invented to solve this problem.

Unlike other CNNs, MobileNet does not use standard convolutions; rather, it uses depthwise separable convolutions, which consist of depthwise convolutions and pointwise convolutions.

Let's look at a brief description of each technique used by the MobileNet architecture.

Depthwise Convolutions

A standard convolution has kernels of size Dk * Dk, and each of its N kernels computes over the entire input of size Df * Df * M, so the computational cost is

Dk * Dk * M * N * Df * Df

In a depthwise convolution, each input channel has its own kernel, i.e. for M input channels there are M kernels of size Dk * Dk, each applied to a single channel. The computational cost is now

Dk * Dk * M * Df * Df

Pointwise Convolution

A pointwise convolution performs a 1x1 convolution; it is the same as a standard convolution except that the kernel size is 1x1.

Its computational cost is

M * N * Df * Df
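The cost formulas above can be checked with a few lines of Python. This is a rough multiply-accumulate count; the example values (3x3 kernel, 112x112 feature map, 32 input channels, 64 output channels) are assumptions chosen to resemble an early MobileNet layer:

```python
def standard_conv_cost(dk, df, m, n):
    """Cost of a standard convolution: Dk * Dk * M * N * Df * Df."""
    return dk * dk * m * n * df * df

def depthwise_separable_cost(dk, df, m, n):
    """Depthwise cost (Dk * Dk * M * Df * Df) plus pointwise cost (M * N * Df * Df)."""
    return dk * dk * m * df * df + m * n * df * df

dk, df, m, n = 3, 112, 32, 64  # assumed example layer sizes
std = standard_conv_cost(dk, df, m, n)
sep = depthwise_separable_cost(dk, df, m, n)
print(std, sep, round(std / sep, 2))  # 231211008 29302784 7.89
```

The roughly 8x reduction matches the theoretical ratio 1 / (1/N + 1/Dk²) for these sizes, which is why depthwise separable convolutions are so much cheaper.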

MobileNet uses two more techniques:

Width Multiplier

The second technique for reducing the computational cost is the width multiplier, a hyperparameter in the range [0, 1], denoted α. α reduces the number of input and output channels proportionally: M becomes αM and N becomes αN.

Resolution Multiplier

The third technique for reducing the computational cost is the resolution multiplier, a hyperparameter in the range [0, 1], denoted ρ. ρ reduces the size of the input feature map: Df becomes ρDf.

Combining the width and resolution multipliers results in a computational cost of

Dk * Dk * αM * ρDf * ρDf + αM * αN * ρDf * ρDf
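Here is how both multipliers shrink the depthwise separable cost in practice. The layer sizes and the α = ρ = 0.5 setting are example assumptions:

```python
def separable_cost(dk, df, m, n, alpha=1.0, rho=1.0):
    """Depthwise separable cost with width multiplier alpha and
    resolution multiplier rho applied, per the combined formula above."""
    m, n = alpha * m, alpha * n   # alpha scales the channel counts
    df = rho * df                 # rho scales the feature-map side
    return dk * dk * m * df * df + m * n * df * df

base = separable_cost(3, 112, 32, 64)                      # full-size layer
reduced = separable_cost(3, 112, 32, 64, alpha=0.5, rho=0.5)
print(round(base / reduced, 1))  # 14.2
```

Halving both the channel counts and the resolution cuts the cost by roughly 14x, at the price of some accuracy.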

Enough theory; now let's see some code for it.

Creating a MobileNet neural network from scratch

Here every MobileNet block has the same structure: a 3x3 depthwise convolution followed by batch normalization and ReLU, then a 1x1 pointwise convolution, again followed by batch normalization and ReLU. Stacking these blocks, I created MobileNetV1 in TensorFlow 2 (the full Jupyter notebook is in my GitHub repo).
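One such block can be sketched in TensorFlow 2 with Keras as follows. This is a minimal sketch, not the full network; the 224x224 input, the 32-filter stem convolution, and the 64-filter block are example values mirroring MobileNetV1's first layers, and ReLU6 is used as in common MobileNet implementations:

```python
import tensorflow as tf
from tensorflow.keras import layers

def mobilenet_block(x, filters, strides=1):
    """One MobileNetV1 block: 3x3 depthwise conv -> BN -> ReLU6,
    then 1x1 pointwise conv -> BN -> ReLU6."""
    x = layers.DepthwiseConv2D(kernel_size=3, strides=strides,
                               padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU(6.0)(x)
    x = layers.Conv2D(filters, kernel_size=1, padding="same",
                      use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU(6.0)(x)
    return x

# Stem: a standard strided convolution, then one depthwise separable block.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(32, 3, strides=2, padding="same", use_bias=False)(inputs)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x)
x = mobilenet_block(x, filters=64)
model = tf.keras.Model(inputs, x)
print(model.output_shape)  # (None, 112, 112, 64)
```

The full MobileNetV1 simply repeats `mobilenet_block` with increasing filter counts and occasional stride-2 blocks for downsampling.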

As training data I chose the COCO dataset, which consists of 80 categories and 330k images, of which more than 200k are labeled, which is pretty cool 🆒.

For this project I am not training from scratch; rather, I am using the inference graph provided by the TensorFlow Model Zoo.

You can visit my GitHub repo for the code behind it

Here is a Jupyter notebook showing the final code

So the objects are detected, but what do we do with those detected objects?

I want to create a driver alert app that warns the driver if the car ahead is too close, let's say within 5 meters.

Note that I am not using LIDAR or radar.

So how can I estimate a 3D distance from a 2D image?

It sounds impossible, right? But believe me, it is possible using some techniques.

But rather than doing that, I am applying some simple logic. Let's say the approximate length of a car is 4 meters, and at the alert distance it occupies some p pixels in the image. So I can use a simple trick:

If a car is detected and it occupies p pixels or more, we alert the driver that the car is too close.
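That trick can be sketched in a few lines of Python. The threshold value, the label set, and the detection tuple format are all assumptions for illustration; the real threshold has to be calibrated for your camera and image resolution:

```python
# Assumed calibration: the bounding-box width (in pixels) that a ~4 m
# vehicle occupies when it is about 5 meters away.
P_THRESHOLD = 180
VEHICLE_LABELS = {"car", "truck", "bus"}

def should_alert(detections, threshold=P_THRESHOLD):
    """detections: list of (label, xmin, xmax) boxes in pixel coordinates.
    Returns True if any detected vehicle appears wider than the threshold,
    i.e. closer than the alert distance."""
    for label, xmin, xmax in detections:
        if label in VEHICLE_LABELS and (xmax - xmin) >= threshold:
            return True
    return False

print(should_alert([("car", 100, 320)]))   # width 220 >= 180 -> True
print(should_alert([("person", 0, 400)]))  # not a vehicle -> False
```

This is a crude proxy for distance (it ignores vehicle size variation and camera angle), but it is cheap enough to run on every frame on a phone.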

In order to achieve this, we need to post-process the detections after the model produces them.

Deploying on Android

So first the model is created; now we have to deploy it to Android.

For this I used the TFLite library, which is made for deploying such models on edge devices (in this case, Android).

After converting the model to .tflite format, the main Android work starts.
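The conversion step can be sketched like this. A tiny stand-in Keras model is used here as an assumption so the snippet is self-contained; in the project, the detection model is what actually gets converted:

```python
import tensorflow as tf

# Toy stand-in model (assumption for illustration only).
inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dense(2)(inputs)
model = tf.keras.Model(inputs, outputs)

# Convert the Keras model to a TFLite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional quantization
tflite_model = converter.convert()

# Write the flatbuffer to disk; this file is what the Android app bundles.
with open("detect.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting .tflite file is then loaded on-device with the TFLite Interpreter inside the Android app.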

I referred to the TensorFlow Android deployment guide available with the TensorFlow Model Zoo.

First, watch the video of the app so you can see how it works.

Video

Features of the app

Currently this project has two features:

  1. Detect objects and show bounding boxes around them with labels and confidence scores.
  2. Whenever a car, truck, or bus comes within a range of 5 meters, the app speaks a warning and the phone vibrates, alerting the driver that the vehicle is too close.

I am looking forward to adding new features to it.

You can download the app from here. But please note that your Android version must be 9 or higher.

The source code for the project can be found in my GitHub repo

Acknowledgments

I would like to thank the many people who knowingly or unknowingly helped me build this project.
