Object Detection Using SSD MobileNet with TensorFlow
SSD (Single Shot Detector) is an algorithm used for real-time object detection.
SSD (Single Shot Detector)
SSD consists of two parts:
- Extracting feature maps
- Applying convolutional filters to detect objects
In the first part, it extracts the features present in the image (in simple terms, it builds a feature map of the image). A feature map is basically the output of a CNN that captures the important regions of an image, e.g. hands, eyes, etc. For more information about feature maps, visit here.
In the second part, it classifies the objects present in the image and builds bounding boxes around them.
For detailed information about SSD, visit here.
Why SSD?
As I said earlier, for real-time object detection we need as high a frame rate as possible.
So we cannot use Fast R-CNN, Faster R-CNN, or Mask R-CNN for real-time purposes, even though their accuracy may be better than SSD's.
SSD300 achieves 74.3% mAP (mean average precision) at 59 FPS (frames per second), while SSD500 achieves 76.9% mAP at 22 FPS, which outperforms Faster R-CNN (73.2% mAP at 7 FPS) and YOLOv1 (63.4% mAP at 45 FPS).
I think this is enough to show why SSD is an ideal choice for real-time object detection.
MobileNet
If you know a bit about CNNs, you probably know that they have a high computational cost, which is the main obstacle to deploying such models on edge devices like Android phones or Arduino boards. The MobileNet architecture was invented to solve this problem.
Unlike other CNNs, MobileNet does not use standard convolutions; instead it uses depthwise separable convolutions, which consist of a depthwise convolution followed by a pointwise convolution.
Let's look at a brief description of each technique used by the MobileNet architecture.
Depthwise Convolutions
A standard convolutional layer applies N kernels, each of size Dk × Dk × M, over an entire input feature map of size Df × Df × M, so its computational cost is

Dk × Dk × M × N × Df × Df

In a depthwise convolution, each input channel has its own kernel, i.e. for M input channels there are M kernels of size Dk × Dk. The computational cost then becomes

Dk × Dk × M × Df × Df
Pointwise Convolution
A pointwise convolution is a 1×1 convolution; it is the same as a standard convolution except that the kernel size is 1×1. It combines the M depthwise-filtered channels into N output channels, so its computational cost is

M × N × Df × Df
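As a concrete illustration, the two steps can be sketched in Keras; the input size (32×32) and channel counts (M=16, N=32) here are arbitrary choices for the example, not values from the paper.

```python
import tensorflow as tf

# Input feature map of size Df x Df x M = 32 x 32 x 16 (illustrative values).
inputs = tf.keras.Input(shape=(32, 32, 16))

# Depthwise convolution: one 3x3 kernel per input channel (M kernels total).
x = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding="same")(inputs)

# Pointwise convolution: a standard convolution with a 1x1 kernel that mixes
# the M filtered channels into N = 32 output channels.
x = tf.keras.layers.Conv2D(filters=32, kernel_size=1)(x)

model = tf.keras.Model(inputs, x)
model.summary()
```

Note how the channel mixing happens only in the cheap 1×1 step, which is where the savings over a standard convolution come from.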
MobileNet uses two more techniques:
Width Multiplier
The second technique for reducing the computational cost is the width multiplier, a hyperparameter α in the range (0, 1]. α reduces the number of input and output channels proportionally, giving a cost of

Dk × Dk × αM × Df × Df + αM × αN × Df × Df
Resolution Multiplier
The third technique for reducing the computational cost is the resolution multiplier, a hyperparameter ρ in the range (0, 1]. ρ reduces the size of the input feature map, giving a cost of

Dk × Dk × M × ρDf × ρDf + M × N × ρDf × ρDf

Combining the width and resolution multipliers results in a computational cost of

Dk × Dk × αM × ρDf × ρDf + αM × αN × ρDf × ρDf
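To get a feel for the savings, we can plug some illustrative numbers of my own choosing into the cost formulas above:

```python
# Illustrative sizes (my choice, not from the paper): a 3x3 kernel (Dk=3)
# applied to a 14x14 feature map (Df=14) with M=512 input and N=512 output
# channels.
Dk, Df, M, N = 3, 14, 512, 512

standard = Dk * Dk * M * N * Df * Df
depthwise_separable = Dk * Dk * M * Df * Df + M * N * Df * Df
print(standard / depthwise_separable)  # ~8.84x cheaper

# The width multiplier alpha and resolution multiplier rho shrink it further.
alpha, rho = 0.5, 0.5
reduced = (Dk * Dk * (alpha * M) * (rho * Df) ** 2
           + (alpha * M) * (alpha * N) * (rho * Df) ** 2)
```

With a 3×3 kernel, the depthwise separable version is roughly 8–9× cheaper, which matches the paper's observation that most of the saving comes from factoring out the channel mixing.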
Enough theory; now let's see some code.
Creating a MobileNet Neural Network from Scratch
Here, every MobileNet block has the same structure: a 3×3 depthwise convolution followed by batch normalization and ReLU, then a 1×1 pointwise convolution, again followed by batch normalization and ReLU.
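A minimal sketch of such a block in TensorFlow 2 (the stem convolution, input size, and filter counts here are illustrative, not the full network):

```python
import tensorflow as tf

def mobilenet_block(x, filters, strides=1):
    # Depthwise 3x3 convolution, then BatchNorm and ReLU.
    x = tf.keras.layers.DepthwiseConv2D(kernel_size=3, strides=strides,
                                        padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    # Pointwise 1x1 convolution, then BatchNorm and ReLU.
    x = tf.keras.layers.Conv2D(filters, kernel_size=1, use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    return x

# Stem convolution plus one block, to show how the blocks chain together.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same")(inputs)
x = mobilenet_block(x, filters=64)
model = tf.keras.Model(inputs, x)
```

The full MobileNetV1 is just this block repeated with increasing filter counts and occasional stride-2 downsampling.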
Following this structure, I created MobileNetV1 in TensorFlow 2 in a Jupyter notebook.
As training data I chose the COCO dataset, which consists of 80 categories and 330k images, of which 200k+ are labeled, which is pretty cool 🆒.
For this project I am not training from scratch; instead, I am using the inference graph provided by the TensorFlow Model Zoo.
You can visit my GitHub repo for the code behind it.
Here is a Jupyter notebook showing the final code.
So the objects are detected, but what are we doing with those detected objects?
I want to create a driver alert app that warns the driver if a car is too close, let's say within 5 meters.
Note that I am not using LIDAR or radar.
So how can I estimate 3D distance from a 2D image?
It sounds impossible, right? But believe me, it is possible using some techniques.
Rather than doing that, though, I am applying some simple logic. Let's say the approximate length of a car is 4 meters and it occupies p pixels in the image. Then I can use a simple trick:
If a car is detected and it occupies at least p pixels, we alert the driver that the car is too close.
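This rule can be sketched in a few lines of Python; the threshold value and the box format are hypothetical placeholders that would need calibrating for the actual camera:

```python
# Hypothetical threshold: the width in pixels that a ~4 m vehicle occupies
# at roughly 5 m distance, which depends on the camera and must be measured.
PIXEL_THRESHOLD = 250
VEHICLE_CLASSES = {"car", "truck", "bus"}

def is_too_close(label, box):
    """box is (xmin, ymin, xmax, ymax) in pixel coordinates."""
    width_px = box[2] - box[0]
    return label in VEHICLE_CLASSES and width_px >= PIXEL_THRESHOLD

# A detected car whose bounding box is 300 px wide triggers the alert.
print(is_too_close("car", (100, 200, 400, 380)))  # True
```

In the app, this check runs on each detection returned by the model, and a positive result fires the voice warning and vibration.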
To achieve this, we add this check on top of the model's detections once the model is created.
Deploying on Android
So first the model is created; now we have to deploy it to Android.
For this I used the TFLite library, which is made for deploying such models on edge devices (in this case, Android).
After converting the model to the .tflite format, the main Android work starts.
I referred to the TensorFlow Android deployment guide available in the TensorFlow Model Zoo.
First, watch the video of the app to see how it works.
Video
Features of the app
Currently the project has two features:
- Detect objects and show bounding boxes around them with labels and confidence scores.
- Whenever a car, truck, or bus comes within a 5-meter range, the app speaks a warning and the phone vibrates, alerting the driver that a vehicle is too close.
I am looking forward to adding new features.
You can download the app from here. Please note that it requires Android version 9 or higher.
Source code for project can be found on my GitHub Repo
Acknowledgments
I would like to thank the many people who knowingly or unknowingly helped me build this project.