Benchmark Description


Object Classification
Deep neural networks recognize the object class of one or more input photos, distinguishing among 1000 object classes. Our benchmarking app includes four models: MobileNet-V1, MobileNet-V2, ResNet-50, and Inception-V3. For each network, we provide one float model (FP32) and one quantized model (INT8).
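
The accuracy figures for these models are reported as Top1 and Top5. As a minimal sketch of how such metrics are commonly computed (the scores and labels here are made-up toy values, not benchmark data, and `top_k_accuracy` is a hypothetical helper rather than part of the app):

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score for this sample.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda i: sample_scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Toy scores for 3 images over 4 classes (the benchmark models output 1000 classes).
scores = [
    [0.1, 0.6, 0.2, 0.1],  # highest score: class 1
    [0.5, 0.1, 0.3, 0.1],  # highest score: class 0
    [0.2, 0.3, 0.4, 0.1],  # highest score: class 2
]
labels = [1, 2, 1]  # true class of each image

print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

Top5 is the same computation with k=5, which is why it is never lower than Top1 for a given model.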

Network : MobileNet-V1
Input Size : 224 X 224 px
Model Size(F|Q) : 16.9 MB | 4.3 MB
Model MACs(F|Q) : 0.569 GFLOPs | 0.569 GFLOPs
Accuracy(F|Q) : Top1: 66.6%, Top5: 90.4% | Top1: 64.2%, Top5: 86.0%
Baseline Latency(F|Q) : 199 ms | 132 ms
Baseline FPS(F|Q) : 5.0251 | 7.5758
Model source(F|Q) : TFLite | TFLite

Network : MobileNet-V2
Input Size : 224 X 224 px
Model Size(F|Q) : 14.0 MB | 3.6 MB
Model MACs(F|Q) : 0.3 GFLOPs | 0.3 GFLOPs
Accuracy(F|Q) : Top1: 66.0%, Top5: 89.0% | Top1: 65.6%, Top5: 87.0%
Baseline Latency(F|Q) : 234 ms | 127 ms
Baseline FPS(F|Q) : 4.2735 | 7.8740
Model source(F|Q) : TFLite | TFLite

Network : Inception-V3
Input Size : 346 X 346 px
Model Size(F|Q) : 95.3 MB | 24.1 MB
Model MACs(F|Q) : 6 GFLOPs | 6 GFLOPs
Accuracy(F|Q) : Top1: 80.8%, Top5: 96.0% | Top1: 79.4%, Top5: 96.0%
Baseline Latency(F|Q) : 1198 ms | 872 ms
Baseline FPS(F|Q) : 0.8347 | 1.1468
Model source(F|Q) : TFLite | TFLite

Network : ResNet-50
Input Size : 224 X 224 px
Model Size(F|Q) : 102.2 MB | 25.7 MB
Model MACs(F|Q) : 4 GFLOPs | 4 GFLOPs
Accuracy(F|Q) : Top1: 70.7%, Top5: 91.0% | Top1: 69.8%, Top5: 88.8%
Baseline Latency(F|Q) : 1712 ms | 655 ms
Baseline FPS(F|Q) : 0.5841 | 1.5267
Model source(F|Q) : TFLite | TFLite
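
The baseline FPS values listed for the four classification networks are consistent with FPS = 1000 / latency in milliseconds. The following sketch reproduces them from the latencies above and also prints the float-to-quantized speedup; the dictionary and function names are illustrative only.

```python
# Baseline latencies in milliseconds as (float, quantized), copied from the listings above.
baseline_latency_ms = {
    "MobileNet-V1": (199, 132),
    "MobileNet-V2": (234, 127),
    "Inception-V3": (1198, 872),
    "ResNet-50": (1712, 655),
}

def fps_from_latency(latency_ms):
    """Frames per second for a single-image latency given in milliseconds."""
    return 1000.0 / latency_ms

for name, (float_ms, quant_ms) in baseline_latency_ms.items():
    speedup = float_ms / quant_ms  # how many times faster the quantized model runs
    print(f"{name}: {fps_from_latency(float_ms):.4f} FPS (F), "
          f"{fps_from_latency(quant_ms):.4f} FPS (Q), "
          f"quantized speedup {speedup:.2f}x")
```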

Object Segmentation
This task evaluates whether your phone can perform automatic background removal for scene replacement. The model recognizes 20 object classes and segments each recognized object in a different color. It is a ResNet-50 with embedded atrous convolution layers. At present, we provide only a float model.
Figure: original image and modified image.
Network : DeepLab-V3
Input : 513 X 513 px
Model Size(F) : 8.5 MB
Model MACs(F) : 76 GFLOPs
Accuracy(F) : 74.51%
Baseline Latency(F) : 3482 ms
Baseline FPS(F) : 0.29
Model source(F) : TFLite
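
To illustrate the background-removal use case described in this section, here is a minimal sketch of how a per-pixel label mask could be used to swap the background. It assumes that label 0 marks background pixels and that the image, mask, and replacement scene share the same shape; `replace_background` and the class id 15 are hypothetical and are not taken from the app.

```python
BACKGROUND = 0  # assumption: label 0 marks background pixels

def replace_background(image, mask, new_background):
    """Return a copy of image whose background pixels come from new_background.

    image, new_background: 2-D lists of pixels with the same shape.
    mask: 2-D list of per-pixel class labels with the same shape.
    """
    return [
        [bg if label == BACKGROUND else pixel
         for pixel, label, bg in zip(image_row, mask_row, bg_row)]
        for image_row, mask_row, bg_row in zip(image, mask, new_background)
    ]

# Toy 2 x 3 example; a real mask for this model would be 513 x 513.
image = [["p", "p", "s"], ["p", "s", "s"]]  # "p" = foreground pixel, "s" = old scene
mask = [[15, 15, 0], [15, 0, 0]]            # 15 = hypothetical object class id
beach = [["b", "b", "b"], ["b", "b", "b"]]  # replacement scene

print(replace_background(image, mask, beach))
```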

Object Detection
You can now count objects on your phone. The model combines MobileNet, a lightweight classification model, with the single shot multibox detector (SSD), an object detector that does not resample pixels or feature maps for bounding box hypotheses, which improves speed for high-accuracy detection. It can detect 80 object classes. We provide both float and quantized models for this application.
Figure: original image and modified image.
Network : MobileNet-SSD
Input : 224 X 224 px
Model Size(F|Q) : 27.3 MB | 6.9 MB
Model MACs(F|Q) : 1.2 GFLOPs | 1.2 GFLOPs
Accuracy(F|Q) : mAP50: 35.572 | mAP50: 31.086
Baseline Latency(F|Q) : 707 ms | 257 ms
Baseline FPS(F|Q) : 1.41 | 3.89
Model source(F|Q) : TFLite | TFLite
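
As a rough sketch of the object-counting use case mentioned in this section, detector output can be reduced to per-class counts by keeping only detections above a confidence threshold. The detection format (dictionaries with `label` and `score`), the 0.5 threshold, and the `count_objects` helper are assumptions made for illustration and do not describe the app's actual post-processing.

```python
from collections import Counter

def count_objects(detections, score_threshold=0.5):
    """Count detections per class label, ignoring low-confidence ones."""
    counts = Counter()
    for det in detections:
        if det["score"] >= score_threshold:
            counts[det["label"]] += 1
    return dict(counts)

# Hypothetical detector output for one image.
detections = [
    {"label": "person", "score": 0.91},
    {"label": "person", "score": 0.72},
    {"label": "dog", "score": 0.64},
    {"label": "person", "score": 0.30},  # below the threshold, not counted
]

print(count_objects(detections))
```

Raising the threshold trades missed objects for fewer false counts, so the value would need tuning for a given model and scene.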