Benchmark Description

Object Classification

Deep neural networks recognize the object class of one or more input photos, choosing among 1000 object classes. Our benchmarking app includes four models: MobileNet-V1, MobileNet-V2, ResNet-50, and Inception-V3. For each network, we provide one float model (FP32) and one quantized model (INT8).
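The tables below report Top1 and Top5 accuracy. As a minimal sketch of how such a figure is usually computed (the function name and the toy data are illustrative assumptions, not part of the benchmark app), a prediction counts as a top-k hit when the true label is among the k highest-scoring classes:

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda i: sample_scores[i],
                        reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data: 3 samples, 4 classes (the benchmark models use 1000 classes).
scores = [
    [0.1, 0.6, 0.2, 0.1],  # best guess: class 1
    [0.5, 0.1, 0.3, 0.1],  # best guess: class 0, second guess: class 2
    [0.2, 0.3, 0.4, 0.1],  # best guess: class 2, second guess: class 1
]
labels = [1, 2, 1]

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
print(f"Top1: {top1:.1%}  Top2: {top2:.1%}")
```

With 1000 classes the same function would be called with k=1 and k=5.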

Network :	MobileNet-V1
Input Size :	224 X 224 px
Model Size(F\|Q) :	16.9 MB	4.3 MB
Model MACs(F\|Q) :	0.569 GFLOPs	0.569 GFLOPs
Accuracy(F\|Q) :	Top1: 66.6% Top5: 90.4%	Top1: 64.2% Top5: 86.0%
Baseline Latency(F\|Q) :	199 ms	132 ms
Baseline FPS(F\|Q) :	5.0251	7.5758
Model source(F\|Q) :	TFLite	TFLite
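The Baseline FPS values appear to be the reciprocal of the baseline latency, that is 1000 divided by the latency in milliseconds. The source does not state this relationship, but the reported numbers match it. A small sketch, using a hypothetical helper name, reproduces the MobileNet-V1 figures:

```python
def fps_from_latency_ms(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


# MobileNet-V1 baseline latencies from the table above.
float_fps = round(fps_from_latency_ms(199), 4)
quant_fps = round(fps_from_latency_ms(132), 4)
print(float_fps, quant_fps)  # 5.0251 7.5758
```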

Network :	MobileNet-V2
Input Size :	224 X 224 px
Model Size(F\|Q) :	14.0 MB	3.6 MB
Model MACs(F\|Q) :	0.3 GFLOPs	0.3 GFLOPs
Accuracy(F\|Q) :	Top1: 66.0% Top5: 89.0%	Top1: 65.6% Top5: 87.0%
Baseline Latency(F\|Q) :	234 ms	127 ms
Baseline FPS(F\|Q) :	4.2735	7.8740
Model source(F\|Q) :	TFLite	TFLite

Network :	Inception-V3
Input Size :	346 X 346 px
Model Size(F\|Q) :	95.3 MB	24.1 MB
Model MACs(F\|Q) :	6 GFLOPs	6 GFLOPs
Accuracy(F\|Q) :	Top1: 80.8% Top5: 96.0%	Top1: 79.4% Top5: 96.0%
Baseline Latency(F\|Q) :	1198 ms	872 ms
Baseline FPS(F\|Q) :	0.8347	1.1468
Model source(F\|Q) :	TFLite	TFLite

Network :	ResNet-50
Input Size :	224 X 224 px
Model Size(F\|Q) :	102.2 MB	25.7 MB
Model MACs(F\|Q) :	4 GFLOPs	4 GFLOPs
Accuracy(F\|Q) :	Top1: 70.7% Top5: 91.0%	Top1: 69.8% Top5: 88.8%
Baseline Latency(F\|Q) :	1712 ms	655 ms
Baseline FPS(F\|Q) :	0.5841	1.5267
Model source(F\|Q) :	TFLite	TFLite
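An FP32 weight occupies 4 bytes and an INT8 weight occupies 1 byte, so a quantized model is expected to be roughly one quarter the size of its float counterpart. The sketch below checks this against the sizes and latencies listed above; the dictionary layout and variable names are illustrative assumptions.

```python
# (float, quantized) values copied from the classification tables above.
model_sizes_mb = {
    "MobileNet-V1": (16.9, 4.3),
    "MobileNet-V2": (14.0, 3.6),
    "Inception-V3": (95.3, 24.1),
    "ResNet-50": (102.2, 25.7),
}
baseline_latency_ms = {
    "MobileNet-V1": (199, 132),
    "MobileNet-V2": (234, 127),
    "Inception-V3": (1198, 872),
    "ResNet-50": (1712, 655),
}

# Size reduction and latency speedup of the quantized model over the float model.
size_ratio = {name: f / q for name, (f, q) in model_sizes_mb.items()}
speedup = {name: f / q for name, (f, q) in baseline_latency_ms.items()}

for name in model_sizes_mb:
    print(f"{name}: {size_ratio[name]:.2f}x smaller, {speedup[name]:.2f}x faster")
```

The size ratios all land just under 4x, while the latency gain varies considerably from network to network.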

Object Segmentation

This task evaluates whether your phone can perform automatic background removal for scene replacement. The model recognizes 20 different object classes and segments each recognized object with a different color. It is a ResNet-50 with embedded atrous convolution layers. Currently, we only provide a float model.

Network :	DeepLab-V3
Input Size :	513 X 513 px
Model Size(F) :	8.5 MB
Model MACs(F) :	76 GFLOPs
Accuracy(F) :	74.51%
Baseline Latency(F) :	3482 ms
Baseline FPS(F) :	0.29
Model source(F) :	TFLite
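The background replacement described above can be sketched as a per-pixel selection driven by the segmentation output. This is a minimal illustration under stated assumptions, not the app's implementation: the label id 0 for background, the label id 15 for the foreground object, and the function name are all hypothetical.

```python
BACKGROUND = 0  # assumed label id for background; not stated in the source


def replace_background(image, label_map, new_background):
    """Keep pixels labelled as an object; take all other pixels from new_background."""
    return [
        [
            pixel if label != BACKGROUND else bg_pixel
            for pixel, label, bg_pixel in zip(img_row, lbl_row, bg_row)
        ]
        for img_row, lbl_row, bg_row in zip(image, label_map, new_background)
    ]


# Toy 2 x 2 RGB image; the real model works on 513 x 513 inputs.
image = [[(10, 10, 10), (20, 20, 20)],
         [(30, 30, 30), (40, 40, 40)]]
label_map = [[0, 15],
             [15, 0]]  # 15 is a hypothetical foreground class id
blue = (0, 0, 255)
new_background = [[blue, blue], [blue, blue]]

result = replace_background(image, label_map, new_background)
print(result)
```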

Object Detection

This task performs object counting on your phone. The model combines MobileNet, a lightweight classification network, with the Single Shot MultiBox Detector (SSD), an object detector that does not require resampling pixels or feature maps for bounding box hypotheses, which improves speed for high-accuracy detection. It can detect 80 different object classes. We provide both a float model and a quantized model for this task.

Network :	MobileNet-SSD
Input Size :	224 X 224 px
Model Size(F\|Q) :	27.3 MB	6.9 MB
Model MACs(F\|Q) :	1.2 GFLOPs	1.2 GFLOPs
Accuracy(F\|Q) :	mAP50: 35.572	mAP50: 31.086
Baseline Latency(F\|Q) :	707 ms	257 ms
Baseline FPS(F\|Q) :	1.41	3.89
Model source(F\|Q) :	TFLite	TFLite
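The source does not define mAP50. By common convention it is the mean average precision where a detection counts as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below covers only that matching step, not the full mAP computation; the function names and the box format (x_min, y_min, x_max, y_max) are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(predicted_box, ground_truth_box):
    """True when the prediction overlaps the ground truth with IoU >= 0.5."""
    return iou(predicted_box, ground_truth_box) >= 0.5


ground_truth = (0, 0, 10, 10)
good_prediction = (0, 0, 10, 5)    # covers half of the ground-truth box
poor_prediction = (5, 5, 15, 15)   # overlaps only one corner

print(is_match_at_50(good_prediction, ground_truth))
print(is_match_at_50(poor_prediction, ground_truth))
```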